MorfFlex CZ (the latest version is MorfFlex CZ 2.1) is the Czech morphological dictionary originally developed by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of lemma-tag-wordform triples. For each wordform, the full inflectional information is encoded in a positional tag. Wordforms are organized into entries (paradigm instances, or paradigms for short) according to their formal morphological behavior. Each paradigm (a set of wordforms) is identified by a unique lemma. Apart from traditional morphological categories, the description also contains some semantic, stylistic and derivational information. For more details, see the comprehensive specification of the Czech morphological annotation.
The MorfFlex CZ 2.1 dictionary contains 126,906,921 lemma-tag-wordform triples.
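The triple structure described above can be illustrated with a minimal Python sketch. It rests on several assumptions that should be checked against the distributed files and the annotation manual: that each line holds one triple as tab-separated lemma, tag and wordform; that the tag is a 15-position string whose first five positions encode part of speech, detailed part of speech, gender, number and case; and that the three sample lines are representative. The names Triple, parse_triples, group_by_lemma and decode_tag are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Triple(NamedTuple):
    lemma: str
    tag: str
    form: str


def parse_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield one Triple per non-empty line (assumed order: lemma, tag, wordform)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        lemma, tag, form = line.split("\t")
        yield Triple(lemma, tag, form)


def group_by_lemma(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Collect triples into paradigms, keyed by the lemma that identifies them."""
    paradigms: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        paradigms[triple.lemma].append(triple)
    return dict(paradigms)


# Assumed labels for the first five tag positions; the remaining positions are ignored here.
POSITIONS = ("pos", "subpos", "gender", "number", "case")


def decode_tag(tag: str) -> Dict[str, str]:
    """Map the first five characters of a positional tag to the assumed category labels."""
    return dict(zip(POSITIONS, tag))


# Illustrative sample lines, not taken from the dictionary files.
SAMPLE = [
    "žena\tNNFS1-----A----\tžena",
    "žena\tNNFS2-----A----\tženy",
    "hrad\tNNIS1-----A----\thrad",
]

paradigms = group_by_lemma(parse_triples(SAMPLE))
for lemma, entries in paradigms.items():
    for entry in entries:
        print(lemma, entry.form, decode_tag(entry.tag))
```

Because the dictionary holds well over a hundred million triples, a real application would more likely stream the file and index only the lemmas or wordforms it needs rather than load every paradigm into memory.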
MorfFlex CZ 2.1 is an integral part of the PDT-C 2.0 release. It is a minor upgrade from MorfFlex CZ 2.0, with the tagset unchanged, but with some additions and corrections for full compatibility with PDT-C 2.0 morphological annotation.
Authors: Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková.
The dictionary can be downloaded from the LINDAT/CLARIAH-CZ repository under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence. Commercial license available upon request at
Previous Versions
How to cite MorfFlex CZ 2.1
If you use the dictionary in your research or need to cite it for any reason, please cite:
For LREC papers (separate language resources references):
@languageresource{lrMorfFlexCZ21,
  title = {MorfFlex CZ 2.1},
  author = {Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
  url = {https://hdl.handle.net/11234/1-5833},
  publisher = {Institute of Formal and Applied Linguistics, LINDAT/CLARIN, Charles University},
  address = {Prague, Czech Republic},
  lindat = {https://hdl.handle.net/11234/1-5833},
  year = {2024}
}
For general papers and citations:
@misc{MorfFlexCZ21,
  title = {MorfFlex CZ 2.1},
  author = {Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
  url = {https://hdl.handle.net/11234/1-5833},
  note = {{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  copyright = {Creative Commons - Attribution-{NonCommercial}-{ShareAlike} 4.0 International ({CC} {BY}-{NC}-{SA} 4.0)},
  year = {2024}
}
For "plaintext" reference:
(Hajič et al., 2024)
Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková: MorfFlex CZ 2.1. Data/software, LINDAT-CLARIAH, URL: https://hdl.handle.net/11234/1-5833, 2024.
For footnote references, the following is sufficient in LaTeX papers:
\url{https://hdl.handle.net/11234/1-5833}
Publications
Hajič Jan, Bejček Eduard, Hlaváčová Jaroslava, Mikulová Marie, Straka Milan, Štěpánek Jan, Štěpánková Barbora: Prague Dependency Treebank - Consolidated 1.0. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4, pp. 5208-5218, 2020.
Hajič Jan: Disambiguation of Rich Inflection (Computational Morphology of Czech), Karolinum, Prague, Czechia, 2004.
Hlaváčová Jaroslava, Mikulová Marie, Štěpánková Barbora, Hajič Jan: Modifications of the Czech morphological dictionary for consistent corpus annotation. Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Slovakia, ISSN 0021-5597, pp. 380-389, 2019.
Mikulová Marie, Hajič Jan, Hana Jiří, Hanová Hana, Hlaváčová Jaroslava, Jeřábek Emil, Štěpánková Barbora, Vidová Hladká Barbora, Zeman Daniel: Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release. Technical report no. TR-2020-64, Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia, 2020.
Štěpánková Barbora, Mikulová Marie, Hajič Jan: The MorfFlex Dictionary of Czech as a Source of Linguistic Data. In: Proceedings of XIX EURALEX Congress: Lexicography for Inclusion, Democritus University of Thrace, Thrace, Greece, ISBN 978-618-85138-1-5, ISSN 2521-7100, pp. 387-392, 2020.