MorfFlex CZ (the latest version is MorfFlex CZ 2.1) is the Czech morphological dictionary originally developed by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of lemma-tag-wordform triples. For each wordform, the full inflectional information is encoded in a positional tag. Wordforms are organized into entries (paradigm instances, or paradigms for short) according to their formal morphological behavior. Each paradigm (a set of wordforms) is identified by a unique lemma. Apart from traditional morphological categories, the description also contains some semantic, stylistic and derivational information. For more details, see the comprehensive specification of the Czech morphological annotation.
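The triple structure described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated on this page: that each line of the dictionary file holds one tab-separated lemma, tag and wordform (in that order), and that the first five tag positions encode part of speech, detailed part of speech, gender, number and case. The sample entries and all function names are hypothetical; consult the specification of the morphological annotation for the authoritative tag layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Triple(NamedTuple):
    """One lemma-tag-wordform triple."""
    lemma: str
    tag: str
    form: str


# ASSUMPTION: names of the first five tag positions (not taken from this page).
ASSUMED_POSITIONS = ("pos", "subpos", "gender", "number", "case")


def parse_line(line: str) -> Triple:
    """Parse one line, ASSUMING tab-separated lemma, tag, wordform."""
    lemma, tag, form = line.rstrip("\n").split("\t")
    return Triple(lemma, tag, form)


def group_by_lemma(lines: Iterable[str]) -> Dict[str, List[Triple]]:
    """Collect triples into paradigms keyed by their lemma."""
    paradigms: Dict[str, List[Triple]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        triple = parse_line(line)
        paradigms[triple.lemma].append(triple)
    return dict(paradigms)


def decode_prefix(tag: str) -> Dict[str, str]:
    """Label the first five characters of a positional tag."""
    return dict(zip(ASSUMED_POSITIONS, tag))


# Hypothetical sample entries, for illustration only.
SAMPLE = (
    "žena\tNNFS1-----A----\tžena\n"
    "žena\tNNFS2-----A----\tženy\n"
    "žena\tNNFS4-----A----\tženu\n"
)

if __name__ == "__main__":
    paradigms = group_by_lemma(SAMPLE.splitlines())
    for triple in paradigms["žena"]:
        print(triple.form, decode_prefix(triple.tag))
```

Because the full dictionary contains well over a hundred million triples, a real script would likely stream the file line by line rather than load every paradigm into memory at once.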

The MorfFlex CZ 2.1 dictionary contains 126,906,921 lemma-tag-wordform triples.

MorfFlex CZ 2.1 is an integral part of the PDT-C 2.0 release. It is a minor upgrade from MorfFlex CZ 2.0, with the tagset unchanged, but with some additions and corrections for full compatibility with PDT-C 2.0 morphological annotation.

Authors: Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková

The dictionary can be downloaded from the LINDAT/CLARIAH-CZ repository under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. A commercial license is available upon request at ufal@ufal.mff.cuni.cz.

Previous Versions

 

How to cite MorfFlex CZ 2.1

If you use the dictionary in your research or need to cite it for any reason, please cite:

For LREC papers (separate language resources references):

@languageresource{lrMorfFlexCZ21,
 title={MorfFlex CZ 2.1},
 author={Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
 url = {https://hdl.handle.net/11234/1-5833},
 publisher={Institute of Formal and Applied Linguistics, LINDAT/CLARIN, Charles University}, 
 address={Prague, Czech Republic}, 
 lindat={https://hdl.handle.net/11234/1-5833},
 year={2024} }

For general papers and citations:

@misc{MorfFlexCZ21,
 title={MorfFlex CZ 2.1},
 author={Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
 url = {https://hdl.handle.net/11234/1-5833},
 note = {{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}),
 Faculty of Mathematics and Physics, Charles University}, 
 copyright={Creative Commons - Attribution-{NonCommercial}-{ShareAlike} 4.0 International ({CC} {BY}-{NC}-{SA} 4.0)},
 year={2024} }

For "plaintext" reference:

(Hajič et al., 2024)

Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková: MorfFlex CZ 2.1. Data/software, LINDAT-CLARIAH, URL: https://hdl.handle.net/11234/1-5833, 2024.

For footnote references, the following is sufficient in LaTeX papers:

\url{https://hdl.handle.net/11234/1-5833}

 

Publications

Hajič Jan, Bejček Eduard, Hlaváčová Jaroslava, Mikulová Marie, Straka Milan, Štěpánek Jan, Štěpánková Barbora: Prague Dependency Treebank - Consolidated 1.0. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4, pp. 5208-5218, 2020.

Hajič Jan: Disambiguation of Rich Inflection (Computational Morphology of Czech), Karolinum, Prague, Czechia, 2004.

Hlaváčová Jaroslava, Mikulová Marie, Štěpánková Barbora, Hajič Jan: Modifications of the Czech morphological dictionary for consistent corpus annotation. Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Slovakia, ISSN 0021-5597, pp. 380-389, 2019.

Mikulová Marie, Hajič Jan, Hana Jiří, Hanová Hana, Hlaváčová Jaroslava, Jeřábek Emil, Štěpánková Barbora, Vidová Hladká Barbora, Zeman Daniel: Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release. Technical report no. TR-2020-64, Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia, 2020.

Štěpánková Barbora, Mikulová Marie, Hajič Jan: The MorfFlex Dictionary of Czech as a Source of Linguistic Data. In: Proceedings of XIX EURALEX Congress: Lexicography for Inclusion, Democritus University of Thrace, Thrace, Greece, ISBN 978-618-85138-1-5, ISSN 2521-7100, pp. 387-392, 2020.