DeriNet
Lexical Network of Word-Formation Relations in Czech
DeriNet is a lexical network which models word-formation relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational links (relations between derivatives and their base lexemes) or links connecting compounds with their base words.
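The node/edge model can be sketched in memory as follows. This is a minimal Python sketch, not DeriNet's own API. The names `Lexeme`, `link_derivation`, `link_compound` and `chain_to_root` are hypothetical. It assumes that each derivative has a single main base lexeme, while a compound may point to several base words. The lexemes were chosen for illustration: učit 'to teach' → učitel 'teacher' → učitelka 'female teacher', and vodopád 'waterfall' from voda 'water' and padat 'to fall'.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """One node of the network: a lexeme identified by its lemma and POS (hypothetical class)."""
    lemma: str
    pos: str
    # Main base lexeme of a derivative; None for roots (unmotivated lexemes).
    # Excluded from repr/eq to avoid infinite recursion through parent/child cycles.
    parent: Lexeme | None = field(default=None, repr=False, compare=False)
    # Label of the edge to the parent, e.g. "Derivation" (hypothetical label values).
    relation: str | None = None
    # Base words of a compound, kept apart from the single derivational parent.
    compound_bases: list[Lexeme] = field(default_factory=list)
    children: list[Lexeme] = field(default_factory=list, repr=False, compare=False)


def link_derivation(base: Lexeme, derivative: Lexeme, relation: str = "Derivation") -> None:
    """Add an edge from a derivative to its base lexeme."""
    derivative.parent = base
    derivative.relation = relation
    base.children.append(derivative)


def link_compound(compound: Lexeme, *bases: Lexeme) -> None:
    """Connect a compound with each of its base words."""
    compound.compound_bases.extend(bases)


def chain_to_root(lexeme: Lexeme) -> list[str]:
    """Follow derivational parents upwards; return lemmas from the lexeme to its root."""
    path = []
    node: Lexeme | None = lexeme
    while node is not None:
        path.append(node.lemma)
        node = node.parent
    return path


ucit, ucitel, ucitelka = Lexeme("učit", "V"), Lexeme("učitel", "N"), Lexeme("učitelka", "N")
link_derivation(ucit, ucitel)
link_derivation(ucitel, ucitelka)
print(chain_to_root(ucitelka))  # ['učitelka', 'učitel', 'učit']

vodopad = Lexeme("vodopád", "N")
link_compound(vodopad, Lexeme("voda", "N"), Lexeme("padat", "V"))
print([b.lemma for b in vodopad.compound_bases])  # ['voda', 'padat']
```

The sketch stores derivational parents and compound bases separately. This keeps every lexeme in exactly one derivational tree, which simplifies traversal. Check the released data's documentation to see whether it is organised exactly this way.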
The present version, DeriNet 2.1, contains over 1 million lexemes (sampled from the MorfFlex dictionary) connected by 782 thousand derivational relations, 144 relations of conversion, 295 relations of univerbisation, 1,952 links pointing from compounds to their base words, and 50,533 links connecting orthographic variants.
Major changes include automatically generated full morphological segmentations of all lemmas, 202 affixoid nodes serving as bases for (neoclassical) compounding, annotation of the corpus frequency of lexemes, annotation of the conjugation classes of verbs, links between orthographic variants of lexemes, and a pilot annotation of univerbisation.
More details on the file format used in DeriNet releases since version 2.0 can be found in the paper by Jonáš Vidra et al. presented at the DeriMo 2019 workshop.
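The following Python sketch shows roughly how such a file could be read. It assumes a tab-separated layout with one lexeme per line and blank lines between derivational trees. It also assumes an empty PARENT_ID for tree roots and the ten columns listed in `COLUMNS`. That column list is our reading of the DeriNet 2.0 format, so check it against the paper and the README shipped with the release. `read_derinet_tsv` and `parent_ids` are hypothetical helpers. The `TOY_SAMPLE` rows were made up for illustration and are not taken from the data.

```python
import io

# ASSUMED column order of the DeriNet 2.0 TSV format; verify against the release README.
COLUMNS = [
    "ID", "LANG_SPECIFIC_ID", "LEMMA", "POS", "FEATS",
    "SEGMENTATION", "PARENT_ID", "RELATION_TYPE", "OTHER_RELATIONS", "MISC",
]

# Hand-made toy rows (NOT real DeriNet data): two trees separated by a blank line.
TOY_SAMPLE = (
    "0.0\t\tučit\tV\t\t\t\t\t\t{}\n"
    "0.1\t\tučitel\tN\t\t\t0.0\tType=Derivation\t\t{}\n"
    "0.2\t\tučitelka\tN\t\t\t0.1\tType=Derivation\t\t{}\n"
    "\n"
    "1.0\t\tvoda\tN\t\t\t\t\t\t{}\n"
)


def read_derinet_tsv(stream):
    """Yield one dict per lexeme line; blank lines between trees are skipped."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        values = line.split("\t")
        # Pad short lines so every assumed column is present.
        values += [""] * (len(COLUMNS) - len(values))
        yield dict(zip(COLUMNS, values))


def parent_ids(rows):
    """Map each lexeme ID to its main parent's ID (None for tree roots)."""
    return {row["ID"]: (row["PARENT_ID"] or None) for row in rows}


rows = list(read_derinet_tsv(io.StringIO(TOY_SAMPLE)))
lemma_of = {row["ID"]: row["LEMMA"] for row in rows}
parents = parent_ids(rows)
print([(lemma_of[i], lemma_of.get(p)) for i, p in parents.items()])
# [('učit', None), ('učitel', 'učit'), ('učitelka', 'učitel'), ('voda', None)]
```

`read_derinet_tsv` yields rows one at a time, which suits a file with over a million lexemes. The sketch keys lexemes by ID rather than by lemma because lemmas need not be unique (for example, homonyms).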
DeriNet 2.1 was released in July 2021. It is available in the LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (CC-BY-NC-SA).
For older versions of the DeriNet data, see here.
Online DeriNet Tools
DeriNet data can be searched online using two versions of DeriNet Search: DeriSearch v2 shows all information stored in the data, while DeriSearch v1 displays only derivational relations (not compounding relations). The data can also be browsed online using DeriNet Viewer.
Related projects
Universal Derivations (UDer)
DeriNet 2.1 is part of Universal Derivations (UDer), a collection of harmonized derivational resources for multiple languages. The current version contains derivational resources for a range of languages, all harmonized to the DeriNet file format. See the UDer page for more details.
Word-formation networks for other languages (created in the DeriNet-like format)
-
DeriNet.RU 0.5 for Russian (under CC-BY-NC-SA 3.0 license): derinet-ru-0.5.zip
-
DeriNet.ES 0.6 for Spanish (under CC-BY-NC-SA 3.0 license): derinet-es-2019-06-10.tsv
-
DeriNet.FA 0.5 for Farsi (under CC-BY-NC-SA 4.0 license)
The following resources were created in cooperation with Poznań University of Technology:
-
DeriNet-style derivational networks for Czech, French, Polish, and Spanish created by a semi-supervised approach using a sequential pattern mining technique, as described in an article currently under review in the LRE journal: semi-supervised.zip (four generated networks plus our hand-annotated samples, for individual licenses see README)
-
Polish Word-Formation Network v. 0.5 (under CC-BY-NC-SA): polish-wfn-0.5.zip
-
Spanish Word-Formation Network v. 0.5 (under CC-BY-ND): spanish-wfn-0.5.zip
Related publications:
-
Lukáš Kyjánek et al. Constructing a Lexical Resource of Russian Derivational Morphology. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). Marseille, 2022, pp. 2788-2797.
-
Jonáš Vidra & Zdeněk Žabokrtský. Transferring Word-Formation Networks Between Languages. In Proceedings of the Third Workshop on Resources and Tools for Derivational Morphology (DeriMo 2021). France, 2021, pp. 135-144.
-
Emil Svoboda & Magda Ševčíková. Splitting and Identifying Czech Compounds: A Pilot Study. In Proceedings of the Third Workshop on Resources and Tools for Derivational Morphology (DeriMo 2021). France, 2021, pp. 125-134.
-
Magda Ševčíková et al. Agent noun formation in Czech: An empirical study on suffix rivalry. In Second Workshop on Paradigmatic Word Formation Modelling, 2021, pp. 65-68.
-
Mateusz Lango et al. Semi-automatic construction of word-formation networks. Language Resources and Evaluation, 54, 2020, pp. 1-30.
-
Jan Bodnár et al. Semi-supervised Induction of Morpheme Boundaries in Czech Using a Word-Formation Network. In Proceedings of the 23rd International Conference Text, Speech, and Dialogue (TSD 2020), 2020, pp. 189-196.
-
Jonáš Vidra & Zdeněk Žabokrtský. Next Step in Online Querying and Visualization of Word-Formation Networks. In Proceedings of the 23rd International Conference Text, Speech, and Dialogue (TSD 2020), 2020, pp. 144-152.
-
Lukáš Kyjánek et al. Universal Derivations 1.0, A Growing Collection of Harmonised Word-Formation Resources. The Prague Bulletin of Mathematical Linguistics, 2020, 115(2), pp. 5-30.
-
Hamid Haghdoost et al. Morphological Networks for Persian and Turkish: What Can Be Induced from Morpheme Segmentation? The Prague Bulletin of Mathematical Linguistics, 2020, 115(2), pp. 105-127.
-
Lukáš Kyjánek. Harmonisation of Language Resources for Word-Formation of Multiple Languages. Master's thesis, supervised by Magda Ševčíková. Prague, 2020. Unpublished thesis.
-
Magda Ševčíková & Lukáš Kyjánek. Introducing Semantic Labels into the DeriNet Network. Journal of Linguistics, 2019, 70(2), pp. 412-423.
-
Lukáš Kyjánek et al. Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 101-110.
-
Jonáš Vidra et al. DeriNet 2.0: Towards an All-in-One Word-Formation Resource. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 81-89.
-
Rudolf Rosa & Zdeněk Žabokrtský. Attempting to separate inflection and derivation using vector space representations. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 61-70.
-
Hamid Haghdoost et al. Building a Morphological Network for Persian on Top of a Morpheme-Segmented Lexicon. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 91-100.
-
Ján Faryad. Identifikace derivačních vztahů ve španělštině [Identification of derivational relations in Spanish]. ÚFAL Technical Report TR-2019-63. Prague, 2019.
-
Lukáš Kyjánek. Morphological Resources of Derivational Word-Formation Relations. ÚFAL Technical Report TR-2018-61. Prague, 2018.
-
Mateusz Lango et al. Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish). In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, 2018, pp. 1853-1860.
-
Jonáš Vidra. Morphological segmentation of Czech words. Master's thesis, supervised by Zdeněk Žabokrtský. Prague, 2018. Unpublished thesis.
-
Magda Ševčíková et al. A language resource specialized in Czech word-formation: Recent achievements in developing the DeriNet database. Presented at the SlaviCorp 2018 conference. Prague, 2018.
-
Jonáš Vidra & Zdeněk Žabokrtský. Online Software Components for Accessing Derivational Networks. In Proceedings of the Workshop on Resources and Tools for Derivational Morphology (DeriMo 2017). Milano, 2017, pp. 129-139.
-
Magda Ševčíková et al. Identification of aspectual pairs of verbs derived by suffixation in the lexical database DeriNet. In Proceedings of the Workshop on Resources and Tools for Derivational Morphology (DeriMo 2017). Milano, 2017, pp. 105-116.
-
Magda Ševčíková. Modelování slovotvorných vztahů ve slovní zásobě češtiny [Modelling word-formation relations in the Czech lexicon]. Talk at the Seminar of Formal Linguistics, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, May 2017.
-
Magda Ševčíková et al. Lexikální síť DeriNet: elektronický zdroj pro výzkum derivace v češtině [The DeriNet lexical network: An electronic resource for research on derivation in Czech]. Časopis pro moderní filologii, 98(1), 2016, pp. 62-76.
-
Zdeněk Žabokrtský et al. Merging Data Resources for Inflectional and Derivational Morphology in Czech. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). Portorož, 2016, pp. 1307-1314.
-
Magda Ševčíková & Zdeněk Žabokrtský. Word-Formation Network for Czech. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavík, 2014, pp. 1087-1093.
-
Magda Ševčíková & Zdeněk Žabokrtský. Talk at the Seminar of Formal Linguistics, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, December 2014 (synchronized with DeriNet 0.9).