A data-based approach to competition in word-formation: selected semantic categories across seven languages
The project pursues data-based research into competition in word-formation. It aims to compare the word-formation processes and strategies that speakers employ to express the semantic concepts of diminutiveness and femaleness in seven European languages (two Slavic, three Germanic, and two Romance). Derivatives, compounds, and syntactic phrases that express these concepts in the analysed languages (cf. 'Polizistin' in German, 'policewoman' in English, and 'mujer policía' in Spanish) will be identified either by exploiting available language resources and tools (some of which have been developed by the project team members) or by using tools and methods designed specifically for the project. A team of four PhD students in computational linguistics will develop machine learning models that simulate how these semantic concepts are expressed in the languages studied and that help discover which linguistic properties influence native speakers' choices among the competing alternatives. The results are expected to be relevant both to the linguistic discussion of competition in word-formation and to the modelling of word-formation in Natural Language Processing.
Reg. no. CZ.02.2.69/0.0/0.0/19_073/0016935.
Publications
Journal, Proceedings & Reports
- Kyjánek, L. 2022. Web-based Annotation Interface for Derivational Morphology. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies: System Demonstrations, Seattle, pp. 10-16.
- Kyjánek, L.; Bonami, O. 2022. A Distributional Approach to Inflection and Derivation in Czech. In: Word-Formation Theories VI & Typology and Universals in Word-Formation V, book of abstracts, pp. 21-22.
- Kyjánek, L.; Lyashevskaya, O.; Nedoluzhko, A.; Vodolazsky, D.; Žabokrtský, Z. 2022. Constructing a Lexical Resource of Russian Derivational Morphology. In: Proceedings of the 13th Language Resources and Evaluation Conference (LREC), Marseille, pp. 2788-2798.
- Žabokrtský, Z.; Bafna, N.; Bodnár, J.; Kyjánek, L.; Svoboda, E.; Ševčíková, M.; Vidra, J. 2022. Towards Universal Segmentations: UniSegments 1.0. In: Proceedings of the 13th Language Resources and Evaluation Conference (LREC), Marseille, pp. 1137-1149.
- Svoboda, E.; Ševčíková, M. 2022. Word Formation Analyzer for Czech: Automatic Parent Retrieval and Classification of Word Formation Processes. The Prague Bulletin of Mathematical Linguistics 118(1), pp. 55-73.
- Bafna, N.; Bodnár, J.; Kyjánek, L.; Svoboda, E.; Ševčíková, M.; Vidra, J.; Žabokrtský, Z. 2021. Towards Universal Segmentations: Survey of Existing Morphosegmentation Resources. Technical Report TR-2021-69. Prague: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University. ISSN: 1214-5521.
- Ševčíková, M.; Kyjánek, L.; Vidová Hladká, B. 2021. Agent noun formation in Czech: An empirical study on suffix rivalry. In: Second Workshop on Paradigmatic Word Formation Modelling, book of abstracts, pp. 65-68.
- Svoboda, E.; Ševčíková, M. 2021. Splitting and Identifying Czech Compounds: A Pilot Study. In: Proceedings of the Third Workshop on Resources and Tools for Derivational Morphology (DeriMo 2021), France, pp. 125-134.
Data & Software
- Kyjánek, L.; Bonami, O. 2022. Package of word embeddings of Czech from a large corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-4920.
- Kyjánek, L. 2022. Web-based Annotation Interface for Derivational Morphology. GitHub: https://github.com/lukyjanek/uder-annotation-interface. Online application: https://lukyjanek.github.io/subpages/uder-annotation-interface/UDerAnnotation.html.
- Žabokrtský, Z.; Bafna, N.; Bodnár, J.; Kyjánek, L.; Svoboda, E.; Ševčíková, M.; Vidra, J. et al. 2022. Universal Segmentations 1.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-4629.
- Kyjánek, L.; Lyashevskaya, O.; Nedoluzhko, A.; Vodolazsky, D.; Žabokrtský, Z. 2021. DeriNet.RU 0.5, Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, DeriNetRU-0.5.zip. Released also in the Universal Derivation collection v1.1.
- Vidra, J.; Žabokrtský, Z.; Kyjánek, L.; Ševčíková, M.; Dohnalová, Š.; Svoboda, E.; Bodnár, J. 2021. DeriNet 2.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3765.
- Kyjánek, L.; Žabokrtský, Z.; Vidra, J.; Ševčíková, M. 2021. Universal Derivations v1.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3247.
Presentations & Posters