Principal investigator (ÚFAL): 
Project Manager (ÚFAL): 
Provider: 
Grant id: 
PRIMUS/23/SCI/023
ÚFAL budget: 
9158000
Duration: 
2023-2026

Language Neutral and Culturally Aware Multilingual Neural Sentence Representations

In this project, we study multilingual language models: to what extent their inner representations are similar across languages, and how cross-lingual transfer works between models.

Some news and achievements

Publications

  1. Adnan Al Ali, Jindřich Libovický (2024): How Gender Interacts with Political Values: A Case Study on Czech BERT Models. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 3038-3045, European Language Resources Association, Torino, Italy, ISBN 978-2-493814-10-4 (local PDF, bibtex)
  2. Katharina Hämmerl, Jindřich Libovický, Alexander Fraser (2024): Understanding Cross-Lingual Alignment—A Survey. In: Findings of the Association for Computational Linguistics: ACL 2024, pp. 10922-10943, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-099-8 (url, local PDF, bibtex)
  3. Katharina Hämmerl, Andrei Alexandru Manea, Gianluca Vico, Jindřich Helcl, Jindřich Libovický (2024): CUNI and LMU Submission to the MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval. In: Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pp. 357-364, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-184-1 (url, bibtex)
  4. Jindřich Helcl, Zdeněk Kasner, Ondřej Dušek, Tomasz Limisiewicz, Dominik Macháček, Tomáš Musil, Jindřich Libovický (2024): Teaching LLMs at Charles University: Assignments and Activities. In: The Sixth Workshop on Teaching NLP: Proceedings of the Workshop, pp. 69-72, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-134-6 (url, local PDF, bibtex)
  5. Jindřich Libovický, Jindřich Helcl (2024): Lexically Grounded Subword Segmentation. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7403-7420, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-164-3 (url, bibtex)
  6. Philipp Rösch, Norbert Oswald, Michaela Geierhos, Jindřich Libovický (2024): Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples. In: The 3rd Workshop on Advances in Language and Vision Research: Proceedings of the Workshop, pp. 102-115, Association for Computational Linguistics (ACL), Kerrville, TX, USA, ISBN 979-8-89176-153-7 (pdf, local PDF, bibtex)
  7. Katharina Hämmerl, Björn Dieseroth, Patrick Schramowski, Jindřich Libovický, Constantin A. Rothkopf, Alexander Fraser, Kristian Kersting (2023): Speaking Multiple Languages Affects the Moral Bias of Language Models. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 2137-2156, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (url, bibtex)
  8. Katharina Hämmerl, Alina Fastowski, Jindřich Libovický, Alexander Fraser (2023): Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 7023-7037, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (url, bibtex)
  9. Jindřich Helcl, Jindřich Libovický (2023): CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval. In: Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL), pp. 302-309, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-056-1 (pdf, local PDF, bibtex)
  10. Hynek Kydlíček, Jindřich Libovický (2023): A Dataset and Strong Baselines for Classification of Czech News Texts. In: 26th International Conference, TSD 2023, pp. 33-44, Springer, Cham, Switzerland, ISBN 978-3-031-40497-9 (url, bibtex)
  11. Jindřich Libovický (2023): Is a Prestigious Job the same as a Prestigious Country? A Case Study on Multilingual Sentence Embeddings and European Countries. In: Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 1000-1010, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-71-1 (pdf, local PDF, bibtex)