
Area of research Funding provider


Duration Provider
EDU-AI: AI asistent pro žáky a učitele [AI Assistant for Pupils and Teachers] 04/2021-12/2023 TAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
Dialogue systems focused on combining tasks and chit-chat 2021-2023 GAUK
ECSS: Evaluation of conversational speech synthesis 2022-2024 GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Low resource methods for dialogue systems applications 2020 - 2022 GAUK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
NaMuDDiS: Natural multi-domain dialogue systems 2019-2021 UK
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC
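CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR
THEaiTRE: THEAITRE: Umělá inteligence autorem divadelní hry? [Artificial Intelligence as the Author of a Theatre Play?] April 2020 - September 2022 TAČR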
Duration Provider
EDU-AI: AI asistent pro žáky a učitele [AI Assistant for Pupils and Teachers] 04/2021-12/2023 TAČR
AIAI: AI: Authorship and Interpretation 2025-2027 GAČR
Arithmetic Properties in the space of Language Model Prompts 2023 GAUK
Controllable NLG: Controllable Natural Language Generation 2021-2023 GAUK
Dialogue systems focused on combining tasks and chit-chat 2021-2023 GAUK
Domain Adaptation for Natural Language Generation 2020-2022 GAUK
ECSS: Evaluation of conversational speech synthesis 2022-2024 GAUK
EduPo: Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments] 09/2023 - 11/2026 TAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR
NaMuDDiS: Natural multi-domain dialogue systems 2019-2021 UK
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature 2023-2024 ETF UK
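THEaiTRE: THEAITRE: Umělá inteligence autorem divadelní hry? [Artificial Intelligence as the Author of a Theatre Play?] April 2020 - September 2022 TAČR
AIvK Exponát Didaktikon: Život s umělou inteligencí: upgrade [Didaktikon Exhibit: Life with Artificial Intelligence: Upgrade] 2023-09-01 - 2023-12-31 UK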
Information Retrieval
Duration Provider
EDU-AI: AI asistent pro žáky a učitele [AI Assistant for Pupils and Teachers] 04/2021-12/2023 TAČR
CEDMO 2.0 NPO 1.9. 2024 - 30. 4. 2026 MPO
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR
Duration Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START
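PONK: Asistent přístupné úřední komunikace [Assistant for Accessible Official Communication] 9/2023-12/2025 TAČR
CzeDParse: Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech] 2019-2021 GAČR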
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
HVar: Disagreement in corpus annotation and variation of human understanding of text 2024-2026 GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR
ELG: European Language Grid 2019-2021 H2020
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications 2023-2025 GAČR
EduPo: Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments] 09/2023 - 11/2026 TAČR
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR
Independent component analysis of continuous word representations 2021–2022 GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power 2017–2019 MŠMT - OP VVV
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
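RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora] 2022-2024 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů [Modelling the Complexity of Czech Literary Texts] VI 2018 - X 2021 MŠMT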
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries 2023-2027 NAKI
LAPPS-CLARIN: Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services 2016-2018, 2019-2021 Mellon Foundation (USA)
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR
Duration Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages 2024-2026 GAUK
CzeDParse: Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech] 2019-2021 GAČR
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI
CEDMO 2.0 NPO 1.9. 2024 - 30. 4. 2026 MPO
UNCE VITRI: Center for the Transdisciplinary Research of Violence, Trauma and Justice 2018-2023 UK
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
HVar: Disagreement in corpus annotation and variation of human understanding of text 2024-2026 GAČR
Domain Adaptation for Natural Language Generation 2020-2022 GAUK
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR
ELG: European Language Grid 2019-2021 H2020
ECSS: Evaluation of conversational speech synthesis 2022-2024 GAUK
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR
HPLT: High Performance Language Technologies 2022-2025 HE
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power 2017–2019 MŠMT - OP VVV
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
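RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora] 2022-2024 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů [Modelling the Complexity of Czech Literary Texts] VI 2018 - X 2021 MŠMT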
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
Named Entity Linking 2020-2022 GAUK
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries 2023-2027 NAKI
EdUKate: Promoting digital education of foreign-language children through machine translation 2023-2026 TAČR
Mashcima: Synthetic training data generation and other methods for handwritten music recognition 2023-2025 GAUK
LAPPS-CLARIN: Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services 2016-2018, 2019-2021 Mellon Foundation (USA)
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR
Word-formation structure of Czech words: a data-based research 2019-2021 GAČR
Duration Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START
CzeDParse: Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech] 2019-2021 GAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Modeling Morpheme Flow among Languages Jan 2024 - Dec 2026 GAUK
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
NomVallex II.: Valency of Non-verbal Predicates. An Extension of Valency Studies to Adjectives and Deadjectival Nouns. 2019-2021 GAČR
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR
Duration Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
Compound Identification and Splitting in Four Languages: A Deep Learning Approach 2022-2024 GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
Modeling Morpheme Flow among Languages Jan 2024 - Dec 2026 GAUK
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data 2023-2025 GAUK
Word-formation structure of Czech words: a data-based research 2019-2021 GAČR
Duration Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START
Babel Octopus: Robust Multi-Source Speech Translation 2021-2023 START
Better Tokenization for Multilingual Language Models and Machine Translation 3 years GAČR
Compound Identification and Splitting in Four Languages: A Deep Learning Approach 2022-2024 GAUK
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
CELL: Contextual Machine Learning of Language Translations 2020-2022 CELSA
ELG: European Language Grid 2019-2021 H2020
Exploring Multilingual Representations of Language Units in Neural Networks 2021 - 2023 GAUK
HPLT: High Performance Language Technologies 2022-2025 HE
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations 2023-2026 UK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
Mnohojazyčný strojový překlad [Multilingual Machine Translation] 2018-2020 GAČR
Modeling Morpheme Flow among Languages Jan 2024 - Dec 2026 GAUK
LangTech: Modernizace oboru Matematická lingvistika [Modernization of the Field of Mathematical Linguistics] MŠMT - OP VVV
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data 2023-2025 GAUK
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
Named Entity Linking 2020-2022 GAUK
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling) 2019-2023 GAČR
EdUKate: Promoting digital education of foreign-language children through machine translation 2023-2026 TAČR
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
Duration Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages 2024-2026 GAUK
UNCE VITRI: Center for the Transdisciplinary Research of Violence, Trauma and Justice 2018-2023 UK
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
HVar: Disagreement in corpus annotation and variation of human understanding of text 2024-2026 GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR
ELG: European Language Grid 2019-2021 H2020
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications 2023-2025 GAČR
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR
Independent component analysis of continuous word representations 2021–2022 GAUK
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů [Modelling the Complexity of Czech Literary Texts] VI 2018 - X 2021 MŠMT
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
Using Auxiliary Subtasks for Learning Constraints in NLP 2023-2025 GAUK
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR
Duration Provider
ATRIUM: Advancing FronTier Research In the Arts and hUManities 2024 - 2027 HE
HumanAId: AI zaměřená na člověka pro udržitelnou a adaptabilní společnost [Human-Centred AI for a Sustainable and Adaptable Society] 1. 3. 2025 - 31. 12. 2028 MŠMT - OP JAK
RES-Q Plus: Comprehensive solutions of healthcare improvement based on the global Registry of Stroke Care Quality 2022-2026 HE
ELE 2: European Language Equality 2 2022-2023 PPPA (EU)
EVERSE: European Virtual Institute for Research Software Excellence 2024-2027 HE
HumanE-AI-Net: HumanE AI Network 1. 9. 2020 - 31. 8. 2024 H2020
Identification and Prevention of Unwanted Gender Bias in Neural Language Models 2023-2024 GAČR
Improving stomach examinations with Artificial Intelligence: A deep learning approach for assisted gastroscopy 1. 7. 2024 - 31. 12. 2026 MŠMT
InCroMin: Interactive Crosslingual Minutes 2024 HE
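Jazykověda, umělá inteligence a jazykové a řečové technologie: od výzkumu k aplikacím [Linguistics, Artificial Intelligence, and Language and Speech Technologies: From Research to Applications] 1. 1. 2025 - 31. 12. 2028 MŠMT - OP JAK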
Methods for improving neural machine translation of diverse texts 2023-2025 GAUK
OpenEuroLLM: Open European Family of Large Language Models 36 months Digital Europe Programme
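Prameny Krkonoš: Prameny Krkonoš. Vývoj systému evidence, zpracování a prezentace pramenů k historii a kultuře Krkonoš a jeho využití ve výzkumu a edukaci [Sources of the Krkonoše Mountains. Development of a System for Recording, Processing and Presenting Sources on the History and Culture of the Krkonoše Mountains and Its Use in Research and Education] 2020-2022 NAKI
PROGRES Q48 - Informatika: Programy progres [Computer Science: Progres Programmes] 2017-2021 UK
PROGRES Q18 - Společenské vědy: Programy progres [Social Sciences: Progres Programmes] 2017-2021 UK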
SSHOC: Social Sciences & Humanities Open Cloud 2019-30/04/2022 H2020
MEMORISE: Virtualisation and Multimodal Exploration of Heritage on Nazi Persecution 2022-2026 HE
Machine Learning
Duration Provider
Arithmetic Properties in the space of Language Model Prompts 2023 GAUK
PONK: Asistent přístupné úřední komunikace [Assistant for Accessible Official Communication] 9/2023-12/2025 TAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
Compound Identification and Splitting in Four Languages: A Deep Learning Approach 2022-2024 GAUK
CELL: Contextual Machine Learning of Language Translations 2020-2022 CELSA
Dialogue systems focused on combining tasks and chit-chat 2021-2023 GAUK
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
Domain Adaptation for Natural Language Generation 2020-2022 GAUK
ECSS: Evaluation of conversational speech synthesis 2022-2024 GAUK
Exploring Multilingual Representations of Language Units in Neural Networks 2021 - 2023 GAUK
HPLT: High Performance Language Technologies 2022-2025 HE
Independent component analysis of continuous word representations 2021–2022 GAUK
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations 2023-2026 UK
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
Low resource methods for dialogue systems applications 2020 - 2022 GAUK
Mnohojazyčný strojový překlad [Multilingual Machine Translation] 2018-2020 GAČR
LangTech: Modernizace oboru Matematická lingvistika [Modernization of the Field of Mathematical Linguistics] MŠMT - OP VVV
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR
Multimodal Optical Music Recognition using Deep Learning 2017-2019 GAUK
Named Entity Linking 2020-2022 GAUK
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling) 2019-2023 GAČR
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries 2023-2027 NAKI
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR
Mashcima: Synthetic training data generation and other methods for handwritten music recognition 2023-2025 GAUK
THEaiTRE: THEAITRE: Umělá inteligence autorem divadelní hry? [Artificial Intelligence as the Author of a Theatre Play?] April 2020 - September 2022 TAČR
Using Auxiliary Subtasks for Learning Constraints in NLP 2023-2025 GAUK
Duration Provider
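PONK: Asistent přístupné úřední komunikace [Assistant for Accessible Official Communication] 9/2023-12/2025 TAČR
CzeDParse: Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech] 2019-2021 GAČR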
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR
ELG: European Language Grid 2019-2021 H2020
EduPo: Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments] 09/2023 - 11/2026 TAČR
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR
HPLT: High Performance Language Technologies 2022-2025 HE
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power 2017–2019 MŠMT - OP VVV
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
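RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora] 2022-2024 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů [Modelling the Complexity of Czech Literary Texts] VI 2018 - X 2021 MŠMT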
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data 2023-2025 GAUK
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK
LAPPS-CLARIN: Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services 2016-2018, 2019-2021 Mellon Foundation (USA)
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
NomVallex II.: Valency of Non-verbal Predicates. An Extension of Valency Studies to Adjectives and Deadjectival Nouns. 2019-2021 GAČR
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR
Duration Provider
CzeDParse: Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech] 2019-2021 GAČR
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI
UNCE VITRI: Center for the Transdisciplinary Research of Violence, Trauma and Justice 2018-2023 UK
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
Low resource methods for dialogue systems applications 2020 - 2022 GAUK
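RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora] 2022-2024 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů [Modelling the Complexity of Czech Literary Texts] VI 2018 - X 2021 MŠMT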
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK
NaMuDDiS: Natural multi-domain dialogue systems 2019-2021 UK
Duration Provider
CzeDParse: Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech] 2019-2021 GAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
ELG: European Language Grid 2019-2021 H2020
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora] 2022-2024 GAČR
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
Duration Provider
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
EduPo: Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments] 09/2023 - 11/2026 TAČR
HPLT: High Performance Language Technologies 2022-2025 HE
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR
Duration Provider
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
Compound Identification and Splitting in Four Languages: A Deep Learning Approach 2022-2024 GAUK
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
ELG: European Language Grid 2019-2021 H2020
EduPo: Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments] 09/2023 - 11/2026 TAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power 2017–2019 MŠMT - OP VVV
Mashcima: Synthetic training data generation and other methods for handwritten music recognition 2023-2025 GAUK
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature 2023-2024 ETF UK
THEaiTRE: THEAITRE: Umělá inteligence autorem divadelní hry? [Artificial Intelligence as the Author of a Theatre Play?] April 2020 - September 2022 TAČR
LAPPS-CLARIN: Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services 2016-2018, 2019-2021 Mellon Foundation (USA)
AIvK Exponát Didaktikon: Život s umělou inteligencí: upgrade [Didaktikon Exhibit: Life with Artificial Intelligence: Upgrade] 2023-09-01 - 2023-12-31 UK
Machine Translation
Duration Provider
Babel Octopus: Robust Multi-Source Speech Translation 2021-2023 START
Better Tokenization for Multilingual Language Models and Machine Translation 3 years GAČR
Bergamot: Browser-based Multilingual Translation 2019-2021 H2020
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CELL: Contextual Machine Learning of Language Translations 2020-2022 CELSA
ELG: European Language Grid 2019-2021 H2020
ELITR: European Live Translator 2019-2021 H2020
HPLT: High Performance Language Technologies 2022-2025 HE
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
Machine Translation of Interpreted Speech 2020-2022 GAUK
Mnohojazyčný strojový překlad [Multilingual Machine Translation] 2018-2020 GAČR
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
EdUKate: Promoting digital education of foreign-language children through machine translation 2023-2026 TAČR
Research of Methods of Neural Machine Translation Evaluation 2018-2020 GAUK
Using Auxiliary Subtasks for Learning Constraints in NLP 2023-2025 GAUK
Utilising Linguistic Knowledge in Neural Machine Translation 2018 - 2020 GAUK
Speech Recognition
Duration Provider
Babel Octopus: Robust Multi-Source Speech Translation 2021-2023 START
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
ELG: European Language Grid 2019-2021 H2020
ELITR: European Live Translator 2019-2021 H2020
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Machine Translation of Interpreted Speech 2020-2022 GAUK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
Information Structure
Duration Provider
CEDMO 2.0 NPO 1.9. 2024 - 30. 4. 2026 MPO
Exploring Multilingual Representations of Language Units in Neural Networks 2021 - 2023 GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů [Modelling the Complexity of Czech Literary Texts] VI 2018 - X 2021 MŠMT
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK
Duration Provider
CEDMO 2.0 NPO 1.9. 2024 - 30. 4. 2026 MPO
CEMI: Center for large-scale multi-modal data interpretation 2012 - 2019 GAČR
UNCE VITRI: Center for the Transdisciplinary Research of Violence, Trauma and Justice 2018-2023 UK
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CELL: Contextual Machine Learning of Language Translations 2020-2022 CELSA
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations 2023-2026 UK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Machine Translation of Interpreted Speech 2020-2022 GAUK
Multimodal Optical Music Recognition using Deep Learning 2017-2019 GAUK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling) 2019-2023 GAČR
EdUKate: Promoting digital education of foreign-language children through machine translation 2023-2026 TAČR
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR
Duration Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Using Auxiliary Subtasks for Learning Constraints in NLP 2023-2025 GAUK
Linked data
Duration Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada
ELG: European Language Grid 2019-2021 H2020
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
Duration Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Duration Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
ELG: European Language Grid 2019-2021 H2020
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
Named Entity Linking 2020-2022 GAUK
Duration Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
NomVallex II.: Valency of Non-verbal Predicates. An Extension of Valency Studies to Adjectives and Deadjectival Nouns. 2019-2021 GAČR
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR
Duration Provider
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020
LCT: European Masters Program Language and Communication Technologies IX.2007-VIII.2013, IX.2013-VIII.2019, IX.2019-VIII.2025 EU ERASMUS MUNDUS
EduPo: Generování české poezie v edukačním a multimediálním prostředí 09/2023 - 11/2026 TAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů VI 2018 - X 2021 MŠMT
LangTech: Modernizace oboru Matematická lingvistika MŠMT - OP VVV
NaMuDDiS: Natural multi-domain dialogue systems 2019-2021 UK
AIvK Exponát Didaktikon: Život s umělou inteligencí: upgrade 2023-09-01 - 2023-12-31 UK
Duration Provider
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR
ELG: European Language Grid 2019-2021 H2020
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications 2023-2025 GAČR
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR
INTERCOST-Readability: Modelování komplexity českých literárních textů VI 2018 - X 2021 MŠMT
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR
Duration Provider
HVar: Disagreement in corpus annotation and variation of human understanding of text 2024-2026 GAČR
Multiword Expressions
Duration Provider
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT
Speech Retrieval
Duration Provider
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Duration Provider
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury
Provider: Digital Europe Programme
Duration Provider Grant ID PI Area
OpenEuroLLM: Open European Family of Large Language Models 36 months Digital Europe Programme 101195233 Jan Hajič
Provider: HE
Duration Provider Grant ID PI Area
EVERSE: European Virtual Institute for Research Software Excellence 2024-2027 HE 101129744 Pavel Straňák
ATRIUM: Advancing FronTier Research In the Arts and hUManities 2024 - 2027 HE 101132163 Pavel Straňák
InCroMin: Interactive Crosslingual Minutes 2024 HE 101070631 Ondřej Bojar
RES-Q Plus: Comprehensive solutions of healthcare improvement based on the global Registry of Stroke Care Quality 2022-2026 HE 101057603 Pavel Pecina
MEMORISE: Virtualisation and Multimodal Exploration of Heritage on Nazi Persecution 2022-2026 HE 101061016 Pavel Pecina
HPLT: High Performance Language Technologies 2022-2025 HE 101070350 Jan Hajič Corpora, Data, Machine Learning, Machine Translation, Monolingual, Multilingual
Provider: Social Sciences and Humanities Research Council of Canada
Duration Provider Grant ID PI Area
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada 895-2023-1002 Jan Hajič jr. Corpora, Data, Information Retrieval, Linked data, Machine Learning, Multi-modality, Tools
Provider: ETF UK
Duration Provider Grant ID PI Area
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature 2023-2024 ETF UK 247002 Rudolf Rosa Tools
Provider: Horizon Europe, ERC
Duration Provider Grant ID PI Area
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC 101039303 Ondřej Dušek Dialog, Linked data, Machine Learning, Semantics
Provider: PPPA (EU)
Duration Provider Grant ID PI Area
ELE 2: European Language Equality 2 2022-2023 PPPA (EU) LC-01884166 (Project 101075356) Jan Hajič
Provider: CELSA
Duration Provider Grant ID PI Area
CELL: Contextual Machine Learning of Language Translations 2020-2022 CELSA CELSA/19/018 Pavel Pecina Machine Learning, Machine Translation, Multi-modality, Multilingual

MŠMT - velké infrastruktury

Duration Provider Grant ID PI Area
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury LM2015071 Jan Hajič Annotations, Coreference, Corpora, Data, Dialog, Discourse, Lexicons, Linked data, Machine Learning, Machine Translation, Morphology, Multi-modality, Parsers, Publications, Semantics, Speech Recognition, Taggers, Tools, Valency
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury LM2023062 Jan Hajič Annotations, Coreference, Corpora, Data, Dialog, Discourse, Information Structure, Lexicons, Linked data, Machine Learning, Machine Translation, Monolingual, Morphology, Multi-modality, Multilingual, Multiword Expressions, Parsers, Publications, Semantics, Speech Recognition, Speech Retrieval, Spellcheckers, Syntax, Taggers, Tools, Valency
Provider: MPO
Duration Provider Grant ID PI Area
CEDMO 2.0 NPO 1.9. 2024 - 30. 4. 2026 MPO MPO 60273/24/21300/21000 Ondřej Bojar Data, Information Retrieval, Information Structure, Multi-modality
Provider: MŠMT - OP JAK
Duration Provider Grant ID PI Area
HumanAId: AI zaměřená na člověka pro udržitelnou a adaptabilní společnost 1. 3. 2025 - 31. 12. 2028 MŠMT - OP JAK CZ.02.01.01/00/23_025/0008691 Barbora Vidová Hladká
Jazykověda, umělá inteligence a jazykové a řečové technologie: od výzkumu k aplikacím 1. 1. 2025 - 31. 12. 2028 MŠMT - OP JAK CZ.02.01.01/00/23_020/0008518 Jan Hajič

Institutional support for research at the Charles University

Duration Provider Grant ID PI Area
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK UNCE/24/SSH/009 Zdeněk Žabokrtský Annotations, Corpora, Data, Discourse, Information Structure, Multilingual
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations 2023-2026 UK PRIMUS/23/SCI/023 Jindřich Libovický Machine Learning, Multi-modality, Multilingual
AIvK Exponát Didaktikon: Život s umělou inteligencí: upgrade 2023-09-01 - 2023-12-31 UK Unknown Rudolf Rosa Teaching, Tools
NaMuDDiS: Natural multi-domain dialogue systems 2019-2021 UK PRIMUS 19/SCI/10 Ondřej Dušek Dialog, Discourse, Teaching
UNCE VITRI: Center for the Transdisciplinary Research of Violence, Trauma and Justice 2018-2023 UK UNCE/HUM/009 Faculty of Social Sciences (Charles University) Data, Discourse, Multi-modality, Semantics
PROGRES Q48 - Informatika: Programy progres 2017-2021 UK Q48 Jan Hajič
PROGRES Q18 - Společenské vědy: Programy progres 2017-2021 UK Q18 Jan Hajič

Horizon 2020 - European Commission

Duration Provider Grant ID PI Area
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020 101004984 Silvie Cinková Annotations, Corpora, Data, Multilingual, Parsers, Semantics, Taggers, Teaching, Tools
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals. 2020-2023 H2020 870930 Pavel Pecina Annotations, Data, Dialog, Linked data, Machine Translation, Multi-modality, Multilingual, Parsers, Semantics, Speech Recognition
SSHOC: Social Sciences & Humanities Open Cloud 2019-30/04/2022 H2020 823782 Jan Hajič
ELITR: European Live Translator 2019-2021 H2020 825460 Ondřej Bojar Machine Translation, Speech Recognition
Bergamot: Browser-based Multilingual Translation 2019-2021 H2020 825303 Ondřej Bojar Machine Translation
ELG: European Language Grid 2019-2021 H2020 825627 Jan Hajič Annotations, Corpora, Data, Linked data, Machine Translation, Multilingual, Parsers, Semantics, Speech Recognition, Syntax, Taggers, Tools
HumanE-AI-Net: HumanE AI Network 1. 9. 2020 - 31. 8. 2024 H2020 952026 Jan Hajič


Duration Provider Grant ID PI Area
LCT: European Masters Program Language and Communication Technologies IX.2007-VIII.2013, IX.2013-VIII.2019, IX.2019-VIII.2025 EU ERASMUS MUNDUS 610622-EPP-1-2019-1-DE-EPPKA1-JMD-MOB Vladislav Kuboň Teaching

Mellon Foundation (USA)

Duration Provider Grant ID PI Area
LAPPS-CLARIN: Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services 2016-2018, 2019-2021 Mellon Foundation (USA) G-1901-06505 Jan Hajič Annotations, Corpora, Data, Tools


Duration Provider Grant ID PI Area
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power 2017–2019 MŠMT - OP VVV CZ.02.1.01/0.0/0.0/16_013/0001781 Jan Hajič Annotations, Corpora, Data, Tools
LangTech: Modernizace oboru Matematická lingvistika MŠMT - OP VVV CZ.02.2.69/0.0/0.0/16_018/0002373 Zdeněk Žabokrtský Machine Learning, Multilingual, Teaching

Technology Agency (Czech Republic)

Duration Provider Grant ID PI Area
THEaiTRE: THEAITRE: Umělá inteligence autorem divadelní hry? April 2020 - September 2022 TAČR TL03000348 Rudolf Rosa Dialog, Machine Learning, Tools
PONK: Asistent přístupné úřední komunikace 9/2023-12/2025 TAČR TQ01000526 Barbora Vidová Hladká Annotations, Corpora, Machine Learning
EdUKate: Promoting digital education of foreign-language children through machine translation 2023-2026 TAČR TQ01000458 Lucie Poláková Data, Machine Translation, Multi-modality, Multilingual
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR FW03010656 Pavel Pecina Information Retrieval, Information Structure, Machine Learning, Machine Translation, Semantics
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti 09/2024-12/2029 TAČR TQ12000040 Martin Popel Dialog, Information Retrieval, Machine Learning, Monolingual, Multi-modality
EduPo: Generování české poezie v edukačním a multimediálním prostředí 09/2023 - 11/2026 TAČR TQ01000153 Rudolf Rosa Annotations, Corpora, Monolingual, Teaching, Tools
EDU-AI: AI asistent pro žáky a učitele 04/2021-12/2023 TAČR TL05000236 Ondřej Dušek Dialog, Information Retrieval

Czech Science Foundation

Duration Provider Grant ID PI Area
Better Tokenization for Multilingual Language Models and Machine Translation 3 years GAČR 25-16242S Jindřich Libovický Machine Translation, Multilingual
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR 25-16716S Veronika Kolářová Annotations, Data, Lexicons, Linked data, Monolingual, Semantics, Syntax, Valency
AIAI: AI: Authorship and Interpretation 2025-2027 GAČR 25-14501L Rudolf Rosa
HVar: Disagreement in corpus annotation and variation of human understanding of text 2024-2026 GAČR 24-11132S Šárka Zikánová Annotations, Data, Psycholinguistics, Semantics
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR 23-05240S Barbora Štěpánková Annotations, Corpora, Data, Lexicons, Semantics
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications 2023-2025 GAČR 23-05238S Marie Mikulová Annotations, Semantics, Syntax
Identification and Prevention of Unwanted Gender Bias in Neural Language Models 2023-2024 GAČR 23-06912S David Mareček
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech 2022-2024 GAČR 22-03269S Jiří Mírovský Annotations, Corpora, Data, Discourse, Parsers
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR 22-20927S Veronika Kolářová Annotations, Corpora, Lexicons, Monolingual, Syntax, Valency
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR GX20-16819X Jan Hajič Coreference, Machine Learning, Machine Translation, Parsers, Semantics, Syntax, Valency
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR 20-09853S Lucie Poláková Annotations, Corpora, Data, Discourse, Semantics
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling) 2019-2023 GAČR 19-26934X Ondřej Bojar Machine Learning, Multi-modality, Multilingual
Word-formation structure of Czech words: a data-based research 2019-2021 GAČR 19-14534S Magda Ševčíková Data, Morphology
CzeDParse: Automatická analýza diskurzních vztahů v češtině 2019-2021 GAČR 19-03490S Jiří Mírovský Annotations, Corpora, Data, Discourse, Lexicons, Parsers
NomVallex II.: Valency of Non-verbal Predicates. An Extension of Valency Studies to Adjectives and Deadjectival Nouns. 2019-2021 GAČR 19-16633S Veronika Kolářová Corpora, Lexicons, Valency
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts 2019-2021 GAČR 19-19191S Silvie Cinková Annotations, Corpora, Data, Discourse, Information Structure, Monolingual, Semantics, Syntax
LSD: Linguistic Structure Representation in Neural Networks 2018-2020 GAČR 18-02196S David Mareček Machine Learning, Machine Translation, Morphology, Multilingual, Parsers, Syntax, Taggers
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions 2018-2020 GAČR 18-03984S Markéta Lopatková Data, Lexicons, Monolingual, Semantics, Syntax, Valency
Mnohojazyčný strojový překlad 2018-2020 GAČR 18-24210S Ondřej Bojar Machine Learning, Machine Translation, Multilingual
CEMI: Center for large-scale multi-modal data interpretation 2012 - 2019 GAČR GAP103/12/G084 Pavel Pecina Multi-modality

Ministry of Education, Youth and Sport (Czech Republic)

Duration Provider Grant ID PI Area
INTERCOST-Readability: Modelování komplexity českých literárních textů VI 2018 - X 2021 MŠMT LTC18020 Silvie Cinková Annotations, Corpora, Data, Discourse, Information Structure, Semantics, Syntax, Teaching
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT LUAUS23283 Jan Hajič Corpora, Data, Lexicons, Linked data, Multilingual, Multiword Expressions, Semantics, Syntax, Valency
Improving stomach examinations with Artificial Intelligence: A deep learning approach for assisted gastroscopy 1. 7. 2024 - 31. 12. 2026 MŠMT LUABA24136 Pavel Pecina

Ministry of Culture

Duration Provider Grant ID PI Area
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI DH23P03OVV037 Kateřina Rysová Corpora, Data, Discourse, Monolingual, Tools
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries 2023-2027 NAKI DH23P03OVV008 Jan Hajič jr. Annotations, Data, Machine Learning
Prameny Krkonoš: Prameny Krkonoš. Vývoj systému evidence, zpracování a prezentace pramenů k historii a kultuře Krkonoš a jeho využití ve výzkumu a edukaci 2020-2022 NAKI DG20P02OVV010 Petra Hoffmannová

Program START (UK - OP VVV)

Duration Provider Grant ID PI Area
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START START/HUM/010 Annotations, Data, Lexicons, Morphology, Multilingual, Semantics
Babel Octopus: Robust Multi-Source Speech Translation 2021-2023 START START/SCI/089 Peter Polák Machine Translation, Multilingual, Speech Recognition

Grant Agency of the Charles University

Duration Provider Grant ID PI Area
Modeling Morpheme Flow among Languages Jan 2024- Dec 2026 GAUK 101924 Abishek Stephen Lexicons, Morphology, Multilingual
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages 2024-2026 GAUK 104924 Federica Gamba Data, Semantics
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data 2023-2025 GAUK 246723 Hana Hledíková Corpora, Morphology, Multilingual
Methods for improving neural machine translation of diverse texts 2023-2025 GAUK 244523 Josef Jon
Mashcima: Synthetic training data generation and other methods for handwritten music recognition 2023-2025 GAUK 289623 Jiří Mayer Data, Machine Learning, Tools
Using Auxiliary Subtasks for Learning Constraints in NLP 2023-2025 GAUK 272323 Dávid Javorský Coreference, Machine Learning, Machine Translation, Semantics
Arithmetic Properties in the space of Language Model Prompts 2023 GAUK 291923 Machine Learning
ECSS: Evaluation of conversational speech synthesis 2022-2024 GAUK 40222 Ondřej Plátek Data, Dialog, Machine Learning
Compound Identification and Splitting in Four Languages: A Deep Learning Approach 2022-2024 GAUK 128122 Emil Svoboda Machine Learning, Morphology, Multilingual, Tools
Independent component analysis of continuous word representations 2021–2022 GAUK 370721 Tomáš Musil Annotations, Machine Learning, Semantics
Dialogue systems focused on combining tasks and chit-chat 2021-2023 GAUK 373921 Dialog, Machine Learning
Controllable NLG: Controllable Natural Language Generation 2021-2023 GAUK 39221 Sourabrata Mukherjee
Exploring Multilingual Representations of Language Units in Neural Networks 2021 - 2023 GAUK 338521 Information Structure, Machine Learning, Multilingual
Named Entity Linking 2020-2022 GAUK 1280120 Data, Machine Learning, Multilingual, Taggers
Machine Translation of Interpreted Speech 2020-2022 GAUK 398120 Dominik Macháček Machine Translation, Multi-modality, Speech Recognition
Domain Adaptation for Natural Language Generation 2020-2022 GAUK 140320 Zdeněk Kasner Data, Machine Learning
Low resource methods for dialogue systems applications 2020 - 2022 GAUK 302120 Dialog, Discourse, Machine Learning
Research of Methods of Neural Machine Translation Evaluation 2018-2020 GAUK 1140218 Dušan Variš Machine Translation
Utilising Linguistic Knowledge in Neural Machine Translation 2018 - 2020 GAUK 976518 Jindřich Helcl Machine Translation
Multimodal Optical Music Recognition using Deep Learning 2017-2019 GAUK 1444217 Jan Hajič jr. Machine Learning, Multi-modality
National Science Foundation
Duration Provider Area
PIRE: Partnership for International Research and Education till 2014 NSF Machine Translation, Semantics, Speech Recognition, Teaching

Horizon 2020 - European Commission

Duration Provider Area
CLARIN-PLUS September 2015 – August 2017 H2020
QT21: Quality Translation 21 II.2015-I.2018 H2020 Data, Lexicons, Linked data, Machine Learning, Machine Translation, Tools
KConnect: Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain 2015-2017 H2020 Information Retrieval, Machine Translation, Semantics
HimL: Health in my Language 2.2015–1.2018 H2020 Data, Lexicons, Machine Translation, Morphology
CRACKER: Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research 1.2015-12.2017 H2020 Data, Machine Translation

FP6: Research - European Commission

Duration Provider Area
EuroMatrix IX.2006-II.2009 FP6 Annotations, Corpora, Machine Translation, Tools, Valency

Grant Agency of the Charles University

Duration Provider Area
Neural machine translation for low-resource languages 2019-2021 GAUK Machine Translation, Monolingual
Developing derivational networks for multiple languages 2019-2021 GAUK Data, Morphology, Multilingual
Vektorová reprezentace textu založená na neuronových sítích 2019 - 2021 GAUK Information Retrieval, Machine Learning, Machine Translation
Universal morphosyntactic annotation of language data 2017-2019 GAUK Annotations, Corpora, Machine Learning, Multilingual, Parsers
DeepSynt: Deep Syntactic Representation across Languages 2017-2018 GAUK Corpora, Data, Multilingual
Open domain dialog management with knowledge graphs 2016-2018 GAUK Data, Dialog, Machine Learning
open-domain SLU: Spoken Language Understanding in open-domain environment 2016-2018 GAUK Dialog, Information Retrieval, Linked data, Machine Learning, Semantics
ANNMT: Utilization of artificial neural networks in machine translation 2016-2018 GAUK Machine Translation
Using Language Knowledge in Scene Text Recognition 2015-2017 GAUK Multi-modality
cross-coref: Cross-lingual approaches to coreference resolution 2015-2017 GAUK Annotations, Coreference, Corpora, Data, Machine Learning, Machine Translation, Multilingual
DiaMine: Information mining from spoken dialogue 2015-2017 GAUK Data, Dialog, Machine Learning, Speech Recognition
Čapek GAUK: An alternative way of getting more annotated linguistic data 2014-2016 GAUK Annotations, Tools
AdaNLG: An adaptive natural language generator 2014-2016 GAUK Dialog, Multilingual, Semantics
croSSSynt: Modelling dependency syntax across languages 2014-2016 GAUK Annotations, Corpora, Data, Multilingual, Parsers
MSDS: Modern Spoken Dialog Systems 2014, 2015, 2016 GAUK Data, Dialog, Machine Learning, Speech Recognition
DepRefSet: Utilizing a Multitude of References in Machine Translation 2013-2015 GAUK Data, Machine Translation
Interactive information retrieval in audiovisual dialogue corpora 2013-2015 GAUK Information Retrieval, Speech Retrieval
Tools and data for Machine Translation between Related Languages 2012-2013 GAUK Corpora, Data, Machine Translation, Tools, Valency
Utilization of coreference in MT: Utilization of coreference in Machine Translation 2011-2013 GAUK Linked data, Machine Translation
Sentence-Level Polarity Detection in a Computer Corpus 2011-2013 GAUK Annotations, Corpora, Data, Lexicons, Tools

Czech Science Foundation

Duration Provider Area
AnaConn: Anaphoricity in Connectives: Lexical Description and Bilingual Corpus Analysis 2017–2019 GAČR Discourse, Lexicons, Multilingual
ForFun: Subcategorization of adverbial meanings based on corpus data 2017-2019 GAČR Annotations, Corpora, Data, Monolingual, Semantics
IRTC: Implicit Relations in Text Coherence 2017-2019 GAČR Annotations, Corpora, Data, Discourse, Psycholinguistics
CzEngClass: Contextually-based synonymy and valency of verbs in a bilingual setting 2017-2019 GAČR Annotations, Corpora, Data, Lexicons, Semantics, Valency
CorefChains: Structure of coreferential chains in parallel language data 2016-2018 GAČR Annotations, Coreference, Corpora, Data
NomVallex: Corpus-based Valency Lexicon of Czech Nouns 2016-2018 GAČR Corpora, Lexicons, Valency
DerInfMorph: An Integrated Approach to Derivational and Inflectional Morphology of Czech 2016-2018 GAČR Data, Monolingual, Morphology
Manyla: Morphologically and Syntactically Annotated Corpora of Many Languages 2015–2017 GAČR Annotations, Corpora, Data, Morphology, Multilingual, Parsers, Taggers
zelligharris: Reviving Zellig S. Harris: More linguistic information for distributional lexical analysis of English and Czech 2015-2017 GAČR Annotations, Corpora, Data, Semantics, Taggers
On Linguistic Structure of Evaluative Meaning in Czech 2015-2017 GAČR Annotations, Corpora, Data, Lexicons, Semantics
Combining Words: Syntactic Properties of Czech Multiword Expressions with Light Verbs 2015-2017 GAČR Annotations, Data, Lexicons, Multiword Expressions, Valency
LiStr: Sentence structure induction without annotated corpora 2014 - 2016 GAČR Machine Learning, Multilingual, Parsers
CzEngVallex: A comparison of Czech and English verbal valency based on corpus material (theory and practice) 2013-2015 GAČR Annotations, Corpora, Data, Lexicons
Vybrané derivační vztahy pro automatické zpracování češtiny 2012–2014 GAČR Morphology
VALLEX: Delving Deeper: Lexicographic Description of Syntactic and Semantic Properties of Czech Verbs 2012-2015 GAČR Annotations, Data, Lexicons, Semantics, Syntax, Valency
Systematic, economical and corpus-based description of valency properties of Czech deverbal nouns (theory and practice) 2012-2014 GAČR Lexicons, Valency
CorefDisk: Coreference, Discourse Relations and Information Structure in a Contrastive Perspective 2012 - 2015 GAČR Annotations, Coreference, Corpora, Data, Discourse, Information Structure
CZECHMATE: Čeština ve věku strojového překladu 2011 – 2013 GAČR Annotations, Corpora, Data, Machine Translation, Morphology, Parsers
NoSCoM: Non-Standard Computational Models and Their Applications in Complexity, Linguistics, and Learning 2010-2014 GAČR
Komputační lingvistika: Explicitní popis jazyka a anotovaná data se zřetelem na češtinu 2010-2013 GAČR Annotations, Coreference, Corpora, Data, Discourse, Information Structure

OP Praha – Pól růstu ČR

Duration Provider Area
MTviet: Machine Translation from Vietnamese into Czech for the Purposes of the Police of the Czech Republic 2017-2018 Praha OP PPR Machine Translation

Ministry of Culture

Duration Provider Area
ÚSTR: Systém pro trvalé uchování dokumentace a prezentaci historických pramenů z období totalitních režimů 2016-2019 NAKI
VIADAT: Virtuální asistent pro zpřístupnění historických audiovizuálních dat 2016-2019 NAKI Annotations, Speech Recognition, Tools
AMALACH 2012-2015 NAKI Information Retrieval, Machine Translation, Multi-modality, Speech Recognition, Speech Retrieval, Teaching
EVALD (Evaluator of Discourse): Automatic Evaluation of Text Coherence in Czech 1. 3. 2016 – 31. 12. 2019 NAKI Coreference, Discourse, Information Structure

Ministry of Education, Youth and Sport (Czech Republic)

Duration Provider Area
Multilingual Corpus Annotation as a Support for Language Technologies 2014-2016 MŠMT Annotations, Coreference, Corpora, Data, Discourse
MOBAme: Modern Bayesian methods in machine learning 2013-2013 MŠMT Teaching
VYSTADIAL: Development of statistical methods for spoken dialogue systems 2012-2016 MŠMT Corpora, Dialog, Speech Recognition, Tools
KontaktII: Strojový překlad se sémantickou informací 2012-2014 MŠMT Annotations, Corpora, Data, Lexicons, Machine Translation, Semantics, Valency
LINDAT/Clarin: Establishing and operating the Czech node of pan-European infrastructure for research (Vybudování a provoz českého uzlu pan-evropské infrastruktury pro výzkum) 2010-2015 MŠMT Annotations, Coreference, Corpora, Data, Dialog, Discourse, Lexicons, Linked data, Machine Learning, Machine Translation, Morphology, Multi-modality, Parsers, Publications, Semantics, Speech Recognition, Taggers, Tools, Valency
Kontakt: Towards a Computational Analysis of Text Structure 2010 - 2012 MŠMT Annotations, Coreference, Corpora, Data, Discourse
TextLink-cz: TextLink: Skladba diskurzu v evropských jazycích 1.11.2015 - 31.12.2017 MŠMT Annotations, Corpora, Data, Discourse, Lexicons, Linked data, Monolingual
LD-Parseme: PARSEME: Parsing a víceslovné výrazy – k jazykovědné přesnosti a výpočetní efektivitě ve zpracování přirozeného jazyka 04-2014 – 03-2017 MŠMT Lexicons, Multiword Expressions, Semantics, Valency

FP7: Research - European Commission

Duration Provider Area
TextLink: TextLink: Structuring Discourse in Multilingual Europe 2014 - 2017 FP7 Coreference, Corpora, Discourse, Linked data, Multilingual
QTLeap: Quality Translation by Deep Language Engineering Approaches 2013–2016 FP7 Linked data, Machine Translation
PARSEME: PARSEME: Parsing and Multiword Expressions 2013-2017 FP7 Lexicons, Multiword Expressions, Semantics, Valency
MosesCore 2012-2015 FP7 Data, Machine Translation, Teaching, Tools
EUDAT: EUDAT: European Data Infrastructure 2011–2014 FP7 Data
FAUST: Feedback Analysis for User adaptive Statistical Translation 2010–2013 FP7 Machine Translation
KHRESMOI: Medical information analysis and retrieval 2010-2014 FP7 Information Retrieval, Machine Translation
CLARA: Common Language Resources and their Applications - a Marie Curie ITN 2009-2013 FP7 Annotations, Corpora, Data, Machine Translation, Teaching
EuroMatrixPlus 2009-2012 FP7 Machine Translation

Institutional support for research at the Charles University

Duration Provider Area
PRVOUK: Programy rozvoje vědních oblastí na Univerzitě Karlově - Informatika 2012-2016 UK

Technology Agency (Czech Republic)

Duration Provider Area
INTLIB: Intelligent library 2012-2015 TAČR Data, Linked data, Tools

EU Lifelong Learning Programme

Duration Provider Area
Merlin 2012-2014 LLP Annotations, Corpora, Data


Duration Provider Area
PoliSys: Systém pro analýzu policejních dat pro potřeby Policie ČR 03/2017-03/2018 MVČR Data, Information Retrieval, Machine Learning, Morphology


Duration Provider Area
INSPIRE: INSPIRE in Pocket Inspire Machine Translation