Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives

The aim of the project is to combine empirical linguistics and NLP, two approaches based on large amounts of authentic textual material in language corpora. The project brings together teams working on usage-based approaches in cognitive linguistics, on corpus and quantitative linguistics, and on computational linguistics and NLP. A key prerequisite for the implementation of the project is an existing infrastructure that enables research on large data; in this respect, the project will benefit from CNC and LINDAT-CLARIAH, two language-oriented infrastructures that are among the world leaders in the field of language resources production. The ambition of the project is to cover a wide range of languages and linguistic topics that can be analysed on the basis of existing language resources (incl. contrastive approaches based on parallel corpora).
Furthermore, the project will extend the range of language resources and empirical linguistic expertise to languages and areas that are not yet covered and that promise excellent results (e.g. research on aphasia, school communication, language acquisition, public discourse or spontaneous interaction).

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
