Resources | ÚFAL

Datasets

Petr Zemánek, Adam Pospíšil, Hashem Sellat, Mateusz Krubiński, Pavel Pecina (2024). UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 1. LINDAT/CLARIAH-CZ.
Petr Zemánek, Adam Pospíšil, Hashem Sellat, Mateusz Krubiński, Pavel Pecina (2024). UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 2. LINDAT/CLARIAH-CZ.
Jiří Mayer, Milan Straka, Jan Hajič jr., Pavel Pecina (2024). OLiMPiC 1.0: OpenScore Lieder Linearized MusicXML Piano Corpus. LINDAT/CLARIAH-CZ.
Jiří Mayer, Milan Straka, Jan Hajič jr., Pavel Pecina (2024). GrandStaff-LMX: Linearized MusicXML Encoding of the GrandStaff Dataset. LINDAT/CLARIAH-CZ.
Hashem Sellat et al. (2023). UFAL Parallel Corpus of North Levantine 1.0. LINDAT/CLARIAH-CZ.
Mateusz Krubiński, Pavel Pecina (2023). MLASK: Multimodal Summarization of Video-based News Articles. LINDAT/CLARIAH-CZ.
Shadi Saleh, Pavel Pecina (2019). Extended CLEF eHealth 2013-2015 IR Test Collection. LINDAT/CLARIAH-CZ.
Pavel Pecina, Ondřej Dušek, Jan Hajič, Jindřich Libovický, Zdeňka Urešová (2017). Khresmoi Query Translation Test Data 2.0. LINDAT/CLARIAH-CZ.
Ondřej Dušek et al. (2017). Khresmoi Summary Translation Test Data 2.0. LINDAT/CLARIAH-CZ.
Petra Galuščáková et al. (2017). Czech Malach Cross-lingual Speech Retrieval Test Collection. LINDAT/CLARIAH-CZ.
Ondřej Dušek et al. (2014). Khresmoi Summary Translation Test Data 1.1. LINDAT/CLARIAH-CZ.
Pavel Pecina, Ondřej Dušek, Jan Hajič, Zdeňka Urešová (2013). Khresmoi Query Translation Test Data 1.0. LINDAT/CLARIAH-CZ.
Pavel Pecina et al. (2011). PANACEA English-French and English-Greek parallel corpus acquired for Environment domain. ELRA.
Pavel Pecina et al. (2011). PANACEA English-French and English-Greek parallel corpus acquired for Labour Legislation domain. ELRA.
Pavel Pecina et al. (2011). PANACEA Environment English monolingual corpus. ELRA.
Pavel Pecina et al. (2011). PANACEA Labour English monolingual corpus. ELRA.
Pavel Pecina et al. (2011). PANACEA Environment French monolingual corpus. ELRA.
Pavel Pecina et al. (2011). PANACEA Labour French monolingual corpus. ELRA.
Pavel Pecina et al. (2011). PANACEA Environment Greek monolingual corpus. ELRA.
Pavel Pecina et al. (2011). PANACEA Labour Greek monolingual corpus. ELRA.
Eduard Bejček et al. (2011). Lexico-Semantic Annotation of PDT using Czech WordNet. LINDAT/CLARIAH-CZ.
Pavel Pecina (2008). Gold Standard Reference Data for Multiword Expression Extraction: Czech Dependency Bigrams from the Prague Dependency Treebank. LINDAT/CLARIAH-CZ.

Presentations

Habilitation presentation, Faculty of Mathematics and Physics, Charles University, Prague, 2017.
Malach: zpracování audiovizuálního archívu svědectví přeživších holocaustu [Malach: Processing an Audiovisual Archive of Holocaust Survivors' Testimonies], New Media Inspiration, Prague, 2015.
Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation. The 24th International Conference on Computational Linguistics (Coling 2012), Mumbai, India, December 14, 2012.
Lexical Association Measures: Collocation Extraction. Invited talk, LOEWE Digital Humanities, Goethe University, Frankfurt am Main, Germany, Jul 12, 2012.
Cross-Language Speech Retrieval and its Evaluation in the Malach Project. Invited talk, European Masters Program in Language and Communication Technologies workshop, Prague, May 29, 2012.
Lexical Association Measures: Collocation Extraction. Invited talk, Institute of the Czech National Corpus, Prague, Czech Republic, Feb 7, 2012.
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation. The 15th Annual Conference of the European Association for Machine Translation (EAMT 2011), Leuven, Belgium, May 31, 2011. (presented by Antonio Toral)
Lexical Association Measures: Collocation Extraction. Invited talk, Knowledge Engineering Group seminar, Prague, Czech Republic, Nov 26, 2009.
Lexical Association Measures: Collocation Extraction. CNGL seminar, Dublin City University, Dublin, Ireland, Sep 21, 2009.
Jak psát a nepsat vědecké články [How to Write and Not to Write Scientific Papers]. Winter Seminar, UFAL Horní Mísečky, Czech Republic, Feb 9, 2009.
Lexical Association Measures: Collocation Extraction. MFF UK, Ph.D. thesis defense, Prague, Czech Republic, Sep 24, 2008.
A Machine Learning Approach to Multiword Expression Extraction, Towards a Shared Task for Multiword Expressions Workshop (MWE 2008), LREC 2008, Marrakech, Morocco, Jun 1, 2008.
Reference Data for Czech Collocation Extraction, Towards a Shared Task for Multiword Expressions Workshop (MWE 2008), LREC 2008, Marrakech, Morocco, Jun 1, 2008.
Úklid a čištění jako věda [Tidying and Cleaning as a Science]. Mixer, Prague, Oct 21, 2007.
Cross-Language Speech Retrieval and its Evaluation in the Malach Project, UFAL Seminar, Prague, Nov 20, 2006.
Vyhledávání informací v projektu Malach [Information Retrieval in the Malach Project], Mixer, Prague, Apr 12, 2006.
An Extensive Empirical Study of Collocation Extraction Methods, ACL 2005 Student Research Workshop, Ann Arbor, USA, Jun 27, 2005.
Collocation Extraction: The Statistical Approach, Invited talk, Institute of the Czech National Corpus, Prague, Apr 12, 2005.
Validating and Improving the Czech WordNet via Lexico-Semantic Annotation of the Prague Dependency Treebank, LREC workshop: Building Lexical Resources from Semantically Annotated Corpora, Lisbon, Portugal, Jun 8, 2004.
Automatic Collocation Extraction from Text Corpora, UFAL Seminar, MFF, Prague, May 17, 2004.

Posters

Domain Adaptation of Statistical Machine Translation using Web-Crawled Resources: A Case Study. EAMT, Trento, Italy, 2012. (presented by Antonio Toral)
Combining Association Measures for Collocation Extraction. COLING/ACL, Sydney, Australia, 2006.
Language Modeling for Czech ASR. MALACH, NSF site visit, Washington, USA, 2004.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
