Michal Novák
Main Research Interests
- coreference / anaphora resolution
- machine translation
- machine learning
Projects
Current
- MASAPI - Multilingual assistant for searching, analysing and processing information and decision support
- LINDAT-CLARIAH-CZ - Language Resources and Digital Arts and Humanities Research Infrastructure
- CorefUD - Coreference in Universal Dependencies
Former
- EuroMatrix+
- GAUK 4226/2011 - Utilization of coreference in machine translation
- Khresmoi - Medical information retrieval (working on Machine Translation)
- QTLeap - Quality Translation by Deep Language Engineering Approaches
- GAUK 3389/2015 - Cross-lingual approaches to coreference resolution
- GAČR 16-05394S - Structure of coreferential chains in parallel language data
- NAKI II DG16P02B016 - Automatic Evaluation of Text Coherence in Czech
- Bergamot - Browser-based Multilingual Translation
Curriculum Vitae
- 2018 Ph.D. (Doctoral degree) in Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague.
- Thesis: Coreference from the Cross-lingual Perspective
- 2010 Mgr. (Master's degree) in Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague.
- Thesis: Machine Learning Approach to Anaphora Resolution
- 2008 Bc. (Bachelor's degree) in Computer Science, Faculty of Mathematics and Physics, Charles University in Prague.
- Thesis: Vizualizace PML souborů (Visualization of PML Files)
Selected Bibliography
- Google Scholar
- ORCID: 0000-0002-6052-7459
- Scopus ID: 54793288000
- Researcher ID: N-4777-2017
- Findings of the Third Shared Task on Multilingual Coreference Resolution. In: Proceedings of The Seventh Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 78-96, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-171-1 (url, local PDF, bibtex)
- Universal Anaphora: The First Three Years. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 17087-17100, European Language Resources Association, Torino, Italy, ISBN 978-2-493814-10-4 (pdf, local PDF, bibtex)
- Charles Translator: A Machine Translation System between Ukrainian and Czech. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 3038-3045, European Language Resources Association, Torino, Italy, ISBN 978-2-493814-10-4 (pdf, local PDF, bibtex)
- What Can Dictionaries Tell Us About Pragmatic Markers – Building the Lexicon of Epistemic and Evidential Markers in Czech. In: Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress, pp. 728-741, Institut za hrvatski jezik, Zagreb, Croatia, ISBN 978-953-7967-77-2 (pdf, bibtex)
- Negative Lexical Constraints in Neural Machine Translation. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 372-384, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (pdf, bibtex)
- The Universal Anaphora Scorer 2.0. In: Proceedings of the 15th International Conference on Computational Semantics, pp. 183-194, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-74-6 (url, bibtex)
- Findings of the Second Shared Task on Multilingual Coreference Resolution. In: Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution, pp. 1-18, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-02-5 (pdf, local PDF, bibtex)
- Findings of the 2022 Conference on Machine Translation (WMT22). In: Proceedings of the Seventh Conference on Machine Translation, pp. 1-34, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
- Český překladač se naučil ukrajinsky rychle. Jen někdy plete jména měst (The Czech translator learned Ukrainian quickly. It only sometimes mixes up city names). In: Seznam Zprávy, pp. 1-2 (url, bibtex)
- CorefUD 1.0: Coreference Meets Universal Dependencies. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 4859-4872, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (pdf, bibtex)
- Findings of the Shared Task on Multilingual Coreference Resolution. In: Proceedings of the CRAC 2022 Shared Task on Multilingual Coreference Resolution, pp. 1-17, Association for Computational Linguistics, Gyeongju, Korea (url, local PDF, local PDF, bibtex)
- CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 354-361, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
- CUNI systems for WMT21: Terminology translation Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 828-834, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
- Is one head enough? Mention heads in coreference annotations compared with UD-style heads. In: Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021), pp. 101-114, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-14-8 (pdf, local PDF, bibtex)
- Coreference meets Universal Dependencies – a pilot experiment on harmonizing coreference datasets for 11 languages (technical report). (pdf, local PDF, bibtex)
- Do UD Trees Match Mention Spans in Coreference Annotations? In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3570-3576, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-10-0 (url, local PDF, bibtex)
- Backtranslation Feedback Improves User Confidence in MT, Not Quality. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 151-161, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-46-6 (url, local PDF, bibtex)
- Extending Ptakopět for Machine Translation User Interaction Experiments. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 115, pp. 129-142 (pdf, local PDF, bibtex)
- Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech. In: Proceedings of the 22nd International Conference on Text, Speech and Dialogue - TSD 2019, Lecture Notes in Computer Science, ISSN 0302-9743, 11697, pp. 197-210, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-030-27946-2 (url, bibtex)
- EVALD – a Pioneer Application for Automated Essay Scoring in Czech. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 113, pp. 9-30 (url, local PDF, bibtex)
- Coherence Errors in Learners’ Essays and a Possibility of Their Improvement through EVALD (Automated Evaluator of Discourse). In: Proceedings of the 11th Annual International Conference on Education and New Learning Technologies (EDULEARN 2019), pp. 6761-6768, IATED Academy, Palma, Spain, ISBN 978-84-09-12031-4 (url, bibtex)
- SAO WMT19 Test Suite: Machine Translation of Audit Reports. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 680-692, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, bibtex)
- Analysis of coreferential expressions in PAWS. In: Computational Linguistics and Intellectual Technologies, ISSN 2221-7932, vol. 2018, no. 17, 2018, pp. 512-521 (pdf, bibtex)
- PAWS: A Multi-lingual Parallel Treebank with Anaphoric Relations. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 68-76, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-13-1 (url, bibtex)
- Coreference from the Cross-lingual Perspective. ISBN 978-80-88132-06-6 (bibtex)
- A Study on Bilingually Informed Coreference Resolution. In: Proceedings of the 18th conference ITAT 2018: Slovenskočeský NLP workshop (SloNLP 2018), pp. 130-137, CreateSpace Independent Publishing Platform, Košice, Slovakia, ISBN 978-1727267198 (pdf, bibtex)
- Coreference from the Cross-lingual Perspective (PhD thesis). (url, local PDF, bibtex)
- A Fine-grained Large-scale Analysis of Coreference Projection. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 77-86, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-13-1 (url, bibtex)
- Topic–Focus Articulation: A Third Pillar of Automatic Evaluation of Text Coherence. In: Advances in Computational Intelligence (LNAI 11289): 17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Proceedings, Part II, pp. 92-105, Springer, Switzerland, ISBN 978-3-030-04497-8 (url, bibtex)
- Practicing Students’ Writing Skills through eLearning: Automated Evaluation of Text Coherence in Czech. In: EDULEARN18 Proceedings, pp. 1963-1970, IATED Academy, Valencia, Spain, ISBN 978-84-09-02709-5 (url, bibtex)
- Coreference Resolution System Not Only for Czech. In: Proceedings of the 17th conference ITAT 2017: Slovenskočeský NLP workshop (SloNLP 2017), pp. 193-200, CreateSpace Independent Publishing Platform, Praha, Czechia, ISBN 978-1974274741 (pdf, bibtex)
- Projection-based Coreference Resolution Using Deep Syntax. In: Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017), pp. 56-64, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-945626-46-3 (pdf, bibtex)
- Incorporating Coreference to Automatic Evaluation of Coherence in Essays. In: Statistical Language and Speech Processing, pp. 58-69, Springer International Publishing, Cham, Switzerland, ISBN 978-3-319-68455-0 (pdf, local PDF, bibtex)
- Introducing EVALD – Software Applications for Automatic Evaluation of Discourse in Czech. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, pp. 634-641, INCOMA Ltd., Šumen, Bulgaria, ISBN 978-954-452-048-9 (pdf, bibtex)
- CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In: Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Lecture Notes in Computer Science, ISSN 0302-9743, 9924, pp. 231-238, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-319-45509-9 (url, bibtex)
- Coreference in Prague Czech-English Dependency Treebank. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 169-176, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (url, local PDF, bibtex)
- Possessives in Parallel English-Czech-Russian Texts. In: Computational Linguistics and Intellectual Technologies, ISSN 2221-7932, 15, pp. 483-497 (pdf, local PDF, bibtex)
- Pronoun Prediction with Linguistic Features and Example Weighing. In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 602-608, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, bibtex)
- Dictionary-based Domain Adaptation of MT Systems without Retraining. In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 449-455, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, bibtex)
- New Language Pairs in TectoMT. In: Proceedings of the 10th Workshop on Machine Translation, pp. 98-104, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-941643-32-7 (pdf, local PDF, bibtex)
- Coreference chains in Czech, English and Russian: Preliminary findings. In: Computational Linguistics and Intellectual Technologies, ISSN 2221-7932, vol. 14, no. 21, pp. 474-486 (pdf, bibtex)
- Correspondences between Czech and English Coreferential Expressions. In: Discours: Revue de linguistique, psycholinguistique et informatique, ISSN 1963-1723, 16, pp. 1-41 (url, bibtex)
- Comparison of Coreference Resolvers for Deep Syntax Translation. In: Proceedings of the Second Workshop on Discourse in Machine Translation, pp. 17-23, Association for Computational Linguistics, Lisboa, Portugal, ISBN 978-1-941643-32-7 (url, bibtex)
- Translation Model Interpolation for Domain Adaptation in TectoMT. In: Proceedings of the 1st Deep Machine Translation Workshop, pp. 89-96, ÚFAL MFF UK, Praha, Czechia, ISBN 978-80-904571-7-1 (url, local PDF, local PDF, bibtex)
- Machine Translation of Medical Texts in the Khresmoi Project. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 221-228, Association for Computational Linguistics, Baltimore, MD, USA, ISBN 978-1-941643-17-4 (pdf, local PDF, local PDF, bibtex)
- Cross-lingual Coreference Resolution of Pronouns. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 14-24, Dublin City University and Association for Computational Linguistics, Dublin, Ireland, ISBN 978-1-941643-26-6 (pdf, bibtex)
- Adaptation of machine translation for multilingual information retrieval in medical domain. In: Artificial Intelligence in Medicine, ISSN 0933-3657, vol. 61, no. 3, pp. 165-185 (url, bibtex)
- Khresmoi Professional: Multilingual Semantic Search for Medical Professionals. In: Proceedings of the ACM SIGIR Workshop on Health Search and Discovery: Helping Users and Advancing Medicine, pp. 31-34, Microsoft Research, Cambridge, UK (url, local PDF, bibtex)
- A Coreferentially annotated Corpus and Anaphora Resolution for Czech. In: Computational Linguistics and Intellectual Technologies, pp. 467-475, ABBYY, Moskva, Russia, ISBN 978-1-937284-58-9 (local PDF, bibtex)
- Translation of "It" in a Deep Syntax Framework. In: 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Workshop on Discourse in Machine Translation, pp. 51-59, Omnipress, Inc., Sofija, Bulgaria, ISBN 978-1-937284-68-8 (pdf, bibtex)
- Two Case Studies on Translating Pronouns in a Deep Syntax Framework. In: Proceedings of the 6th International Joint Conference on Natural Language Processing, pp. 1037-1041, Asian Federation of Natural Language Processing, Nagoya, Japan, ISBN 978-4-9907348-0-0 (pdf, bibtex)
- The Joy of Parallelism with CzEng 1.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3921-3928, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7 (url, local PDF, bibtex)
- Formemes in English-Czech Deep Syntactic MT. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 267-274, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6 (pdf, local PDF, bibtex)
- Using Czech-English Parallel Corpora in Automatic Identification of It. In: The Fifth Workshop on Building and Using Comparable Corpora, pp. 112-120, European Language Resources Association, İstanbul, Turkey (local PDF, bibtex)
- Coreference Resolution in the Prague Dependency Treebank (technical report). pp. 1-66 (pdf, bibtex)
- Utilization of Anaphora in Machine Translation. In: WDS'11 Proceedings of Contributed Papers, Part I, pp. 155-160, Matfyzpress, Praha, Czechia, ISBN 978-80-7378-184-2 (pdf, bibtex)
- Resolving Noun Phrase Coreference in Czech. In: Lecture Notes in Computer Science, ISSN 0302-9743, 7099, pp. 24-34 (url, bibtex)
- Machine Learning Approach to Anaphora Resolution (master's thesis). (pdf, bibtex)
- Získávání paralelních textů z webu (Acquiring Parallel Texts from the Web). In: Informačné Technológie – Aplikácie a Teória. Zborník príspevkov, ITAT 2009, pp. 47-54, PONT s.r.o., Seňa, Slovakia, ISBN 978-80-970179-1-0 (local PDF, bibtex)