If an application is supposed to understand natural language to some extent, it usually has to parse the input utterances syntactically. That is, it attempts to discover the relations between the words of a sentence and the way their meanings combine to form the overall meaning of the sentence. We call the application module responsible for this a parser.
A whole range of syntactic formalisms has been proposed to model the syntax of natural languages. Most parsers rely heavily on treebanks: corpora of written or spoken utterances in which the word-to-word relations have been annotated manually by linguistically trained annotators. As most treebanks are bound to a particular syntactic formalism, the formalism used by a parser is usually determined by the data available for its training.
Two particular formalisms deserve special attention: dependency syntax, as defined in the Prague Dependency Treebank (PDT), and constituent syntax, as defined in the Penn Treebank. The former is the most important formalism for Czech (and the only one for which a Czech treebank is available); the latter applies to English and has historically been more popular in some parts of the world.
PDT has been annotated on three layers, called morphological, analytical and tectogrammatical. The analytical layer corresponds to the surface syntax, the tectogrammatical layer to the deep syntax. Thus the analytical representation (AR) is closer to the appearance of the sentence in the text, while the tectogrammatical representation (TR) is closer to its meaning. PDT 1.0 (released in 2001) contained ARs and just a tiny sample of TRs. TRs first appear in considerable numbers in PDT 2.0 (released in 2005).
The annotation in ARs consists of two parts: the dependency structure (tree), and the analytical functions (also called syntactic tags, s-tags or dependency relation labels). Some parsers concentrate only on the tree structure and do not assign the s-tags. Nevertheless, s-tag assignment is a rather easy task once the structure has been built. The linguistic description of what the ARs of particular language constructions should look like is given in the manual for the annotators (Czech version here). The list and description of possible s-tags is given there as well.
An analytical dependency structure is a rooted tree where each node (except the root) corresponds to one word of the underlying sentence (and each word has a corresponding node). The simplest representation of such a tree is a sequence of integers: the i-th position in the sequence corresponds to the i-th word of the sentence, and the number at that position is interpreted as the index of the word on which the i-th word depends. We use the terms dependent, depending node or child for the i-th word, and governor, governing node or parent for the other word.
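To make the representation concrete, here is a minimal Python sketch of such a parent-index sequence for a short English sentence (the sentence and its analysis are illustrative, not taken from PDT; index 0 stands for the artificial root of the tree):

```python
# Parent-index representation of a dependency tree (illustrative example).
sentence = ["Prague", "is", "a", "beautiful", "city"]
# parents[i] is the 1-based index of the word on which word i+1 depends;
# 0 denotes the artificial root.
parents = [2, 0, 5, 5, 2]

for i, (word, head) in enumerate(zip(sentence, parents), start=1):
    governor = "ROOT" if head == 0 else sentence[head - 1]
    print(f"{i}: {word} -> {governor}")
```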
The standard method of evaluating parser accuracy is to compute the percentage of words, over all words in a test data set, that got the correct parent index. This is also called the unlabeled attachment score (UAS), to emphasize that the labels of the dependency relations are not evaluated. Alternatively, we can require that both the parent is identified and the relation is labeled correctly; then we have the labeled attachment score (LAS). Unless specifically noted otherwise, the term accuracy in this overview means UAS.
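For illustration, the following Python sketch computes UAS and LAS over parallel lists of gold-standard and predicted parent indices and relation labels (the toy data, the s-tags and the function name are invented for this example):

```python
def attachment_scores(gold_heads, pred_heads, gold_labels, pred_labels):
    """Return (UAS, LAS) in percent for one or more concatenated sentences."""
    assert len(gold_heads) == len(pred_heads) == len(gold_labels) == len(pred_labels)
    total = len(gold_heads)
    # UAS: the predicted parent index matches the gold one.
    uas_hits = sum(g == p for g, p in zip(gold_heads, pred_heads))
    # LAS: both the parent index and the relation label match.
    las_hits = sum(g == p and gl == pl
                   for g, p, gl, pl in zip(gold_heads, pred_heads,
                                           gold_labels, pred_labels))
    return 100.0 * uas_hits / total, 100.0 * las_hits / total

# Toy example: 4 tokens, 3 correct heads, 2 of those also correctly labeled.
uas, las = attachment_scores([2, 0, 2, 3], [2, 0, 2, 2],
                             ["Sb", "Pred", "Obj", "Atr"],
                             ["Sb", "Pred", "Atr", "Atr"])
print(f"UAS = {uas:.1f} %, LAS = {las:.1f} %")
```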
PDT 1.0 provides two data sets intended for evaluating analytical parsers, the d-test (development) and the e-test (cross-evaluation). See also the PDT 1.0 Data Layout Table. The d-test consists of 153 files, 7,319 non-empty sentences and 126,030 words. Evaluation on the d-test data is available for most parsers, so for the sake of comparability we stick with that data here.
The following table gives the accuracy figures of various parsers on the PDT 1.0 d-test data. (Note: some of the parsers are still under development. We try to maintain here either their published results, or the results we measured ourselves where the parser or its output on the d-test data is available to us.)
Author (parser) | Accuracy | Notes |
---|---|---|
Combination ec+mc+zž+dz | 86.3 | Zeman & Žabokrtský (2005) |
Hall/Novák/Charniak | 85.0 | Hall & Novák (2005) |
Ryan McDonald | 84.4 | McDonald et al. (2005) |
Eugene Charniak | 84.3 | Charniak (2000) describes the original parser for English. Czech results measured by Zeman on the output provided by Charniak in 2003. |
Michael Collins | 82.5 | Collins et al. (1999) gives results on PDT 0.5. Re-run and re-measured on PDT 1.0 by Zeman. |
Joakim Nivre | 80.1 | Nivre & Nilsson (2005) |
Zdeněk Žabokrtský | 75.2 | Parser run and accuracy measured by Zeman in 2004. |
Daniel Zeman | 74.7 | Zeman (2004a) |
Václav Klimeš | 74.7 | Accuracy reported by Klimeš in 2006; to be published. |
Tomáš Holan (r2l) | 71.7 | Measured by Zeman on parser output provided by Holan in early 2004. |
Tomáš Holan (l2r) | 69.9 | Measured by Zeman on parser output provided by Holan in early 2004. |
Tomáš Holan (pshrt) | 62.8 | Measured by Zeman on parser output provided by Holan in early 2004. |
Note that due to a version incompatibility, Charniak's parser cannot be re-trained. Collins' parser was included on the PDT 2.0 CD-ROM.
PDT 2.0 provides two data sets intended for evaluating analytical parsers, the d-test (development) and the e-test (cross-evaluation). Each of those sets is split into two parts, one that has tectogrammatical annotation as well (tamw/[de]test/*.a) and one that does not (amw/[de]test/*.a). For analytical parsing, both parts have to be combined. See also the PDT 2.0 Data Description. The training data consists of 68,562 sentences and 1,172,299 tokens. The d-test data consists of 9,270 sentences and 158,962 tokens. The e-test data consists of 10,148 sentences and 173,586 tokens. Do not use this data to test parsers that have been trained on PDT 1.0: some of the current test data were declared training data in PDT 1.0!
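As an illustration of combining the two parts, the following Python sketch collects all analytical d-test files from both subtrees, assuming the directory layout indicated above; the root path is a placeholder that has to be adjusted to the actual location of the data:

```python
# A minimal sketch of collecting all analytical files of the PDT 2.0 d-test,
# assuming the tamw/ and amw/ layout described above.
import glob
import os

PDT_ROOT = "/path/to/pdt20/data"  # hypothetical root directory; adjust as needed
dtest_files = sorted(
    glob.glob(os.path.join(PDT_ROOT, "tamw", "dtest", "*.a")) +
    glob.glob(os.path.join(PDT_ROOT, "amw", "dtest", "*.a"))
)
print(f"{len(dtest_files)} analytical files in the d-test set")
```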
Extra care must be taken when running parsing experiments or reporting results on PDT 2.0 as to which source of morphological information was used by the parser: undisambiguated, automatically disambiguated (by which tagger?) or manually disambiguated. In fact, the same had to be taken into account when working with PDT 1.0. However, with version 2.0 it is easier to overlook that one is actually working with the wrong source of morphology, because:
It is strongly recommended to report results of experiments in which the parser did not have access to any human annotation of the test data, including morphology (of course, use everything you find useful in the training data). The obvious reason is that a parser is unlikely to have such information available in a real-world application.
The following table gives the accuracy figures of various parsers on the PDT 2.0 test data. (Note: some of the parsers are still under development. We try to maintain here either their published results, or the results we measured ourselves where the parser or its output on the test data is available to us.)
Author (parser) | D-test accuracy | E-test accuracy | Notes |
---|---|---|---|
Combination rmd+mc+zž+5×th* | 86.2 | 85.8 | Holan & Žabokrtský (2006), Simply Weighted Parsers (SWP) |
Hall/Nilsson/Nivre | 86.0 | 85.8 | Malt Parser 1.7 with stacklazy algorithm and Java implementation of LibSVM learner (see Nivre (2009)), run by Zeman in June 2013, using feature definition file provided by the Uppsala team. Automatically disambiguated tags used during both training and parsing. |
McDonald/Novák/Žabokrtský | 84.7 | | Feature engineering over McDonald's MST parser. See Novák & Žabokrtský (2007). |
Ryan McDonald | 84.2 | 84.0 | Same parser as in McDonald et al. (2005), run by Václav Novák in 2006. PDT 2.0 results published in Holan & Žabokrtský (2006). Automatically disambiguated tags used during both training and parsing. |
Michael Collins | 81.6 | 80.9 | Same parser as in Collins et al. (1999), PDT 2.0 results published in Holan & Žabokrtský (2006). Automatically disambiguated tags used during both training and parsing. |
Zdeněk Žabokrtský | 76.1 | 75.9 | A rule-based parser, described in Holan & Žabokrtský (2006). Automatically disambiguated tags used. |
Daniel Zeman | 75.0 | 74.8 | Same parser and settings as in Zeman (2004a), run by Zeman in 2006. Automatically disambiguated tags used during both training and parsing. |
Václav Klimeš | 74.8 | 74.6 | Accuracy reported by Klimeš in 2006; to be published. Automatically disambiguated tags used during both training and parsing. |
Tomáš Holan (r2l) | 74.0 | 73.9 | Pushdown automaton parser (Holan & Žabokrtský, 2006). Automatically disambiguated tags used. |
Tomáš Holan (l2r) | 71.4 | 71.3 | Pushdown automaton parser (Holan & Žabokrtský, 2006). Automatically disambiguated tags used. |
Tomáš Holan (analog) | 71.5 | 71.1 | A parser that “searches for the local tree configuration most similar to the training data” (Holan & Žabokrtský, 2006) (after all, which parser does not?). The parser itself is described in Holan (2005). Automatically disambiguated tags used. |
Tomáš Holan (r23) | 61.1 | 61.7 | Pushdown automaton parser (Holan & Žabokrtský, 2006). 3-letter word endings used, instead of tags. |
Tomáš Holan (l23) | 54.9 | 53.3 | Pushdown automaton parser (Holan & Žabokrtský, 2006). 3-letter word endings used, instead of tags. |
In their EMNLP paper, Koo et al. (2010) report an unlabeled accuracy of 87.32 % on PDT. Unfortunately, they specify neither the PDT version nor the test set they used, not to mention the distinction between gold and automatically disambiguated morphology, so it is difficult to tell how this result compares to the others.
The CoNLL-X (2006) shared task involved dependency parsing of 13 languages including Czech. Training and test data were taken from PDT 1.0. However, the published results are not directly comparable to the results presented above, for the following reasons:
For an overview of the results by the various teams, see Buchholz & Marsi (2006).
Authors | Labeled accuracy | Notes |
---|---|---|
Joakim Nivre | 82.4 | Run later on the CoNLL-X data, see Nivre (2009). |
Ryan McDonald, Kevin Lerman, Fernando Pereira | 80.2 | |
Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, Svetoslav Marinov | 78.4 | |
John O'Neil | 76.6 | |
Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto | 76.2 | |
Kenji Sagae | 75.2 | |
Simon Corston-Oliver, Anthony Aue | 74.5 | |
Ming-Wei Chang, Quang Do, Dan Roth | 72.9 | |
Richard Johansson, Pierre Nugues | 71.5 | |
Xavier Carreras, Mihai Surdeanu, Lluís Màrquez | 68.8 | |
Sebastian Riedel, Ruket Çakıcı, Ivan Meza-Ruiz | 67.4 | |
Eckhard Bick | 63.0 | |
Sander Canisius, Toine Bogers, Antal van den Bosch, Jeroen Geertzen, Erik Tjong Kim Sang | 60.9 | |
Markus Dreyer, David A. Smith, Noah A. Smith | 60.5 | |
Giuseppe Attardi | 59.8 | |
Yu-Chieh Wu, Yue-Shi Lee, Jie-Chi Yang | 59.4 | |
Ting Liu, Jinshan Ma, Huijia Zhu, Sheng Li | 58.5 | |
Michael Schiehlen, Kristina Spranger | 53.3 | |
Deniz Yuret | 51.9 |
The CoNLL 2007 shared task involved dependency parsing of 10 languages including Czech. Training and test data were taken from PDT 2.0.
For an overview of the results by the various teams, see Nivre et al. (2007).
Authors | Labeled accuracy | Unlabeled accuracy |
---|---|---|
Tetsuji Nakagawa | 80.19 | 86.28 |
Xavier Carreras | 78.60 | 85.16 |
Jens Nilsson, Johan Hall, Joakim Nivre, Gülşen Eryiğit, Beáta Megyesi, Mattias Nilsson, Markus Saers | 77.98 | 83.59 |
Ivan Titov, James Henderson | 77.94 | 84.19 |
Giuseppe Attardi, Felice Dell'Orletta, Maria Simi, Atanas Chanev, Massimiliano Ciaramita | 77.37 | 83.40 |
Johan Hall, Jens Nilsson, Joakim Nivre, Gülşen Eryiğit, Beáta Megyesi, Mattias Nilsson, Markus Saers | 77.22 | 82.35 |
Xiangyu Duan, Jun Zhao, Bo Xu | 75.34 | 80.82 |
Kenji Sagae, Jun'ichi Tsujii | 74.83 | 81.27 |
Michael Schiehlen, Kristina Spranger | 73.86 | 81.73 |
Wenliang Chen, Yujie Chang, Hitoshi Isahara | 73.69 | 80.14 |
Le-Minh Nguyen, Akira Shimazu, Phuong-Thai Nguyen, Xuan-Hieu Phan | 72.54 | 80.91 |
Keith Hall, Jiří Havelka, David A. Smith | 72.27 | 78.47 |
Richard Johansson, Pierre Nugues | 70.98 | 77.39 |
Prashanth Reddy Mannem | 70.68 | 77.20 |
Maes | 67.38 | 74.03 |
Yu-Chieh Wu, Jie-Chi Yang, Yue-Shi Lee | 66.72 | 73.07 |
Sander Canisius, Erik Tjong Kim Sang | 56.14 | 72.12 |
Jia | 54.95 | 70.41 |
Svetoslav Marinov | 53.47 | 59.57 |
Daniel Zeman | 50.21 | 59.19 |
The CoNLL 2009 shared task focused on semantic role labeling but it also involved dependency parsing of 7 languages including Czech. Training and test data were taken from PDT 2.0.
For an overview of the results by the various teams, see Hajič et al. (2009) and also this site.
Authors | Labeled accuracy | Unlabeled accuracy |
---|---|---|
Andrea Gesmundo, James Henderson, Paola Merlo, Ivan Titov | 80.38 | |
Bernd Bohnet | 80.11 | |
Wanxiang Che, Zhenghua Li, Yongqiang Li, Yuhang Guo, Bing Qin, Ting Liu | 80.01 | |
Hai Zhao, Wenliang Chen, Jun'ichi Kazama, Kiyotaka Uchimoto, Kentaro Torisawa | 79.70 | |
Yotaro Watanabe, Masayuki Asahara, Yuji Matsumoto | 78.17 | |
Yi Zhang, Rui Wang, Stephan Oepen | 75.58 | |
Xavier Lluís, Stefan Bott, Lluís Màrquez | 75.00 | |
Brown | 73.29 | |
Buzhou Tang, Lu Li, Xinxin Li, Xuan Wang, Xiaolong Wang | 72.60 | |
Qifeng Dai, Enhong Chen, Liu Shi | 58.69 | |
Han Ren, Donhong Ji, Jing Wan, Mingyao Zhang | 57.30 | |
Daniel Zeman | 57.06 | |
Roser Morante, Vincent van Asch, Antal van den Bosch | 49.41 |
The following list of publications gives a picture of the parsing results achieved within the ÚFAL research projects, as well as some relevant references to publications by authors at other sites.