Introduction

The Prague Dependency Treebank 2.5 annotates the same texts as the PDT 2.0. The annotation on the original four layers was fixed or improved in various aspects (see Documentation). Moreover, new information was added to the data:

Please note that since the release of PDT 2.5 in 2012, new versions of the corpus have been published: PDT 3.0 (2013) and PDiT 1.0 (2012).

Requirements

There are no special software requirements for working with PDT 2.5. Everything you need should already be on your computer: a web browser and a PDF reader to open the documentation and publications.

Data

Data in the PML format are stored in the directory data and divided into three groups according to the topmost layer of annotation: the subdirectories mw, amw and tamw. Each group is further divided into 10 directories: etest, dtest and train-N, where N runs from 1 through 8. (The structure is the same as in PDT 2.0.) All files are gzipped, but there is no need to uncompress them because TrEd opens gzipped files directly. Doing so is even faster than opening uncompressed plain text files.
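
If you want to process the files outside TrEd, the layout described above can be sketched in Python as follows. This is a minimal illustration, not part of the PDT 2.5 distribution: the helper names expected_dirs and read_gzipped_text are hypothetical, the location of the data directory is whatever you pass in, and UTF-8 is an assumed encoding for the files.

```python
import gzip
from pathlib import Path

# Directory names taken from the description above.
GROUPS = ["mw", "amw", "tamw"]
SPLITS = ["etest", "dtest"] + [f"train-{n}" for n in range(1, 9)]


def expected_dirs(data_root):
    """List the 3 x 10 subdirectories expected under the data directory.

    data_root is wherever your copy of the data directory lives
    (hypothetical; adjust to your installation).
    """
    root = Path(data_root)
    return [root / group / split for group in GROUPS for split in SPLITS]


def read_gzipped_text(path, encoding="utf-8"):
    """Read a gzipped text file without uncompressing it on disk.

    The UTF-8 default is an assumption, not something stated above.
    """
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.read()
```

For example, expected_dirs("data") yields 30 paths, from data/mw/etest to data/tamw/train-8, and read_gzipped_text can then be applied to any gzipped file found in them.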

Troubles?

If you feel that you need more help or if something does not work, do not hesitate to contact us.