Introduction
The Prague Dependency Treebank 2.5 annotates the same texts as PDT 2.0. The annotation on the original four layers was fixed or improved in various respects (see Documentation). Moreover, new information was added to the data:
Please note that since the release of PDT 2.5 in 2012, new versions of the corpus have been published: PDT 3.0 (2013) and PDiT 1.0 (2012).
Requirements
There are no special software requirements for working with PDT 2.5. Everything you need should already be on your computer: a web browser and a PDF reader to open the documentation and publications.
Data
Data in the PML format are stored in the directory data and divided into three groups according to the topmost layer of annotation: the subdirectories mw, amw and tamw. Each group is further divided into 10 directories: etest, dtest and train-N, where N stands for the numbers 1 through 8. (The structure is the same as in PDT 2.0.) All files are gzipped, but there is no need to uncompress them because TrEd opens gzipped files directly. This is even faster than opening uncompressed plain text files.
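For readers who want to inspect the files outside TrEd, the layout described above can be sketched in Python. This is only an illustrative sketch and not part of the PDT distribution: the function names expected_dirs and read_gzipped are hypothetical, the root path data is assumed to be relative to the place where the distribution was unpacked, and the files are assumed to be UTF-8 encoded XML. The standard gzip module reads the files without uncompressing them on disk.

```python
import gzip
from pathlib import Path

# Layer groups and partitions as described in the Data section.
GROUPS = ["mw", "amw", "tamw"]
PARTITIONS = ["etest", "dtest"] + [f"train-{n}" for n in range(1, 9)]


def expected_dirs(root="data"):
    """Return the 30 expected directories (3 groups x 10 partitions)."""
    return [Path(root) / group / part for group in GROUPS for part in PARTITIONS]


def read_gzipped(path):
    """Read a gzipped file as text without unpacking it on disk.

    UTF-8 is an assumption here; adjust the encoding if your files differ.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Report which of the expected directories exist under ./data.
    for directory in expected_dirs():
        status = "found" if directory.is_dir() else "missing"
        print(f"{directory}: {status}")
```

Running the script from the directory that contains data lists each of the 30 expected subdirectories and whether it is present, which is a quick way to check that the distribution was unpacked completely.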
Troubles?
If you feel that you need more help or if something does not work, do not hesitate to contact us.