PDT 2.0 - Guide
Jan Hajič
Eva Hajičová
Jaroslava Hlaváčová
Václav Klimeš
Jiří Mírovský
Petr Pajas
Jan Štěpánek
Barbora Vidová Hladká
Zdeněk Žabokrtský
Table of Contents
1. Introduction
1.1. What is PDT 2.0
1.2. Historical background of the project
1.3. Development of the project
1.4. About Czech
1.5. Directory structure
2. Layers of annotation
2.1. Morphological layer
2.1.1. Logical structure
2.1.2. Physical realization
2.1.3. Annotation process
2.2. Analytical layer
2.2.1. Logical structure
2.2.2. Physical realization
2.2.3. Annotation process
2.3. Tectogrammatical layer
2.3.1. Logical structure
2.3.2. Physical realization
2.3.3. Annotation process
2.4. Sample preview of annotation on the three layers
3. Data
3.1. Sources of text
3.2. Division of the data according to the layer of annotation
3.3. Division of the data into training and test sets
3.4. Data formats
3.4.1. PML
3.4.2. Perl Storable Format
3.4.3. FS
3.4.4. CSTS
3.5. Conventions of file naming
3.6. Full data
3.7. Sample data
3.8. PDT-VALLEX
3.9. PDT 1.0 update
4. Tools
4.1. Searching trees: Netgraph
4.2. Viewing (browsing) trees: TrEd
4.3. Automatic tree processing: btred/ntred
4.4. Converting data between formats
4.4.1. Conversion between the PDT formats
4.4.2. Conversion from formats of other treebanks
4.5. Parsing Czech: from plain text to PDT-formatted dependency trees
4.6. Creating data for parser development
4.7. Macros for error detection
5. Documentation
6. Publications
6.1. Theoretical background of PDT
6.2. PDT 2.0
6.2.1. General information
6.2.2. Morphological layer
6.2.3. Analytical layer
6.2.4. Tectogrammatical layer
6.3. Tools
6.3.1. Netgraph
6.3.2. Morphological analysis and tagging
6.3.3. Parsing
6.3.4. Automatic functor assignment
7. Distribution and license
7.1. License agreement
8. Installation
9. Credits
10. Acknowledgments
List of Figures
2.1. Linking the layers
2.2. Data and annotation workflow diagram
2.3. The analytical tree of the example sentence
2.4. The tectogrammatical tree of the example sentence (a detailed view)
3.1. Number of tokens from the particular sources
3.2. Division of the data to layers
3.3. Division of the data into training and test sets
3.4. PDT-VALLEX sample entry in the presentation format
3.5. PDT-VALLEX in the TrEd editor
4.1. Creating a query in Netgraph
4.2. A result tree in Netgraph
4.3. Tectogrammatical tree in TrEd
List of Tables
2.1. An example sentence
2.2. Morphological analysis of the example sentence
3.1. Data annotated on all three layers (tamw).
3.2. Data annotated only on m-layer and a-layer (amw).
3.3. Data annotated only on m-layer (mw).
3.4. Alternative grouping: All data annotated on m-layer (union of tamw, amw, and mw).
3.5. Alternative grouping: All data annotated on a-layer (union of tamw and amw).