The current way of representing coreference makes use of the fact that every node of every tree has an identifier (the value of the id attribute), which is unique within PDT. If coreference is a link between two nodes (one node referring to another), it is enough to specify the identifier of the coreferred node in the appropriate attribute of the coreferring node. Individual coreference subtypes are distinguished by the value of another attribute.
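The linking mechanism can be illustrated with a minimal Python sketch. The node identifiers, the dictionary layout and the `resolve` helper are hypothetical and chosen only for illustration; they are not the storage format actually used by PDT. The sketch assumes that a coreferring node stores a list of target identifiers under the name of a coreference attribute.

```python
# Hypothetical nodes keyed by their unique identifiers (the id values are invented).
nodes = {
    "t-doc1-s1-n1": {"t_lemma": "Vlasta"},
    # The coreferring node stores the identifier of the coreferred node.
    "t-doc1-s1-n5": {"t_lemma": "#PersPron", "coref_text.rf": ["t-doc1-s1-n1"]},
}


def resolve(node_id, attribute="coref_text.rf"):
    """Return the nodes that the given node refers to via `attribute`."""
    return [nodes[target] for target in nodes[node_id].get(attribute, [])]


antecedents = resolve("t-doc1-s1-n5")
print([node["t_lemma"] for node in antecedents])
```

Because identifiers are unique, a plain lookup by identifier is enough to find the coreferred node, whichever tree it belongs to.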
Three attributes have been introduced for representing coreference:
coref_gram.rf
The coref_gram.rf attribute is used for representing grammatical coreference. See Table 9.1, "Values of the coref_gram.rf attribute".
Table 9.1. Values of the coref_gram.rf attribute
a list, every element of which is a PML reference | identifiers of the coreferred nodes, which are usually in the same tree |
Grammatical coreference can always be represented as a link between two nodes (one referring to the other).
coref_text.rf
The coref_text.rf attribute is used for representing textual coreference if the coreferred node is explicitly specified (see Section 3.1.1, "Explicitly coreferred element"). See Table 9.2, "Values of the coref_text.rf attribute".
coref_special
The coref_special attribute is used for representing special types of textual coreference in which the coreferred element is not a particular node or subtree. These are cases of exophoric coreference (see Section 3.1.3, "Exophora") and reference to a segment (see Section 3.1.2, "Reference to a segment"). The possible values are listed in Table 9.3, "Values of the coref_special attribute".
Every coreferring node is assigned a value in only one of these attributes.
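The constraint on the three attributes can be checked mechanically. The following is a minimal sketch, assuming that a node is available as a Python dictionary keyed by attribute name; the helper names `filled_coref_attributes` and `is_consistent` are hypothetical.

```python
COREF_ATTRIBUTES = ("coref_gram.rf", "coref_text.rf", "coref_special")


def filled_coref_attributes(node):
    """Return the names of the coreference attributes that carry a value."""
    return [name for name in COREF_ATTRIBUTES if node.get(name)]


def is_consistent(node):
    """A node is consistent if it uses at most one coreference attribute."""
    return len(filled_coref_attributes(node)) <= 1


# Invented example nodes: the first two respect the constraint, the third does not.
grammatical = {"coref_gram.rf": ["n1"]}
not_coreferring = {"t_lemma": "divadlo"}
conflicting = {"coref_text.rf": ["n1"], "coref_special": "segm"}

print(is_consistent(grammatical), is_consistent(not_coreferring), is_consistent(conflicting))
```

A node without any coreference attribute is treated as consistent here, since the constraint only concerns coreferring nodes.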
Depending on which part of the tree is referred to, the following cases of coreference are distinguished:
reference to a leaf.
The coref_gram.rf or coref_text.rf attribute contains the identifier of the target leaf (the coreferred node).
Cf.:
Vlasta šla do divadla, kde na ni čekal Marek. (=Vlasta went to the theater, where Marek was already waiting for her.)
The node referred to by ona (=she) is the leaf (node) representing Vlasta.
reference to the root of a subtree.
The coref_gram.rf or coref_text.rf attribute contains the identifier of the root of the target (coreferred) subtree.
If the coreferred node is not a leaf, we assume that the reference is to the whole subtree. Cf.:
Můj o dva roky mladší bratr, kterého ještě neznáš, přijde zítra. (=My two years younger brother, whom you don't know yet, is coming tomorrow.)
What který refers to is the whole subtree můj o dva roky mladší bratr, not just the node for bratr.
!!! One cannot exclude the possibility that there are cases in which only the node representing the root of a subtree is referred to, and not its daughters. This possibility has not been taken into account so far.
A special case of reference to the root of a subtree is reference to the whole sentence. In such cases, the coreferred node is not the root of the sentence but rather the technical root node of the tree.
reference to more than one node.
The coref_gram.rf or coref_text.rf attribute contains more than one identifier.
It is possible to refer to more than one expression (subtree). In such cases, each individual expression is referred to (i.e. the relevant attribute contains the identifiers of all target nodes), so more than one coreference relation is present. Cf.:
Marie vzala Vlastu do divadla, kde na ně čekal Marek. (=Marie took Vlasta to the theater, where Marek was already waiting for them.)
ony (=they) refers to two nodes, the one for Marie and the one for Vlasta, and it is necessary to refer to each of them individually.
This is only a temporary solution; see also Section 5.2.2, "Referring with the type "tatínek s maminkou"".
reference to a segment.
The coref_special attribute is assigned the value segm.
The reference is to a larger segment (which is not further specified). For more details see Section 3.1.2, "Reference to a segment".
extra-textual reference.
The coref_special attribute is assigned the value exoph.
The reference is to a reality external to the text. For more details see Section 3.1.3, "Exophora".
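The five cases listed above can be told apart from the attribute values and the shape of the tree. The sketch below is illustrative only: it assumes nodes are Python dictionaries keyed by attribute name and that the tree is given as a hypothetical `children` mapping from a node identifier to the identifiers of its daughters. The identifiers and the helper names `subtree_ids` and `classify` are invented.

```python
def subtree_ids(node_id, children):
    """Collect node_id and all of its descendants, depth-first."""
    result = [node_id]
    for child in children.get(node_id, []):
        result.extend(subtree_ids(child, children))
    return result


def classify(node, children):
    """Name the case of coreference that applies to a coreferring node."""
    special = node.get("coref_special")
    if special == "segm":
        return "reference to a segment"
    if special == "exoph":
        return "extra-textual reference"
    targets = node.get("coref_gram.rf") or node.get("coref_text.rf") or []
    if len(targets) > 1:
        return "reference to more than one node"
    if len(targets) == 1:
        # A target with daughters stands for its whole subtree.
        if children.get(targets[0]):
            return "reference to the root of a subtree"
        return "reference to a leaf"
    return "no coreference"


# Invented tree: n1 governs n2 and n3, and n3 governs n4.
children = {"n1": ["n2", "n3"], "n3": ["n4"]}

print(classify({"coref_text.rf": ["n2"]}, children))
print(classify({"coref_gram.rf": ["n1"]}, children))
print(subtree_ids("n1", children))
```

When the target is not a leaf, `subtree_ids` yields every node covered by the reference, in line with the assumption that the whole subtree is referred to.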
Coreference relations can also be established between nodes that are not present at the surface level, i.e. between newly established nodes with various t-lemma substitutes (see also Section 4, "Survey of types of coreference with respect to the t-lemmas of the coreferring nodes"). Coreference relations often form long coreference chains at the end of which there are expressions that do not refer to any other node (see Section 5.1, "Preserving the coreference chains").
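A coreference chain can be traced by following the links from node to node until a node without any link is reached. The following sketch is a simplification under stated assumptions: nodes are Python dictionaries stored in a hypothetical `nodes` mapping keyed by identifier, only the first target of each node is followed, and a guard against cycles is added as a precaution. The function name `coreference_chain` is invented.

```python
def coreference_chain(start_id, nodes):
    """Follow the first coreference link of each node, starting at start_id."""
    chain = [start_id]
    seen = {start_id}
    current = start_id
    while True:
        node = nodes[current]
        targets = node.get("coref_gram.rf") or node.get("coref_text.rf") or []
        # Stop at a node that refers to nothing, or if a link would loop back.
        if not targets or targets[0] in seen:
            break
        current = targets[0]
        chain.append(current)
        seen.add(current)
    return chain


# Invented nodes: c refers to b, b refers to a, and a refers to no other node.
nodes = {
    "a": {},
    "b": {"coref_text.rf": ["a"]},
    "c": {"coref_gram.rf": ["b"]},
}

print(coreference_chain("c", nodes))
```

The last identifier in the returned list is the expression at the end of the chain, that is, the one that does not refer to any other node.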