In the following description, "immediately following ... element" means a following <f> and/or <d> element with no intervening SGML tag, except possibly for <D> and/or <i> elements.
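The definition above can be sketched as a small scan over the element sequence. This is only an illustration under stated assumptions: the flat list of (tag, text) pairs, the function name immediately_following, and the sample data are all hypothetical and are not part of the PDT format or its tools. It also assumes one reading of the definition, namely that a run of <f>/<d> elements continues until some tag other than <D> or <i> appears.

```python
# Hypothetical flat representation: one (tag, text) pair per SGML element,
# in document order. This is an assumption, not the PDT file format itself.
PERMITTED_BETWEEN = {"D", "i"}   # tags that may intervene
TARGETS = {"f", "d"}             # tags that count as "following" elements


def immediately_following(elements, index):
    """Collect the <f>/<d> elements immediately following elements[index]."""
    found = []
    for tag, text in elements[index + 1:]:
        if tag in TARGETS:
            found.append((tag, text))
        elif tag in PERMITTED_BETWEEN:
            continue  # <D> and <i> do not break the "immediately following" run
        else:
            break     # any other SGML tag ends the run
    return found


# Illustrative data only.
stream = [("w", "orig"), ("D", ""), ("f", "fixed"), ("w", "next"), ("f", "other")]
print(immediately_following(stream, 0))  # [('f', 'fixed')]
```

A <w> element followed directly by another <w> yields an empty list under this reading, since the second <w> is an intervening tag.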
The original string (text token) is a contracted form, such as "isn't" in English. In Czech, it only appears for the following forms and/or tags:
Original form of a misspelling, including erroneously split or joined forms. It can only be used for truly corrected forms (manually or automatically; currently, the texts of PDT are only manually corrected).
The immediately following <f> and/or <d> element(s) always contain the correct spelling(s) in their #PCDATA and the (sub)string gen in their case (or type) attribute(s).
Original form that is superfluous for some technical reason, but that could not be removed by the tokenizer without too much specific processing. Neither an <f> nor a <d> element is present in the following text.
This <w> element always has empty text (#PCDATA), since it signifies a token that is missing from the original text for some technical reason. It is used solely for easy and consistent identification of the "artificially" generated <f> and/or <d> element(s) that follow it.
Original form of a single fixed phrase (in the linguistic sense). More than one <w phrpart> element always immediately precedes a single <f> element. That <f> element always has the gen.phrase (sub)string in its case attribute and contains the complete phrase in its #PCDATA (usually, spaces in the original text are replaced by equals-sign characters, "=").
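As a minimal sketch of the phrase convention just described, the following joins the texts of several phrase parts with "=" and compares the result with the text of the single <f> element. The function names join_phrase and phrase_matches and the sample phrase are hypothetical. Since the description says spaces are only "usually" replaced by equals signs, a mismatch here would not by itself mean the data is wrong.

```python
def join_phrase(parts):
    """Join the texts of consecutive phrase parts with "=" in place of spaces."""
    return "=".join(part.strip() for part in parts)


def phrase_matches(parts, f_text):
    """Check whether an <f> text equals the "="-joined phrase parts.

    Assumption: spaces are replaced by "=" (the usual, not guaranteed, case).
    """
    return join_phrase(parts) == f_text


# Illustrative data only: two phrase parts and the text of the following <f>.
print(join_phrase(["a", "priori"]))                 # a=priori
print(phrase_matches(["a", "priori"], "a=priori"))  # True
```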
Original form of an automatically "normalized" number. Two phenomena can be normalized:
In other words, numbers always appear in their mathematical notation in the <f num.gen> element (unless spelled out as numerals). As usual, the normalized number always immediately follows this <w num.orig> element.