The "main" word (token) element. It contains the word form from the text, followed by the elements associated with that word form: lemma and tag (manual, dictionary possibilities, or machine-generated by various taggers); governing node and analytical function (manual and/or automatic) on the analytical level; and governing node, functor and grammateme(s) on the tectogrammatical annotation level. On the tectogrammatical level, too, both manually and automatically assigned values can be encoded; see also the description of the <fadd> element.
The case attribute indicates the token's capitalization pattern, although the actual capitalization from the original text is also preserved. Only five types of capitalization are recognized and marked:
The content of the <f> element is in most cases identical to the word form as it appears in the original text. In case of any discrepancy (such as an obvious spelling error, or multiword or split phrases detected at tokenization time), one or more <w> elements are used, preceding the <f> element(s); in such cases, the case attribute of the <f> tag contains the substring gen. Some of these discrepancies could be discovered only in manually annotated data; it is therefore not guaranteed that, for example, spelling errors are marked in all data.
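
The relationship between <w>, <f> and the gen marker can be illustrated with a minimal Python sketch. It is not part of the DTD or of any official tool. The quoted attribute syntax (case="..."), the sample value cap.gen, and the names parse_token, F_TAG, W_TAG and CASE_ATTR are all assumptions made for illustration; real data may write the attribute differently, so the patterns would need adjusting.

```python
import re

# Hypothetical sample token: the quoted attribute syntax and the value
# "cap.gen" are assumptions for illustration, not taken from the DTD.
SAMPLE = '<w>Teh<f case="cap.gen">The'

# <f ...> tag followed by its text content; \b keeps <fadd> from matching.
F_TAG = re.compile(r'<f\b([^>]*)>([^<]*)')
# <w> element(s) holding the original form(s) that precede <f>.
W_TAG = re.compile(r'<w\b[^>]*>([^<]*)')
# The case attribute inside the <f> tag (assumed to be quoted).
CASE_ATTR = re.compile(r'\bcase\s*=\s*"([^"]*)"')


def parse_token(markup):
    """Extract the <f> form, preceding <w> forms, and the gen marker."""
    f_match = F_TAG.search(markup)
    if f_match is None:
        raise ValueError("no <f> element found")
    attrs, form = f_match.group(1), f_match.group(2)
    case_match = CASE_ATTR.search(attrs)
    case_value = case_match.group(1) if case_match else ""
    # Only <w> elements that appear before the <f> element are considered.
    originals = W_TAG.findall(markup[:f_match.start()])
    return {
        "originals": originals,
        "form": form,
        "case": case_value,
        # The text above says the case attribute contains the substring
        # "gen" whenever the <f> form differs from the original text.
        "generated": "gen" in case_value,
    }


if __name__ == "__main__":
    print(parse_token(SAMPLE))
```

Under these assumptions, a token without any discrepancy carries no <w> element and no gen substring, so the "originals" list is empty and "generated" is false.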
ATTRIBUTES
CONTENT DECLARATION