FS files encode the structure of natural-language sentences. Each file contains a sequence of trees whose nodes correspond to the words of a sentence. Each node (word) is described by a set of attributes.
The names and data types of particular attributes are not part of the FS format. Instead, each FS file has a header that defines the attributes of its tree nodes locally.
Nonterminal symbols are enclosed in "<" and ">" characters; terminal symbols and strings of terminal symbols are enclosed in double quotes. A C-like notation is used inside the quotes, so "\t" means the character with code 9, i.e. HTAB. The character "\n" represents the end of line regardless of the platform, i.e. it matches not only a real "\n" in the C sense but also "\r\n" (the DOS/Windows EOL) or even "\r".
Any end of line escaped by a backslash ("\\\n") has a special meaning. It is generated only to make the file easier for humans to read. When the file is processed, such an escaped end of line is discarded immediately and the surrounding text is parsed as if it were not present. It can appear almost anywhere, so the syntax description does not mention it. It can even appear within an identifier, but unlike the other backslash-escaped function characters it does not become part of the identifier.
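The preprocessing described above can be sketched in Python as follows. This is only an illustration, not the reference implementation: the function names are hypothetical, and it assumes that escaped ends of line are removed before any other escape is interpreted (the text says they are "discarded immediately"; it does not say what happens when the backslash is itself escaped, so that case is treated the same way here).

```python
import re

# A backslash immediately followed by any of the three end-of-line
# forms named in the text: CR LF, a bare CR, or a bare LF.
_ESCAPED_EOL = re.compile(r"\\(?:\r\n|\r|\n)")


def strip_escaped_eols(text: str) -> str:
    # Drop every backslash-escaped end of line from the raw file text.
    return _ESCAPED_EOL.sub("", text)


def normalize_eols(text: str) -> str:
    # Map CR LF and bare CR to LF so later stages see a single EOL form.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Calling `strip_escaped_eols` first and `normalize_eols` second lets the rest of a reader work on logical lines only.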
The unary postfix operators "*", "+" and "?" mean that the operand appears n times in a row, where n>=0 for *, n>0 for +, and n is 0 or 1 for ?.
In contexts where a nonterminal can be interpreted as a set, the binary operator "-" can be used. It denotes the difference of two sets.
The file contains a header with node attribute definitions, and a sequence of trees.
<fs-file> ::=
<definition-line>+ "\n"+ (<tree> "\n")+ <editor-configuration>?
<editor-configuration> ::=
"(" <number> ("," <number>)* ")"
Note: The numbers in the editor configuration are indices of the attributes that are displayed by default. (The editor allows the user to turn on the display of the remaining attributes.) The attribute indices must be in ascending order, otherwise the program crashes. It is therefore impossible to enforce a different ordering of attributes when the tree is displayed.
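Because an unordered list crashes the program, it may be worth checking the editor configuration before handing a file to the editor. The following Python sketch is one way to do that. The function name is hypothetical, `<number>` is assumed to be a run of decimal digits (the grammar does not define it), and "ascending" is assumed to mean strictly ascending.

```python
import re

# "(" <number> ("," <number>)* ")", with <number> assumed to be decimal digits.
_CONFIG = re.compile(r"\((\d+(?:,\d+)*)\)")


def parse_editor_configuration(line: str) -> list[int]:
    # Return the attribute indices, or raise ValueError if the line is
    # malformed or the indices are not in (strictly) ascending order.
    match = _CONFIG.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an editor configuration: {line!r}")
    indices = [int(part) for part in match.group(1).split(",")]
    if any(a >= b for a, b in zip(indices, indices[1:])):
        raise ValueError("attribute indices must be in ascending order")
    return indices
```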
An identifier is one of the main elements of the FS file syntax. It is a string of arbitrary characters that starts with its first character and ends before the first function character (which itself is not part of the identifier). Function characters can also be part of an identifier if they are escaped by a backslash (the backslash used for escaping is not part of the identifier).
Note: The length of identifiers is limited; the limit depends on the usage. An attribute name is limited to 20 characters, an attribute value to 120 characters.
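As an illustration of how an identifier is read, here is a minimal Python sketch. The function name and its return convention are hypothetical. The set of function characters is taken from the `<function-character>` rule of the grammar, and the input is assumed to have had its escaped ends of line removed already.

```python
# Function characters as listed in the <function-character> grammar rule.
FUNCTION_CHARACTERS = set("\\=,[]|")


def read_identifier(text: str, pos: int = 0) -> tuple[str, int]:
    # Read one identifier starting at pos; return (identifier, next position).
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            # Escaped character: the backslash is dropped and the next
            # character is taken literally (it must not be an end of line).
            if pos + 1 >= len(text) or text[pos + 1] == "\n":
                break
            chars.append(text[pos + 1])
            pos += 2
        elif ch in FUNCTION_CHARACTERS or ch == "\n":
            break  # an unescaped function character or EOL ends the identifier
        else:
            chars.append(ch)
            pos += 1
    if not chars:
        raise ValueError(f"no identifier at position {pos}")
    return "".join(chars), pos
```

For example, reading `a\=b,c` from position 0 yields the identifier `a=b` and stops at the comma.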
<attribute-name> ::=
<identifier>
<attribute-value> ::=
<identifier>
<identifier> ::=
<identifier-character>+
<identifier-character> ::=
<normal-character> | <escaped-character>
<function-character> ::=
"\\" | "=" | "," | "[" | "]" | "|"
<normal-character> ::=
<any-character>-<function-character>-"\n"
<escaped-character> ::=
"\\" (<any-character>-"\n")
The beginning of each file contains a header with definitions of the attributes that can appear in tree nodes. Each header line begins with the @ character, followed by a capital letter denoting a property of the attribute, then a space and the attribute name. For example, "@P lemma".
Note: In the list of allowed values in the @L definition (<values>), the values cannot be repeated.
<definition-line> ::=
("@" <property> <view>? " " <attribute-name> "\n") |
("@L" <view>? " " <attribute-name> "|" <values> "\n")
<property> ::=
"K" | "P" | "O" | "N" | "V" | "W" | "H"
<view> ::=
"1" | "2" | "3"
<values> ::=
<attribute-value> ("|" <values>)?
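The two forms of `<definition-line>` can be recognized with a pair of regular expressions. The Python sketch below is a simplified illustration: the function name and the returned dictionary layout are hypothetical, and for brevity it assumes that names and values contain no backslash escapes. The attribute name `afun` and its values in the usage note are made-up sample data.

```python
import re

# An identifier without escapes: any run of characters that are neither
# function characters nor an end of line (a simplifying assumption).
_IDENT = r"[^\\=,\[\]|\n]+"
_PLAIN = re.compile(rf"@([KPONVWH])([123])? ({_IDENT})")
_LIST = re.compile(rf"@L([123])? ({_IDENT})\|({_IDENT}(?:\|{_IDENT})*)")


def parse_definition_line(line: str) -> dict:
    # Parse one header line into its property, view, name and values.
    line = line.rstrip("\n")
    match = _LIST.fullmatch(line)
    if match:
        values = match.group(3).split("|")
        if len(set(values)) != len(values):
            raise ValueError("values in an @L definition cannot be repeated")
        return {"property": "L", "view": match.group(1),
                "name": match.group(2), "values": values}
    match = _PLAIN.fullmatch(line)
    if match:
        return {"property": match.group(1), "view": match.group(2),
                "name": match.group(3), "values": None}
    raise ValueError(f"not a definition line: {line!r}")
```

With this sketch, `parse_definition_line("@P lemma")` reports property `P` for the attribute `lemma`, and `parse_definition_line("@L2 afun|Sb|Obj")` reports property `L`, view `2` and the two listed values.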
K
P
[...] ord=7, e.g.). Positional attributes don't. The name of a positional attribute is derived from the position of its value relative to the preceding values (see the paragraph "Node" below for details).
O
L
H
N
[...] @W attribute is provided. If the @N attribute is not present, the tree is centered regardless of whether there is a @W attribute. At most one such attribute can be defined per FS file.
W
[...] @N and @W attributes are defined, the former specifies the ordering of nodes in the tree view while the latter specifies the ordering of words in the linear view on the status line. This allows the user to reorder a non-projective tree into a projective order while the sentence remains displayed in its original order on the status line.
V
[...] @VH (default) or @VA. The former is the default (i.e. @V is the same as @VH) and means that the values of hidden nodes (see the attribute @H) will not be displayed, not even on the status line. The latter means that even hidden nodes are shown on the status line.
More than one property can be defined for one attribute. The definition lines with all the properties need not follow each other in the file header. They must, however, fulfill the following constraints:
At most one @V attribute per file can be defined.
At most one @W attribute per file can be defined.
At most one @N attribute per file can be defined.
The @N property cannot be combined with other properties. Nevertheless, the @N attribute automatically has the properties @P and @O as well.
[...] @V and @L.
@L must be the last property defined for an attribute, but it cannot be the only property of that attribute.
The view mode can be defined optionally. It can be required that the value of the attribute always be highlighted in the tree editor.
1 ATTR_SHADOW
2 ATTR_HILITE
3 ATTR_XHILITE
The trees are described in the usual parenthesized notation: the description of an inner node is followed by a parenthesized, comma-separated list of its children (or their subtrees). The children of each node must be ordered according to the values of their numeric attribute @N, if any. Breaking this rule can cause the tree editor to display the tree incorrectly (projectivity is involved; the numeric attribute is assumed to contain the index of the word in the sentence word order).
<tree> ::=
<node> ("(" <children> ")")?
<children> ::=
<tree> ("," <children>)?
Besides the pure syntax, it is also necessary to check the relations between the element <attributes> and the definitions of the respective attributes in the file header. The constraints following from these relations are described below.
<node> ::=
<attribute-set> ("|" <node>)?
<attribute-set> ::=
"[" <attributes>? "]"
<attributes> ::=
<attribute> ("," <attributes>)?
<attribute> ::=
(<attribute-name> "=")? <values>
<values> ::=
<attribute-value> ("|" <values>)?
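The `<tree>` and `<node>` rules lend themselves to a small recursive-descent parser. The Python sketch below shows the idea under some assumptions: the function names and the dictionary layout are hypothetical, the contents of each attribute set are kept as raw text rather than split into attributes, and escaped ends of line are assumed to have been removed beforehand.

```python
def parse_tree(text: str, pos: int = 0):
    # Parse one <tree>; return (tree, next position). A tree is a dict with
    # the raw attribute sets of its node and a list of child trees.
    attribute_sets, pos = _parse_node(text, pos)
    children = []
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            child, pos = parse_tree(text, pos)
            children.append(child)
            if pos < len(text) and text[pos] == ",":
                pos += 1  # another sibling follows
                continue
            break
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
    return {"node": attribute_sets, "children": children}, pos


def _parse_node(text: str, pos: int):
    # Parse <node>: one or more bracketed attribute sets joined by "|".
    sets = []
    while True:
        if pos >= len(text) or text[pos] != "[":
            raise ValueError(f"expected '[' at position {pos}")
        end = pos + 1
        while end < len(text) and text[end] != "]":
            end += 2 if text[end] == "\\" else 1  # skip escaped characters
        if end >= len(text):
            raise ValueError("unterminated attribute set")
        sets.append(text[pos + 1:end])
        pos = end + 1
        if pos < len(text) and text[pos] == "|":
            pos += 1  # an alternative attribute set follows
            continue
        return sets, pos
```

For the input `[a]([b],[c]([d]))` the sketch returns a root `a` with the children `b` and `c`, where `c` has the single child `d`.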
The element <attributes> must fulfill the following constraints (based on the particular definition of attributes in the file header):
The <attribute-name> element must equal the name of an attribute defined in the header.
[...] <attribute> element with the same <attribute-name> appears twice, or if the attribute name is not mentioned but the definition of the last read attribute immediately precedes the definition of an attribute whose value has already been read.
The value of an @L attribute must be one of the predefined values from the definition of the attribute.
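Two of these checks, that every attribute name is defined in the header and that the values of an @L attribute come from its predefined list, can be sketched in Python as follows. The function name, the shape of the `header` mapping and the sample data in the usage note are hypothetical, and the sketch covers only attributes written with an explicit name; it does not attempt the rules for repeated or unnamed (positional) attributes.

```python
def check_attributes(attributes, header):
    # attributes: list of (name, values) pairs read from one attribute set.
    # header: maps each defined attribute name to its list of allowed
    #         values if it is an @L attribute, or to None otherwise.
    # Returns a list of human-readable problem descriptions.
    problems = []
    for name, values in attributes:
        if name not in header:
            problems.append(f"{name}: not defined in the header")
            continue
        allowed = header[name]
        if allowed is not None:
            for value in values:
                if value not in allowed:
                    problems.append(
                        f"{name}: value {value!r} is not in the @L list")
    return problems
```

With a header of `{"lemma": None, "afun": ["Sb", "Obj"]}`, the attribute list `[("lemma", ["dog"]), ("afun", ["Sb"])]` passes without problems, while `[("afun", ["Atr"])]` is reported because `Atr` is not among the listed values.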