MorphoDiTa API Reference
- MorphoDiTa Versioning
- Lemma Structure
- Struct string_piece
- Struct tagged_form
- Struct tagged_lemma
- Struct tagged_lemma_forms
- Struct token_range
- Struct derivated_lemma
- Class version
- Class tokenizer
- Class derivator
- Class derivation_formatter
- 12.1. derivation_formatter::format_derivation
- 12.2. derivation_formatter::format_tagged_lemma
- 12.3. derivation_formatter::format_tagged_lemmas
- 12.4. derivation_formatter::new_none_derivation_formatter
- 12.5. derivation_formatter::new_root_derivation_formatter
- 12.6. derivation_formatter::new_path_derivation_formatter
- 12.7. derivation_formatter::new_tree_derivation_formatter
- 12.8. derivation_formatter::new_derivation_formatter
- Class morpho
- Class tagger
- Class tagset_converter
- 15.1. tagset_converter::convert()
- 15.2. tagset_converter::convert_analyzed()
- 15.3. tagset_converter::convert_generated()
- 15.4. tagset_converter::new_identity_converter()
- 15.5. tagset_converter::new_pdt_to_conll2009_converter()
- 15.6. tagset_converter::new_strip_lemma_comment_converter()
- 15.7. tagset_converter::new_strip_lemma_id_converter()
- C++ Bindings API
- C# Bindings
- Java Bindings
- Perl Bindings
- Python Bindings
The MorphoDiTa API is defined in header morphodita.h and resides in the ufal::morphodita namespace.
The strings used in the MorphoDiTa API are always UTF-8 encoded (except for file paths, whose encoding is system dependent).
1. MorphoDiTa Versioning
MorphoDiTa is versioned using Semantic Versioning. Therefore, a version consists of three numbers major.minor.patch, optionally followed by a hyphen and pre-release version info, with the following semantics:
- Stable versions have no pre-release version info, development versions have non-empty pre-release version info.
- Two versions with the same major.minor have the same API with the same behaviour, apart from bugs. Therefore, if only patch is increased, the new version is only a bug-fix release.
- If two versions v and u have the same major, but minor(v) is greater than minor(u), version v contains only additions to the API. In other words, the API of u is all present in v with the same behaviour (once again apart from bugs). It is therefore safe to upgrade to a newer MorphoDiTa version with the same major.
- If two versions differ in major, their API may differ in any way.
Models created by MorphoDiTa have the same behaviour in all MorphoDiTa versions with the same major, apart from obvious bugfixes. On the other hand, models created from the same data by different major.minor MorphoDiTa versions may have different behaviour.
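The compatibility rules above can be sketched as a small check. This is only an illustration: semver_sketch, api_compatible and is_stable are hypothetical names that are not part of MorphoDiTa, and the sketch deliberately ignores pre-release info when comparing APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a parsed version; not the MorphoDiTa version class.
struct semver_sketch {
  unsigned major_no;
  unsigned minor_no;
  unsigned patch_no;
  std::string prerelease;  // empty for stable versions
};

// True when code written against `built` can be expected to work with
// `installed`: the major numbers agree and the installed minor is not lower.
// The patch number never affects the API, so it is not consulted.
bool api_compatible(const semver_sketch& built, const semver_sketch& installed) {
  return built.major_no == installed.major_no &&
         installed.minor_no >= built.minor_no;
}

// Stable versions carry no pre-release version info.
bool is_stable(const semver_sketch& v) { return v.prerelease.empty(); }
```

Under these assumptions, upgrading from 1.9.0 to 1.11.2 passes the check, while moving to 2.0.0 or back to 1.8.0 does not.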
2. Lemma Structure
The lemmas used by MorphoDiTa consist of three parts:
- raw lemma: text form of the lemma. May not uniquely distinguish lemma meanings, lemma use cases etc.
- lemma id: together with raw lemma provide a unique identifier of the lemma, possibly including lemma meanings or use cases.
- lemma comments: additional comments for the given lemma.
These parts are stored in one string and the boundaries between them can be determined by the morpho::raw_lemma_len and morpho::lemma_id_len methods.
The analyzer and tagger always return lemmas in this structured form. When performing morphological generation, either the raw lemma or both the raw lemma and lemma id can be specified; any lemma comments are ignored.
3. Struct string_piece
struct string_piece {
  const char* str;
  size_t len;

  string_piece();
  string_piece(const char* str);
  string_piece(const char* str, size_t len);
  string_piece(const std::string& str);
};
The string_piece is used for efficient string passing. The string referenced in string_piece is not owned by it, so users have to make sure the referenced string exists as long as the string_piece.
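The ownership rule can be illustrated without the library. The sketch below uses piece_sketch, a local stand-in with the same two fields as string_piece, so that it compiles without morphodita.h; view_of and to_owned are hypothetical helpers, not MorphoDiTa functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local stand-in with the same two fields as string_piece.
struct piece_sketch {
  const char* str;
  std::size_t len;
};

// Non-owning view into `s`. It is valid only while `s` is alive and unmodified.
piece_sketch view_of(const std::string& s, std::size_t start, std::size_t len) {
  assert(start + len <= s.size());
  return piece_sketch{s.data() + start, len};
}

// Copies the referenced bytes, producing a string that outlives the source.
std::string to_owned(piece_sketch p) { return std::string(p.str, p.len); }
```

A view obtained from view_of should be converted with to_owned before the source string goes out of scope if the text is needed later.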
4. Struct tagged_form
struct tagged_form { std::string form; std::string tag; };
The tagged_form is a pair of strings used when obtaining a form and tag pair.
5. Struct tagged_lemma
struct tagged_lemma { std::string lemma; std::string tag; };
The tagged_lemma is a pair of strings used when obtaining a lemma and tag pair.
6. Struct tagged_lemma_forms
struct tagged_lemma_forms { std::string lemma; std::vector<tagged_form> forms; };
The tagged_lemma_forms represents a lemma and a list of tagged forms.
7. Struct token_range
struct token_range { size_t start; size_t length; };
The token_range represents the range of a token as returned by a tokenizer. The start and length fields specify the token position in Unicode characters, not in bytes of UTF-8 encoding.
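Because the range is counted in Unicode characters, it cannot be used directly as an offset into the UTF-8 bytes. The following sketch shows one way to map such a range back to bytes; range_sketch, byte_offset and slice are hypothetical names, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local stand-in mirroring the two fields of token_range.
struct range_sketch {
  std::size_t start;
  std::size_t length;
};

// Byte offset of the code point with index `cp_index` in valid UTF-8 `text`.
// Continuation bytes have the bit pattern 10xxxxxx and never start a code point.
std::size_t byte_offset(const std::string& text, std::size_t cp_index) {
  std::size_t pos = 0;
  while (pos < text.size() && cp_index > 0) {
    ++pos;  // step over the lead byte
    while (pos < text.size() &&
           (static_cast<unsigned char>(text[pos]) & 0xC0) == 0x80)
      ++pos;  // step over continuation bytes
    --cp_index;
  }
  return pos;
}

// Returns the bytes covered by a range expressed in Unicode characters.
std::string slice(const std::string& text, range_sketch r) {
  std::size_t begin = byte_offset(text, r.start);
  std::size_t end = byte_offset(text, r.start + r.length);
  return text.substr(begin, end - begin);
}
```

Scanning from the start of the text for every token is linear per call; a caller processing many tokens in order could instead keep a running byte position.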
8. Struct derivated_lemma
struct derivated_lemma { std::string lemma; };
The derivated_lemma structure stores information about a derivation. This information currently consists of lemma only, but a type of the derivation may be added later.
9. Class version
class version {
 public:
  unsigned major;
  unsigned minor;
  unsigned patch;
  std::string prerelease;

  static version current();
};
The version class represents the MorphoDiTa version. See MorphoDiTa Versioning for more information.
9.1. version::current
static version current();
Returns current MorphoDiTa version.
10. Class tokenizer
class tokenizer {
 public:
  virtual ~tokenizer() {}

  virtual void set_text(string_piece text, bool make_copy = false) = 0;
  virtual bool next_sentence(std::vector<string_piece>* forms, std::vector<token_range>* tokens) = 0;

  static tokenizer* new_vertical_tokenizer();
  static tokenizer* new_czech_tokenizer();
  static tokenizer* new_english_tokenizer();
  static tokenizer* new_generic_tokenizer();
};
The tokenizer class performs segmentation and tokenization of given text. The class is not threadsafe.
The tokenizer instances can be obtained either directly using static methods or through instances of morpho and tagger.
10.1. tokenizer::set_text
virtual void set_text(string_piece text, bool make_copy = false) = 0;
Set the text which is to be tokenized.
If make_copy is false, only a reference to the given text is stored and the user has to make sure it exists until the tokenizer is released or set_text is called again. If make_copy is true, a copy of the given text is made and retained until the tokenizer is released or set_text is called again.
10.2. tokenizer::next_sentence
virtual bool next_sentence(std::vector<string_piece>* forms, std::vector<token_range>* tokens) = 0;
Locate and return the next sentence of the given text. Returns true when successful and false when there are no more sentences in the given text. The arguments are filled with found tokens if not NULL.
The forms contain token ranges in bytes of UTF-8 encoding, whereas the tokens contain token ranges in Unicode characters.
10.3. tokenizer::new_vertical_tokenizer
static tokenizer* new_vertical_tokenizer();
Returns a new instance of a vertical tokenizer, which considers every line to be one token, with empty line denoting end of sentence. The user should delete the instance after use.
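The vertical format itself is simple enough to sketch in standalone code. The function below is not MorphoDiTa's implementation; split_vertical is a hypothetical helper, and it assumes that consecutive empty lines do not produce empty sentences and that a final sentence without a trailing empty line is still returned.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits vertical-format text into sentences of tokens: every non-empty line
// is one token and an empty line closes the current sentence.
std::vector<std::vector<std::string>> split_vertical(const std::string& text) {
  std::vector<std::vector<std::string>> sentences;
  std::vector<std::string> current;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty()) {
      if (!current.empty()) sentences.push_back(current);
      current.clear();
    } else {
      current.push_back(line);
    }
  }
  // Assumption: the last sentence need not be followed by an empty line.
  if (!current.empty()) sentences.push_back(current);
  return sentences;
}
```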
10.4. tokenizer::new_czech_tokenizer
static tokenizer* new_czech_tokenizer();
Returns a new instance of a Czech tokenizer. The user should delete it after use.
If two MorphoDiTa versions have the same major.minor, this tokenizer should behave identically (apart from obvious bugfixes). Nevertheless, the behaviour of this tokenizer might change in a different major.minor version. If you need a tokenizer whose behaviour does not change, use the tokenizer embedded in a morphological dictionary.
10.5. tokenizer::new_english_tokenizer
static tokenizer* new_english_tokenizer();
Returns a new instance of an English tokenizer. The user should delete it after use.
If two MorphoDiTa versions have the same major.minor, this tokenizer should behave identically (apart from obvious bugfixes). Nevertheless, the behaviour of this tokenizer might change in a different major.minor version. If you need a tokenizer whose behaviour does not change, use the tokenizer embedded in a morphological dictionary.
10.6. tokenizer::new_generic_tokenizer
static tokenizer* new_generic_tokenizer();
Returns a new instance of a generic tokenizer. The user should delete it after use.
If two MorphoDiTa versions have the same major.minor, this tokenizer should behave identically (apart from obvious bugfixes). Nevertheless, the behaviour of this tokenizer might change in a different major.minor version. If you need a tokenizer whose behaviour does not change, use the tokenizer embedded in a morphological dictionary.
11. Class derivator
class derivator {
 public:
  virtual ~derivator();

  virtual bool parent(string_piece lemma, derivated_lemma& parent) const = 0;
  virtual bool children(string_piece lemma, std::vector<derivated_lemma>& children) const = 0;
};
The derivator class performs morphological derivation on given lemmas. The derivations are computed using lemma ids, see Lemma Structure.
The derivator instances can be obtained through instances of morpho (and transitively through tagger).
11.1. derivator::parent
virtual bool parent(string_piece lemma, derivated_lemma& parent) const = 0;
Return the parent of a given lemma in the morphological derivation tree. The lemma is assumed to be lemma id (see Lemma Structure), so if it contains any lemma comments, they are ignored.
The returned lemma is a full lemma (lemma id plus appropriate lemma comments).
If no parent exists, the function empties the parent lemma and returns false.
11.2. derivator::children
virtual bool children(string_piece lemma, std::vector<derivated_lemma>& children) const = 0;
Return children of a given lemma in the morphological derivation tree. The lemma is assumed to be lemma id (see Lemma Structure), so if it contains any lemma comments, they are ignored.
The returned lemmas are full lemmas (lemma ids plus appropriate lemma comments).
If no children exist, the function empties the children vector and returns false.
12. Class derivation_formatter
class derivation_formatter {
 public:
  virtual ~derivation_formatter() {}

  virtual void format_derivation(std::string& lemma) const;
  virtual void format_tagged_lemma(tagged_lemma& tagged_lemma, const tagset_converter* converter = nullptr) const = 0;
  virtual void format_tagged_lemmas(std::vector<tagged_lemma>& tagged_lemmas, const tagset_converter* converter = nullptr) const = 0;

  static derivation_formatter* new_none_derivation_formatter();
  static derivation_formatter* new_root_derivation_formatter(const derivator* derinet);
  static derivation_formatter* new_path_derivation_formatter(const derivator* derinet);
  static derivation_formatter* new_tree_derivation_formatter(const derivator* derinet);
  static derivation_formatter* new_derivation_formatter(string_piece name, const derivator* derinet);
};
The derivation_formatter class performs required morphological derivation and formats the results using a single string field (i.e., directly in the lemma).
12.1. derivation_formatter::format_derivation
virtual void format_derivation(std::string& lemma) const;
Perform the required morphological derivation and format the result back directly in the lemma.
12.2. derivation_formatter::format_tagged_lemma
virtual void format_tagged_lemma(tagged_lemma& tagged_lemma, const tagset_converter* converter = nullptr) const = 0;
Perform the required derivation and store it directly in the tagged_lemma. If a tagset_converter is given, it is also applied.
12.3. derivation_formatter::format_tagged_lemmas
virtual void format_tagged_lemmas(std::vector<tagged_lemma>& tagged_lemmas, const tagset_converter* converter = nullptr) const;
Perform the required derivation on a list of tagged_lemmas. If a tagset_converter is given, it is also applied. Either way, only unique entries are returned.
12.4. derivation_formatter::new_none_derivation_formatter
static derivation_formatter* new_none_derivation_formatter();
Return a new derivation_formatter instance which does nothing (i.e., it performs no derivation).
12.5. derivation_formatter::new_root_derivation_formatter
static derivation_formatter* new_root_derivation_formatter(const derivator* derinet);
Return a new derivation_formatter instance which replaces a lemma by the corresponding root in the derivation tree.
12.6. derivation_formatter::new_path_derivation_formatter
static derivation_formatter* new_path_derivation_formatter(const derivator* derinet);
Return a new derivation_formatter instance which replaces a lemma by a space separated path to the root in the morphological derivation tree (the original lemma is first, followed by its parent, with the root being the last one).
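The root and path outputs can be sketched over a plain parent relation. In this illustration parent_map is a hypothetical stand-in for repeated derivator::parent calls, format_path and format_root are made-up helper names, and the relation is assumed to be a tree (no cycles).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical child -> parent relation standing in for derivator::parent.
using parent_map = std::map<std::string, std::string>;

// Builds "lemma parent ... root", with elements separated by single spaces.
std::string format_path(const parent_map& parents, const std::string& lemma) {
  std::string path = lemma;
  std::string current = lemma;
  for (auto it = parents.find(current); it != parents.end();
       it = parents.find(current)) {
    current = it->second;
    path += ' ';
    path += current;
  }
  return path;
}

// The root variant keeps only the last element of that path.
std::string format_root(const parent_map& parents, const std::string& lemma) {
  std::string current = lemma;
  for (auto it = parents.find(current); it != parents.end();
       it = parents.find(current))
    current = it->second;
  return current;
}
```

A lemma with no parent is its own root, so both helpers return it unchanged.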
12.7. derivation_formatter::new_tree_derivation_formatter
static derivation_formatter* new_tree_derivation_formatter(const derivator* derinet);
Return a new derivation_formatter instance which appends to the lemma the whole morphological derivation tree which contains it.
The tree is encoded in the following way: root node is the first, then the subtrees of the root children are encoded recursively (each after one space), followed by a final space (which denotes that the children are complete).
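One reading of this encoding can be written as a short recursive function. The sketch is an interpretation of the sentence above, not the library's code: children_map is a hypothetical stand-in for derivator::children, encode_tree is a made-up name, and how the encoded tree is joined to the original lemma is left out because the text does not specify it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical parent -> children relation standing in for derivator::children.
using children_map = std::map<std::string, std::vector<std::string>>;

// Encodes the subtree rooted at `node`: the node itself, then every child
// subtree preceded by one space, then a closing space that marks the end of
// the children list.
std::string encode_tree(const children_map& children, const std::string& node) {
  std::string out = node;
  auto it = children.find(node);
  if (it != children.end()) {
    for (const std::string& child : it->second) {
      out += ' ';
      out += encode_tree(children, child);
    }
  }
  out += ' ';
  return out;
}
```

Under this reading a leaf is encoded as its name followed by one space, so runs of several spaces in the output mark the points where nested child lists end.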
12.8. derivation_formatter::new_derivation_formatter
static derivation_formatter* new_derivation_formatter(string_piece name, const derivator* derinet);
Return one of the available derivation_formatter instances according to the name parameter:
- none: return new_none_derivation_formatter instance
- root: return new_root_derivation_formatter instance
- path: return new_path_derivation_formatter instance
- tree: return new_tree_derivation_formatter instance
13. Class morpho
class morpho {
 public:
  virtual ~morpho() {}

  static morpho* load(const char* fname);
  static morpho* load(std::istream& is);

  enum guesser_mode { NO_GUESSER = 0, GUESSER = 1, GUESSER_UNSPECIFIED = -1 };

  virtual int analyze(string_piece form, guesser_mode guesser, std::vector<tagged_lemma>& lemmas) const = 0;
  virtual int generate(string_piece lemma, const char* tag_wildcard, guesser_mode guesser, std::vector<tagged_lemma_forms>& forms) const = 0;
  virtual int raw_lemma_len(string_piece lemma) const = 0;
  virtual int lemma_id_len(string_piece lemma) const = 0;
  virtual int raw_form_len(string_piece form) const = 0;

  virtual tokenizer* new_tokenizer() const = 0;
  virtual const derivator* get_derivator() const;
};
A morpho instance represents a morphological dictionary. Such a dictionary allows morphological analysis and morphological generation, provides information about lemma structure, and provides a suitable tokenizer. All methods are thread-safe.
13.1. morpho::load(const char*)
static morpho* load(const char* fname);
Factory method constructor. Accepts a C string with a file name of the model. Returns a pointer to an instance of morpho which the user should delete after use.
13.2. morpho::load(istream&)
static morpho* load(std::istream& is);
Factory method constructor. Accepts an input stream with the model. Returns a pointer to an instance of morpho which the user should delete after use.
13.3. morpho::guesser_mode
enum guesser_mode { NO_GUESSER = 0, GUESSER = 1, GUESSER_UNSPECIFIED = -1 };
Guesser mode defines behaviour in case of unknown words. When set to GUESSER, morpho tries to guess unknown words. When set to NO_GUESSER, morpho does not guess unknown words.
The GUESSER_UNSPECIFIED mode denotes the default behaviour; for example, when it is passed to tagger::tag, the guesser setting employed when the tagger model was trained is used.
13.4. morpho::analyze()
virtual int analyze(string_piece form, guesser_mode guesser, std::vector<tagged_lemma>& lemmas) const = 0;
Perform morphological analysis of a form. The guesser parameter specifies whether a guesser can be used if the form is not found in the dictionary. Output is assigned to the lemmas vector.
If the form is found in the dictionary, analyses are assigned to lemmas and NO_GUESSER is returned. If guesser == GUESSER and the form analyses are found using a guesser, they are assigned to lemmas and GUESSER is returned. Otherwise -1 is returned and lemmas are filled with one analysis containing the given form as lemma and a tag for unknown word.
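The three possible return values can be told apart as in the following sketch. The enum is a local mirror of the enumerator values listed for morpho::guesser_mode so that the code compiles without the library, and analysis_source is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Local mirror of the enumerator values listed for morpho::guesser_mode.
enum guesser_mode_sketch { NO_GUESSER = 0, GUESSER = 1, GUESSER_UNSPECIFIED = -1 };

// Maps the return value of analyze to where the analyses came from.
std::string analysis_source(int result) {
  switch (result) {
    case NO_GUESSER:
      return "dictionary";  // the form was found in the dictionary
    case GUESSER:
      return "guesser";     // the form was analyzed by the guesser
    default:
      return "unknown";     // -1: one fallback analysis with the unknown-word tag
  }
}
```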
13.5. morpho::generate()
virtual int generate(string_piece lemma, const char* tag_wildcard, guesser_mode guesser, std::vector<tagged_lemma_forms>& forms) const = 0;
Perform morphological generation of a lemma. Optionally a tag_wildcard can be specified (or be NULL) and if so, results are filtered using this wildcard. The guesser parameter specifies whether a guesser can be used if the lemma is not found in the dictionary. Output is assigned to the forms vector.
The tag_wildcard can be either NULL or a wildcard applied to the results. A ? in the wildcard matches any character, [bytes] matches any of the bytes and [^bytes] matches any byte different from the specified ones. A - has no special meaning inside the bytes and if ] is first in bytes, it does not end the bytes group.
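The wildcard syntax can be made concrete with a small standalone matcher. This is a sketch of the rules as described, not MorphoDiTa's own filter: wildcard_matches is a hypothetical name, and the sketch assumes that the tag may be longer than the wildcard (prefix match) while a tag shorter than the wildcard does not match. Neither point is stated above.

```cpp
#include <cassert>

// Returns true if each wildcard position matches the corresponding tag byte.
// Assumptions of this sketch: a NULL wildcard accepts everything, extra tag
// bytes after the wildcard are ignored, and a tag that ends early is rejected.
bool wildcard_matches(const char* wildcard, const char* tag) {
  if (!wildcard) return true;
  const char* w = wildcard;
  const char* t = tag;
  while (*w) {
    if (!*t) return false;
    if (*w == '?') {
      ++w;  // matches any single character
    } else if (*w == '[') {
      ++w;
      bool negate = false;
      if (*w == '^') { negate = true; ++w; }
      bool found = false;
      bool first = true;
      // A ']' in first position is an ordinary member; '-' is never a range.
      while (*w && (first || *w != ']')) {
        if (*w == *t) found = true;
        first = false;
        ++w;
      }
      if (*w == ']') ++w;  // skip the closing bracket
      if (found == negate) return false;
    } else {
      if (*w != *t) return false;  // literal byte
      ++w;
    }
    ++t;
  }
  return true;
}
```

For instance, under these assumptions the wildcard N?S accepts any tag whose first byte is N and third byte is S, and [a-c] accepts only the bytes a, - and c.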
If the given lemma is only a raw lemma, all lemma ids with this raw lemma are returned. Otherwise only matching lemma ids are returned, ignoring any lemma comments. For every found lemma, matching forms are filtered using the tag_wildcard. If at least one lemma is found in the dictionary, NO_GUESSER is returned. If guesser == GUESSER and the lemma is found by the guesser, GUESSER is returned. Otherwise, forms are cleared and -1 is returned.
13.6. morpho::raw_lemma_len
virtual int raw_lemma_len(string_piece lemma) const = 0;
When given a lemma returned by the dictionary, returns the length of a raw lemma (see Lemma Structure).
13.7. morpho::lemma_id_len
virtual int lemma_id_len(string_piece lemma) const = 0;
When given a lemma returned by the dictionary, returns the length of a raw lemma plus a lemma id (see Lemma Structure). Therefore, the substring of the original lemma of this length is a unique lemma identifier. The rest of the original lemma are lemma comments which do not identify the lemma.
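Given the two lengths, the three parts of a lemma are plain substrings. In the sketch below, lemma_parts and split_lemma are hypothetical names, and raw_len and id_len are assumed to be the values returned by raw_lemma_len and lemma_id_len for the same lemma.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The three parts described in Lemma Structure.
struct lemma_parts {
  std::string raw;
  std::string id;
  std::string comments;
};

// Splits a lemma using the two boundaries. The lemma id part holds only the
// characters between the raw lemma and the comments; the unique identifier
// is raw + id together.
lemma_parts split_lemma(const std::string& lemma, std::size_t raw_len,
                        std::size_t id_len) {
  assert(raw_len <= id_len && id_len <= lemma.size());
  lemma_parts parts;
  parts.raw = lemma.substr(0, raw_len);
  parts.id = lemma.substr(raw_len, id_len - raw_len);
  parts.comments = lemma.substr(id_len);
  return parts;
}
```

With a made-up lemma string such as word-2_note and hard-coded lengths 4 and 6 (illustrative values, not taken from any real dictionary), the parts would be word, -2 and _note.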
13.8. morpho::raw_form_len
virtual int raw_form_len(string_piece form) const = 0;
When given a form, returns the length of a raw form. This is used only in external morphology model, where form contains also morphological analyses, and this call can return the length of the form without the analyses.
13.9. morpho::new_tokenizer
virtual tokenizer* new_tokenizer() const = 0;
Returns a new instance of a suitable tokenizer or NULL if no such tokenizer exists. The user should delete it after use.
Note that the tokenizer might use the morpho instance, so the tokenizer must not be used after the morpho instance is destructed.
13.10. morpho::get_derivator
virtual const derivator* get_derivator() const;
Returns a derivator for the morphology, or NULL if not available. The derivator is owned by the morphology, so the returned instance should not be freed and it cannot be used after the morpho instance is destructed.
14. Class tagger
class tagger {
 public:
  virtual ~tagger() {}

  static tagger* load(const char* fname);
  static tagger* load(std::istream& is);

  virtual const morpho* get_morpho() const = 0;

  virtual void tag(const std::vector<string_piece>& forms, std::vector<tagged_lemma>& tags, morpho::guesser_mode guesser = morpho::GUESSER_UNSPECIFIED) const = 0;
  virtual void tag_analyzed(const std::vector<string_piece>& forms, std::vector<std::vector<tagged_lemma> >& analyses, std::vector<int>& tags) const = 0;

  virtual tokenizer* new_tokenizer() const = 0;
};
A tagger instance represents a tagger, which performs disambiguation of morphological analyses. All methods are thread-safe.
14.1. tagger::load(const char*)
static tagger* load(const char* fname);
Factory method constructor. Accepts a C string with a file name of the model. Returns a pointer to an instance of tagger which the user should delete after use.
14.2. tagger::load(istream&)
static tagger* load(std::istream& is);
Factory method constructor. Accepts an input stream with the model. Returns a pointer to an instance of tagger which the user should delete after use.
14.3. tagger::get_morpho()
virtual const morpho* get_morpho() const = 0;
Returns a pointer to an instance of morpho associated with the tagger. Do not delete the pointer; it is owned by the tagger instance and deleted in the tagger destructor.
14.4. tagger::tag()
virtual void tag(const std::vector<string_piece>& forms, std::vector<tagged_lemma>& tags, morpho::guesser_mode guesser = morpho::GUESSER_UNSPECIFIED) const = 0;
Perform morphological analysis and subsequent disambiguation. Accepts a std::vector of string_piece and fills the output vector of tagged_lemma.
The guesser parameter defines whether the morphological guesser should be used. If a negative value (GUESSER_UNSPECIFIED) is specified, which is the default, the guesser setting employed when the tagger model was trained is used.
14.5. tagger::tag_analyzed()
virtual void tag_analyzed(const std::vector<string_piece>& forms, std::vector<std::vector<tagged_lemma> >& analyses, std::vector<int>& tags) const = 0;
Perform morphological disambiguation using given morphological analyses.
The indices of chosen analyses are stored in the output vector tags.
None of the analyses may be empty; if any is, no operation is performed and tags is left empty. On the other hand, the analyses vector can be larger than forms, in which case the additional entries are ignored.
Note that the tagger was trained with a specific morphology: the more your morphological analyses differ from the original ones, the worse the results will be. One of the usages of tag_analyzed is to consider only a subset of morphological analyses.
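The input conditions and the meaning of the output indices can be sketched in standalone code. The types and functions below (tagged_lemma_sketch, analyses_sketch, analyses_usable, pick_chosen) are hypothetical stand-ins, and the sketch assumes that an analyses vector shorter than forms is also unusable, which the text above implies but does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Local stand-in mirroring the two fields of tagged_lemma.
struct tagged_lemma_sketch {
  std::string lemma;
  std::string tag;
};
using analyses_sketch = std::vector<std::vector<tagged_lemma_sketch>>;

// Checks the input conditions: one non-empty analysis list per form.
// Entries beyond forms_count are ignored, as described above.
bool analyses_usable(std::size_t forms_count, const analyses_sketch& analyses) {
  if (analyses.size() < forms_count) return false;  // assumption of this sketch
  for (std::size_t i = 0; i < forms_count; ++i)
    if (analyses[i].empty()) return false;
  return true;
}

// Resolves the indices stored in `tags` to the chosen analyses:
// tags[i] selects one entry of analyses[i].
std::vector<tagged_lemma_sketch> pick_chosen(const analyses_sketch& analyses,
                                             const std::vector<int>& tags) {
  std::vector<tagged_lemma_sketch> chosen;
  for (std::size_t i = 0; i < tags.size(); ++i) {
    assert(tags[i] >= 0 &&
           static_cast<std::size_t>(tags[i]) < analyses[i].size());
    chosen.push_back(analyses[i][tags[i]]);
  }
  return chosen;
}
```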
14.6. tagger::new_tokenizer
virtual tokenizer* new_tokenizer() const = 0;
Returns a new instance of a suitable tokenizer or NULL if no such tokenizer exists. The user should delete it after use. The call is equivalent to get_morpho()->new_tokenizer().
15. Class tagset_converter
class tagset_converter {
 public:
  virtual ~tagset_converter() {}

  virtual void convert(tagged_lemma& tagged_lemma) const = 0;
  virtual void convert_analyzed(std::vector<tagged_lemma>& tagged_lemmas) const = 0;
  virtual void convert_generated(std::vector<tagged_lemma_forms>& forms) const = 0;

  static tagset_converter* new_identity_converter();
  static tagset_converter* new_pdt_to_conll2009_converter();
  static tagset_converter* new_strip_lemma_comment_converter(const morpho& dictionary);
  static tagset_converter* new_strip_lemma_id_converter(const morpho& dictionary);
};
15.1. tagset_converter::convert()
virtual void convert(tagged_lemma& tagged_lemma) const = 0;
Convert the given tagged lemma.
15.2. tagset_converter::convert_analyzed()
virtual void convert_analyzed(std::vector<tagged_lemma>& tagged_lemmas) const = 0;
Convert the given results of morpho::analyze. Apart from calling convert, any repeated entries are removed.
15.3. tagset_converter::convert_generated()
virtual void convert_generated(std::vector<tagged_lemma_forms>& forms) const = 0;
Convert the given results of morpho::generate. Apart from calling convert, any repeated entries are removed.
15.4. tagset_converter::new_identity_converter()
static tagset_converter* new_identity_converter();
Returns a new instance of an identity converter. All convert methods of an identity converter do nothing. The user should delete the instance after use.
15.5. tagset_converter::new_pdt_to_conll2009_converter()
static tagset_converter* new_pdt_to_conll2009_converter();
Returns a new instance of a Czech PDT tag set to CoNLL2009 tag set converter. The user should delete the instance after use.
The CoNLL2009 tag set uses two columns for tags: one is a POS and the other contains additional FEATs. Because we have only one tag field, we merge these fields together by using Pos=?|FEAT, i.e., the POS is stored as the first FEAT.
15.6. tagset_converter::new_strip_lemma_comment_converter()
static tagset_converter* new_strip_lemma_comment_converter(const morpho& dictionary);
Returns a new instance of a tag set converter stripping lemma comments using the given morpho instance, which must remain valid during the existence of the tag set converter. The user should delete the tag set converter instance after use.
15.7. tagset_converter::new_strip_lemma_id_converter()
static tagset_converter* new_strip_lemma_id_converter(const morpho& dictionary);
Returns a new instance of a tag set converter stripping lemma ids using the given morpho instance, which must remain valid during the existence of the tag set converter. The user should delete the tag set converter instance after use.
16. C++ Bindings API
Bindings for languages other than C++ are created using SWIG from the C++ bindings API, which is a slightly modified version of the native C++ API. The main changes are the replacement of the string_piece type by native strings and the removal of methods using istream. Here is the C++ bindings API declaration:
16.1. Helper Structures
typedef vector<int> Indices;

typedef vector<string> Forms;

struct TaggedForm {
  string form;
  string tag;
};
typedef vector<TaggedForm> TaggedForms;

struct TaggedLemma {
  string lemma;
  string tag;
};
typedef vector<TaggedLemma> TaggedLemmas;
typedef vector<TaggedLemmas> Analyses;

struct TaggedLemmaForms {
  string lemma;
  TaggedForms forms;
};
typedef vector<TaggedLemmaForms> TaggedLemmasForms;

struct TokenRange {
  size_t start;
  size_t length;
};
typedef vector<TokenRange> TokenRanges;

struct DerivatedLemma {
  std::string lemma;
};
typedef vector<DerivatedLemma> DerivatedLemmas;
16.2. Main Classes
class Version {
 public:
  unsigned major;
  unsigned minor;
  unsigned patch;
  string prerelease;

  static Version current();
};

class Tokenizer {
 public:
  virtual void setText(const char* text);
  virtual bool nextSentence(Forms* forms, TokenRanges* tokens);

  static Tokenizer* newVerticalTokenizer();
  static Tokenizer* newCzechTokenizer();
  static Tokenizer* newEnglishTokenizer();
  static Tokenizer* newGenericTokenizer();
};

class TagsetConverter {
 public:
  static TagsetConverter* newIdentityConverter();
  static TagsetConverter* newPdtToConll2009Converter();
  static TagsetConverter* newStripLemmaCommentConverter(const Morpho& morpho);
  static TagsetConverter* newStripLemmaIdConverter(const Morpho& morpho);

  virtual void convert(TaggedLemma& lemma) const;
  virtual void convertAnalyzed(TaggedLemmas& lemmas) const;
  virtual void convertGenerated(TaggedLemmasForms& forms) const;
};

class Derivator {
 public:
  virtual bool parent(const char* lemma, DerivatedLemma& parent) const;
  virtual bool children(const char* lemma, DerivatedLemmas& children) const;
};

class DerivationFormatter {
 public:
  virtual string formatDerivation(const char* lemma) const;
  virtual void formatTaggedLemma(TaggedLemma& tagged_lemma, const TagsetConverter* converter = nullptr) const;
  virtual void formatTaggedLemmas(TaggedLemmas& tagged_lemma, const TagsetConverter* converter = nullptr) const;

  static DerivationFormatter* newNoneDerivationFormatter();
  static DerivationFormatter* newRootDerivationFormatter(const Derivator* derivator);
  static DerivationFormatter* newPathDerivationFormatter(const Derivator* derivator);
  static DerivationFormatter* newTreeDerivationFormatter(const Derivator* derivator);
  static DerivationFormatter* newDerivationFormatter(const char* name, const Derivator* derivator);
};

class Morpho {
 public:
  static Morpho* load(const char* fname);

  enum { NO_GUESSER = 0, GUESSER = 1, GUESSER_UNSPECIFIED = -1 };

  virtual int analyze(const char* form, int guesser, TaggedLemmas& lemmas) const;
  virtual int generate(const char* lemma, const char* tag_wildcard, int guesser, TaggedLemmasForms& forms) const;
  virtual string rawLemma(const char* lemma) const;
  virtual string lemmaId(const char* lemma) const;
  virtual string rawForm(const char* form) const;

  virtual Tokenizer* newTokenizer() const;
  virtual Derivator* getDerivator() const;
};

class Tagger {
 public:
  static Tagger* load(const char* fname);

  virtual const Morpho* getMorpho() const;

  virtual void tag(const Forms& forms, TaggedLemmas& tags, int guesser = Morpho::GUESSER_UNSPECIFIED) const;
  virtual void tagAnalyzed(const Forms& forms, const Analyses& analyses, Indices& tags) const;

  Tokenizer* newTokenizer() const;
};
17. C# Bindings
MorphoDiTa library bindings are available in the Ufal.MorphoDiTa namespace.
The bindings are a straightforward conversion of the C++ bindings API.
The bindings require the native C++ library libmorphodita_csharp (called morphodita_csharp on Windows).
18. Java Bindings
MorphoDiTa library bindings are available in the cz.cuni.mff.ufal.morphodita package.
The bindings are a straightforward conversion of the C++ bindings API. Vectors do not have a native Java interface, see the cz.cuni.mff.ufal.morphodita.Forms class for reference. Also, class members are accessible and modifiable using getField and setField wrappers.
The bindings require the native C++ library libmorphodita_java (called morphodita_java on Windows). If the library is found in the current directory, it is used; otherwise the standard library search process is used. The path to the C++ library can also be specified using the static morphodita_java.setLibraryPath(String path) call (before the first call inside the C++ library, of course).
19. Perl Bindings
MorphoDiTa library bindings are available in the Ufal::MorphoDiTa package. The classes can be imported into the current namespace using the :all export tag.
The bindings are a straightforward conversion of the C++ bindings API. Vectors do not have a native Perl interface, see Ufal::MorphoDiTa::Forms for reference. Static methods and enumerations are available only through the module, not through an object instance.
20. Python Bindings
MorphoDiTa library bindings are available in the ufal.morphodita module, with binary wheels provided for Linux, Windows and OS X.
The bindings are a straightforward conversion of the C++ bindings API. Only Python >=3 is supported.