NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc.
NameTag 3 offers state-of-the-art or near state-of-the-art performance in English, German, Spanish, Dutch, Czech and Ukrainian.
NameTag is available in the following versions:
NameTag 3 is free software under the Mozilla Public License 2.0, and the linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. NameTag is versioned using Semantic Versioning.
Copyright 2024 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.
NameTag 3 can be used either as a command-line tool or by sending requests to the NameTag web service.
NameTag 3 source code can be found at GitHub.
The individual models are described on the NameTag 3 Models webpage and are distributed via the LINDAT repository. The latest version is 240830.
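As a rough illustration of the web service route, the sketch below prepares a request for a given model. The endpoint URL, the `data` and `model` parameter names, and the `result` field of the JSON reply are assumptions that this page does not state; check them against the web service API documentation before relying on them. The helper names `build_request` and `recognize` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint URL and parameter names are not given on this page;
# verify them against the NameTag web service API documentation.
NAMETAG_URL = "https://lindat.mff.cuni.cz/services/nametag/api/recognize"


def build_request(text, model="nametag3-czech-cnec2.0-240830"):
    """Prepare (but do not send) a form-encoded POST request."""
    fields = {"data": text, "model": model}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(NAMETAG_URL, data=body, method="POST")


def recognize(text, model="nametag3-czech-cnec2.0-240830"):
    """Send the request; assumes a JSON reply carrying a 'result' field."""
    request = build_request(text, model)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["result"]
```

Keeping request construction separate from sending makes it possible to inspect the encoded form data without network access; only `recognize` contacts the service.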
Corpus | NameTag 2 | NameTag 3 | NameTag 3 Model |
---|---|---|---|
CNEC 2.0 fine-grained (nested) | 83.44 | 86.39 | nametag3-czech-cnec2.0-240830 |
CNEC 2.0 coarse (nested) | 87.04 | 89.29 | nametag3-czech-cnec2.0-240830 |
English CoNLL-2003 (flat) | 91.68 | 93.85 | nametag3-multilingual-conll-240830 |
German CoNLL-2003 (flat) | 82.65 | 87.07 | nametag3-multilingual-conll-240830 |
Dutch CoNLL-2002 (flat) | 91.17 | 94.42 | nametag3-multilingual-conll-240830 |
Spanish CoNLL-2002 (flat) | 88.55 | 89.90 | nametag3-multilingual-conll-240830 |
Ukrainian Lang-uk (flat) | 88.73 | 91.73 | nametag3-multilingual-conll-240830 |
CNEC 2.0 CoNLL (4 labels, flat) | N/A | 86.35 | nametag3-multilingual-conll-240830 |
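To read the table as improvements rather than raw scores, the per-corpus differences can be computed directly. The short sketch below copies the scores from the table (the CNEC 2.0 CoNLL row is left out because it has no NameTag 2 score); the variable and function names are only illustrative.

```python
# Scores copied from the table above: corpus -> (NameTag 2, NameTag 3).
SCORES = {
    "CNEC 2.0 fine-grained (nested)": (83.44, 86.39),
    "CNEC 2.0 coarse (nested)": (87.04, 89.29),
    "English CoNLL-2003 (flat)": (91.68, 93.85),
    "German CoNLL-2003 (flat)": (82.65, 87.07),
    "Dutch CoNLL-2002 (flat)": (91.17, 94.42),
    "Spanish CoNLL-2002 (flat)": (88.55, 89.90),
    "Ukrainian Lang-uk (flat)": (88.73, 91.73),
}


def absolute_gains(scores):
    """Return NameTag 3 minus NameTag 2 per corpus, rounded to 2 decimals."""
    return {corpus: round(new - old, 2) for corpus, (old, new) in scores.items()}


if __name__ == "__main__":
    # Print corpora from the largest to the smallest gain.
    gains = absolute_gains(SCORES)
    for corpus, gain in sorted(gains.items(), key=lambda item: -item[1]):
        print(f"{corpus}: +{gain:.2f}")
```

On these numbers the gains range from 1.35 points (Spanish) to 4.42 points (German).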
The associated models and data are licensed under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.
If you use this tool for scientific work, please give us credit by referencing Straková et al. (2019) (see BibTeX for referencing).
Acknowledgements for the individual language models are listed on the NameTag 3 Models page.
This work has been supported by the Grant Agency of the Czech Republic under the EXPRO program as project “LUSyD” (project No. GX20-16819X). The work described herein has also used data provided by the LINDAT/CLARIAH-CZ Research Infrastructure, supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).
Straková Jana, Straka Milan, Hajič Jan: Neural Architectures for Nested NER through Linearization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Copyright © Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2, pp. 5326-5331, 2019.
@inproceedings{strakova-etal-2019-neural,
    title = "Neural Architectures for Nested {NER} through Linearization",
    author = "Strakov{\'a}, Jana and Straka, Milan and Hajic, Jan",
    editor = "Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'\i}s",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1527",
    doi = "10.18653/v1/P19-1527",
    pages = "5326--5331",
}
Authors: