OS: Linux

NameTag 3

1. Introduction

NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc.
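To make the difference between flat and nested NER concrete, here is a minimal sketch that represents entities as labelled token spans. The example sentence, the labels `ORG`, `LOC` and `PER`, and the names `Entity` and `is_nested` are illustrative assumptions; none of this is NameTag 3 output or part of its API.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index of the last token (inclusive)
    label: str  # hypothetical entity label


def is_nested(entities):
    """Return True if any entity span lies entirely inside another one."""
    for a in entities:
        for b in entities:
            if a is not b and b.start <= a.start and a.end <= b.end:
                return True
    return False


tokens = ["Charles", "University", "in", "Prague"]

# Flat annotation: entity spans never contain one another.
flat = [Entity(0, 1, "ORG"), Entity(3, 3, "LOC")]

# Nested annotation: "Charles" (a personal name) sits inside "Charles University".
nested = [Entity(0, 1, "ORG"), Entity(0, 0, "PER")]

print(is_nested(flat), is_nested(nested))
```

A flat tagger can assign at most one label per token, whereas a nested annotation such as the second list needs a representation that allows overlapping spans.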

NameTag 3 offers state-of-the-art or near state-of-the-art performance in English, German, Spanish, Dutch, Czech and Ukrainian.

NameTag is available in the following versions:

NameTag 3 is free software licensed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. NameTag is versioned using Semantic Versioning.

Copyright 2024 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.

2. Current Release

NameTag 3 can be used either as a command-line tool or through the NameTag web service.

The NameTag 3 source code is available on GitHub.
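As a rough sketch of the web service route, the snippet below builds a request URL and extracts the recognized text from a JSON response. The endpoint URL, the query parameters `data` and `model`, and the `result` field of the response are assumptions about the LINDAT NameTag REST API and should be checked against the official service documentation. The helper names are hypothetical, and the model name is the multilingual model mentioned on this page.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the NameTag web service; verify before use.
API_URL = "https://lindat.mff.cuni.cz/services/nametag/api/recognize"
DEFAULT_MODEL = "nametag3-multilingual-250203"


def build_request_url(text, model=DEFAULT_MODEL):
    """Encode the input text and model name as URL query parameters."""
    query = urllib.parse.urlencode({"data": text, "model": model})
    return f"{API_URL}?{query}"


def extract_result(response_body):
    """Return the assumed 'result' field of the JSON response."""
    return json.loads(response_body)["result"]


def recognize(text, model=DEFAULT_MODEL):
    """Send the request over the network and return the recognized output."""
    with urllib.request.urlopen(build_request_url(text, model)) as response:
        return extract_result(response.read().decode("utf-8"))
```

Calling `recognize("Jana lives in Prague.")` would issue a real network request, so the helpers are split so that URL construction and response parsing can be checked offline.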

3. Models

The individual models are described on the NameTag 3 Models webpage and are distributed via the LINDAT repository.

4. Results at a Glance

Corpus                                  NameTag 2  NameTag 3  NameTag 3 Model
CNEC 2.0 fine-grained (nested)          83.44      86.39      nametag3-czech-cnec2.0-240830
Arabic CoNLL-2012 OntoNotes v5          -          74.20      nametag3-multilingual-250203
Chinese CoNLL-2012 OntoNotes v5         -          81.63      nametag3-multilingual-250203
Chinese UNER GSDSIMP                    -          90.99      nametag3-multilingual-250203
Chinese UNER GSD                        -          91.53      nametag3-multilingual-250203
Croatian UNER SET                       -          95.55      nametag3-multilingual-250203
Czech CNEC 2.0 CoNLL (4 labels, flat)   -          86.24      nametag3-multilingual-250203
Danish UNER DDT                         -          89.75      nametag3-multilingual-250203
Dutch CoNLL-2002                        91.17      94.93      nametag3-multilingual-250203
English CoNLL-2012 OntoNotes v5         -          90.19      nametag3-multilingual-250203
English UNER EWT                        -          87.03      nametag3-multilingual-250203
English CoNLL-2003                      91.68      94.03      nametag3-multilingual-250203
German CoNLL-2003                       82.65      87.48      nametag3-multilingual-250203
Maghrebi Arabic French UNER Arabizi     -          84.49      nametag3-multilingual-250203
Norwegian bokmaal UNER NDT              -          95.83      nametag3-multilingual-250203
Norwegian nynorsk UNER NDT              -          94.51      nametag3-multilingual-250203
Portuguese UNER Bosque                  -          90.89      nametag3-multilingual-250203
Serbian UNER SET                        -          97.10      nametag3-multilingual-250203
Slovak UNER SNK                         -          88.46      nametag3-multilingual-250203
Spanish CoNLL-2002                      88.55      90.29      nametag3-multilingual-250203
Swedish UNER Talbanken                  -          91.79      nametag3-multilingual-250203
Ukrainian Lang-uk                       88.73      92.88      nametag3-multilingual-250203
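For the six corpora where both a NameTag 2 and a NameTag 3 score are listed, the absolute improvement can be computed directly from the table. The following short sketch does only that arithmetic; the scores are copied from the table above, and the variable names are illustrative.

```python
# (NameTag 2 score, NameTag 3 score) for corpora with both values in the table.
scores = {
    "CNEC 2.0 fine-grained (nested)": (83.44, 86.39),
    "Dutch CoNLL-2002": (91.17, 94.93),
    "English CoNLL-2003": (91.68, 94.03),
    "German CoNLL-2003": (82.65, 87.48),
    "Spanish CoNLL-2002": (88.55, 90.29),
    "Ukrainian Lang-uk": (88.73, 92.88),
}

# Absolute gain of NameTag 3 over NameTag 2, rounded to two decimals.
gains = {
    corpus: round(nametag3 - nametag2, 2)
    for corpus, (nametag2, nametag3) in scores.items()
}

# Print corpora from the largest to the smallest gain.
for corpus, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{corpus}: +{gain:.2f}")
```

On these figures the gains range from 1.74 points (Spanish CoNLL-2002) to 4.83 points (German CoNLL-2003).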

5. License

NameTag 3 is free software licensed under the Mozilla Public License 2.0. NameTag is versioned using Semantic Versioning.

The associated models and data are free for non-commercial use and licensed under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.

If you use this tool for scientific work, please give us credit by referencing Straková et al. (2019) (see BibTeX for referencing).

6. Acknowledgements

Acknowledgements for the individual language models are listed on the NameTag 3 Models page.

This work has been supported by the MŠMT OP JAK program, project No. CZ.02.01.01/00/22_008/0004605, and by the Grant Agency of the Czech Republic under the EXPRO program as project “LUSyD” (project No. GX20-16819X). The work described herein has also used data provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).

6.1. Publications

Straková Jana, Straka Milan, Hajič Jan: Neural Architectures for Nested NER through Linearization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Copyright © Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2, pp. 5326-5331, 2019.

@inproceedings{strakova-etal-2019-neural,
    title = "Neural Architectures for Nested {NER} through Linearization",
    author = "Strakov{\'a}, Jana  and
      Straka, Milan  and
      Haji{\v{c}}, Jan",
    editor = "Korhonen, Anna  and
      Traum, David  and
      M{\`a}rquez, Llu{\'\i}s",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1527",
    doi = "10.18653/v1/P19-1527",
    pages = "5326--5331",
}

7. Contact

Authors:

NameTag website.
