The latest version of the Czech Named Entity Corpus (Czech Named Entity Corpus 2.0) is a corpus of 8993 Czech sentences with 35220 manually annotated Czech named entities.
The corpus uses 46 atomic named entity types, which can be embedded; e.g., a river name can be part of a city name, as in `<gu Ústí nad <gh Labem>>`. There are also 4 so-called NE containers, each consisting of two or more NEs (e.g., two NEs, a first name and a surname, together form a person name NE container, as in `<P <pf Jan><ps Novák>>`). The 4 NE containers are marked with a capital one-letter tag: `P` for (complex) person names, `T` for temporal expressions, `A` for addresses, and `C` for bibliographic items.
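The embedded bracket notation in the examples above can be read with a small stack-based parser. The following is only a minimal sketch: it assumes the inline form `<TYPE text>` exactly as shown in the two examples, which may differ from the actual file formats described in the corpus documentation. The names `parse_annotated`, `is_container`, and `CONTAINER_TAGS` are hypothetical helpers introduced for illustration and are not part of any released tool.

```python
from typing import List, Tuple

# The four NE container tags named in the description above.
CONTAINER_TAGS = {
    "P": "(complex) person name",
    "T": "temporal expression",
    "A": "address",
    "C": "bibliographic item",
}


def is_container(tag: str) -> bool:
    """Return True if the tag is one of the four one-letter container tags."""
    return tag in CONTAINER_TAGS


def parse_annotated(s: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse '<TYPE text <TYPE text>>' notation (assumed format).

    Returns the plain text and a list of (tag, surface_text) pairs,
    ordered by closing bracket, so inner entities precede outer ones.
    """
    plain: List[str] = []                # characters of the unannotated text
    stack: List[Tuple[str, int]] = []    # (tag, start offset in plain)
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "<":
            # Read the tag name up to whitespace or another bracket.
            j = i + 1
            while j < len(s) and not s[j].isspace() and s[j] not in "<>":
                j += 1
            stack.append((s[i + 1:j], len(plain)))
            # Skip the single whitespace separating the tag from its content.
            if j < len(s) and s[j].isspace():
                j += 1
            i = j
        elif ch == ">":
            if not stack:
                raise ValueError("unbalanced '>' at offset %d" % i)
            tag, start = stack.pop()
            entities.append((tag, "".join(plain[start:]).strip()))
            i += 1
        else:
            plain.append(ch)
            i += 1
    if stack:
        raise ValueError("unclosed entity: %s" % stack[-1][0])
    return "".join(plain), entities


if __name__ == "__main__":
    for example in ("<gu Ústí nad <gh Labem>>", "<P <pf Jan><ps Novák>>"):
        text, ents = parse_annotated(example)
        for tag, surface in ents:
            kind = "container" if is_container(tag) else "atomic"
            print(tag, kind, repr(surface))
```

Because entities are emitted when their closing bracket is reached, an embedded entity such as `gh` is always reported before the entity that contains it.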
Current version download: Czech Named Entity Corpus 2.0.
A detailed description of the corpus, the file formats, the two-level named entity hierarchy, and download links are available for every released version.
CNEC 1.0 Types | CNEC 1.0 Supertypes | CNEC 2.0 Types | CNEC 2.0 Supertypes | CNEC 1.0 Extended | CNEC 2.0 Extended | System | Code | Method |
---|---|---|---|---|---|---|---|---|
– | – | 86.39 | 89.29 | – | – | NameTag 3 (Straková et al., 2019) | GitHub | Seq2seq+fine-tuned RobeCzech |
– | – | – | – | – | 86.39 | Bachelor Thesis of Müller 2020, a rerun of Straková et al., 2019 | Straková et al., 2019 | LSTM-CRF+BERT |
86.88 | 89.91 | | | – | – | Straka et al., 2019 | – | Seq2seq+BERT |
86.88 | – | – | – | – | – | Straková et al., 2019 | GitHub | Seq2seq+BERT |
83.15 | 86.30 | – | – | 83.27 | 84.22 | Hluboké učení v automatické analýze českého textu. In: Slovo a slovesnost, ISSN 0037-7031, vol. 80, no. 4, pp. 306-327 | – | Deep NN |
– | – | – | – | – | 81.05 | Güngör, 2018 | – | RNN+WE+CLE |
81.20 | 84.68 | 79.23 | 82.78 | 80.88 | 80.79 | Straková et al., 2016 | GitHub | RNN+WE+CLE |
– | – | – | – | 74.08 | – | Konkol et al., 2015 | – | Latent semantics |
– | – | – | – | 75.61 | – | Demir and Özgür, 2014 | – | NN+WE |
– | – | – | – | 74.23 | 74.37 | Konkol and Konopík, 2014 | – | CRF+stemming |
79.23 | 82.82 | – | – | – | – | Straková et al., 2013 | NameTag 1 | Simple NN |
– | 79.00 | – | – | 74.08 | – | Konkol and Konopík, 2013 | – | CRF |
– | 72.94 | – | – | – | – | Konkol and Konopík, 2011 | – | Maximum entropy |
68.00 | 71.00 | – | – | – | – | Kravalová and Žabokrtský, 2009 | – | SVM |
62.00 | 68.00 | – | – | – | – | Ševčíková et al., 2007 | – | Dec. trees |
Please let us know if you would like to be featured on this leaderboard. Thank you!
Please let us know if you would like your tool to be added to the list.
Ševčíková, M., Žabokrtský, Z., Krůza, O.: Named Entities in Czech: Annotating Data and Developing NE Tagger. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS (LNAI), vol. 4629, pp. 188–195. Springer, Heidelberg (2007).
@inproceedings{SevcikovaEtAl2007CNEC,
booktitle = {Lecture Notes in Artificial Intelligence, Proceedings of the 10th International Conference on Text, Speech and Dialogue},
series = {Lecture Notes in Computer Science},
title = {Named Entities in Czech: Annotating Data and Developing {NE} Tagger},
editor = {V{\'{a}}clav Matou{\v{s}}ek and Pavel Mautner},
author = {Magda {\v{S}}ev{\v{c}}{\'{\i}}kov{\'{a}} and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}} and Old{\v{r}}ich Kr{\r{u}}za},
year = {2007},
publisher = {Springer},
address = {Berlin / Heidelberg},
volume = {4629},
number = {{XVII}},
pages = {188--195},
isbn = {978-3-540-74627-0},
issn = {0302-9743},
}