---
license: cc-by-4.0
language:
- kk
metrics:
- f1
- precision
- recall
pipeline_tag: token-classification
tags:
- NER
- Named Entity Recognition
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy."
  example_title: "Sentiment analysis"
- text: "Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had ..."
  example_title: "Coreference resolution"
---
# A Named Entity Recognition Model for Kazakh
- The model was inspired by the [LREC 2022](https://lrec2022.lrec-conf.org/en/) paper [*KazNERD: Kazakh Named Entity Recognition Dataset*](https://aclanthology.org/2022.lrec-1.44).
- The original repository for the paper can be found at *https://github.com/IS2AI/KazNERD*.
## Differences
While the original dataset contained tokens denoting speech disfluencies and hesitations (parenthesised) and background noise [bracketed], this model was trained on a version of the dataset where such tokens and duplicates were removed.
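The cleaning described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for this model: it assumes that non-speech tokens are written entirely inside parentheses or square brackets, that "duplicates" means repeated sentences, and that sentences are lists of (token, tag) pairs. The function name `clean_sentences` and the sample sentences are made up for the example.

```python
import re

# Assumption: a non-speech token is wholly enclosed in (...) or [...].
NON_SPEECH = re.compile(r"\(.*\)|\[.*\]")


def clean_sentences(sentences):
    """Drop non-speech tokens, then drop empty and duplicate sentences.

    `sentences` is a list of sentences, each a list of (token, tag) pairs.
    """
    seen = set()
    cleaned = []
    for sentence in sentences:
        # Keep only tokens that are not disfluency or noise markers.
        kept = [(tok, tag) for tok, tag in sentence if not NON_SPEECH.fullmatch(tok)]
        key = tuple(kept)
        # Skip sentences that became empty or repeat an earlier sentence.
        if not kept or key in seen:
            continue
        seen.add(key)
        cleaned.append(kept)
    return cleaned


# Invented sample data, for illustration only.
sample = [
    [("(ээ)", "O"), ("Алматы", "B-GPE"), ("қаласы", "O")],
    [("Алматы", "B-GPE"), ("қаласы", "O"), ("[музыка]", "O")],
    [("[шу]", "O")],
]
print(clean_sentences(sample))
```

In this sample the first two sentences become identical once the markers are stripped, so only one copy survives, and the third sentence disappears because nothing is left of it.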
As a result of this cleaning, the numbers of sentences, tokens, and named entities (NEs) changed, as shown in the table below. The token counts reported for the original dataset also appear to have been miscalculated (they are lower than the counts for the cleaned dataset) and should likely have been given as 1,120,387 (Train), 136,983 (Valid), 134,540 (Test), and 1,391,910 (Total).

| Dataset | Unit | Train | Valid | Test | Total |
| :---: | :---: | :---: | :---: | :---: | :---: |
| KazNERD (Original) | Sentence | 90,228 (80.06%) | 11,167 (9.91%) | 11,307 (10.03%) | 112,702 (100%) |
| KazNERD (Cleaned) | Sentence | 88,540 (80.00%) | 11,067 (10.00%) | 11,068 (10.00%) | 110,675 (100%) |
| KazNERD (Original) | Token | 1,043,305 (80.11%) | 129,223 (9.92%) | 129,824 (9.97%) | 1,302,352 (100%) |
| KazNERD (Cleaned) | Token | 1,088,461 (80.04%) | 136,021 (10.00%) | 135,426 (9.96%) | 1,359,908 (100%) |
| KazNERD (Original) | NE | 109,342 (80.20%) | 13,483 (9.89%) | 13,508 (9.91%) | 136,333 (100%) |
| KazNERD (Cleaned) | NE | 106,148 (80.17%) | 13,189 (9.96%) | 13,072 (9.87%) | 132,409 (100%) |
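
The totals and split percentages in the table can be recomputed from the per-split counts. The short sketch below does this arithmetic; the counts are copied from the table, while the helper name `split_shares` and the dictionary layout are chosen only for this example.

```python
# Per-split counts (train, valid, test) copied from the table above.
counts = {
    ("Original", "Sentence"): (90_228, 11_167, 11_307),
    ("Cleaned", "Sentence"): (88_540, 11_067, 11_068),
    ("Original", "Token"): (1_043_305, 129_223, 129_824),
    ("Cleaned", "Token"): (1_088_461, 136_021, 135_426),
    ("Original", "NE"): (109_342, 13_483, 13_508),
    ("Cleaned", "NE"): (106_148, 13_189, 13_072),
}


def split_shares(split_counts):
    """Return the total and each split's share of it in percent (2 decimals)."""
    total = sum(split_counts)
    shares = tuple(round(100 * n / total, 2) for n in split_counts)
    return total, shares


for (version, unit), split_counts in counts.items():
    total, shares = split_shares(split_counts)
    print(f"{version:8} {unit:8} total={total:>9,} shares={shares}")

# The suggested corrected token counts for the original dataset add up too.
corrected_tokens = (1_120_387, 136_983, 134_540)
print(sum(corrected_tokens))
```

The same helper can be applied to the suggested corrected token counts to see how the original split would be distributed under those numbers.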