Kyrgyz Named Entity Recognition
This model fine-tunes bert-base-multilingual-cased on the WikiANN dataset to perform NER on the Kyrgyz language.
WARNING: this model is not usable (see the metrics below) and was built only as a proof of concept.
I'll update the model after cleaning up the Kyrgyz (ky) part of the WikiANN dataset, which contains only 100 train/test/valid items, or after coming up with a completely new dataset.
Label ID and its corresponding label name
Label ID | Label Name |
---|---|
0 | O |
1 | B-PER |
2 | I-PER |
3 | B-ORG |
4 | I-ORG |
5 | B-LOC |
6 | I-LOC |
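The table above can be expressed as a plain mapping and used to turn predicted label IDs back into entity spans. The following is a minimal sketch, not the model's own post-processing: the helper name `decode_entities` is hypothetical, and the label IDs in the usage line are made up for illustration rather than taken from real model output.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_entities(tokens, label_ids):
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: an I- tag that does not continue an open span
    of the same type is dropped, which is one of several possible policies.
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span that is currently open, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and label[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag ends the open span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Illustrative label IDs (assumed, not real predictions).
print(decode_entities(["Жусуп", "Мамай", "Бишкек"], [1, 2, 5]))
```

The B-/I- prefixes follow the usual BIO convention: B- opens an entity, I- continues it, and O marks tokens outside any entity.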
Results
Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
---|---|---|---|---|
Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |
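This card does not say how the F1 scores were computed. A common choice for NER is entity-level F1 with exact span matching, and the sketch below illustrates that idea under that assumption. The function names are hypothetical, and the gold and predicted sequences are toy data, not drawn from the WikiANN splits.

```python
def extract_spans(labels):
    """Return the set of (type, start, end) spans in a BIO label sequence.

    Stray I- tags with no open span of the same type are ignored here;
    evaluation libraries may treat them differently.
    """
    spans = set()
    start, span_type = None, None
    # A trailing "O" sentinel closes a span that runs to the end.
    for i, label in enumerate(list(labels) + ["O"]):
        continues = label.startswith("I-") and label[2:] == span_type
        if span_type is not None and not continues:
            spans.add((span_type, start, i))
            start, span_type = None, None
        if label.startswith("B-"):
            start, span_type = i, label[2:]
    return spans


def entity_f1(gold, pred):
    """Entity-level F1: a predicted span counts only on an exact match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy example: the PER span matches, the last span has the wrong type.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(entity_f1(gold, pred))
```

Under this definition a span with the right boundaries but the wrong type earns no credit, which is one reason scores on a small, noisy dataset can be low.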
Example
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Build a token-classification (NER) pipeline and run it on a sample name
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)