# Tamil Named Entity Recognition

Fine-tuning `bert-base-multilingual-cased` on the WikiANN dataset for named entity recognition (NER) on Tamil.
Label IDs and their corresponding label names:

| Label ID | Label Name |
|----------|------------|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
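The mapping above can be expressed as the `id2label`/`label2id` dictionaries that `transformers` model configs use. A minimal sketch (the dict literal simply restates the table):

```python
# Label mapping from the table above, in the form transformers configs expect.
id2label = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
# Inverse mapping, from label name back to label ID.
label2id = {name: i for i, name in id2label.items()}
```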
## Results

| Step | Training Loss | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Loc F1 | Org F1 | Per F1 |
|------|---------------|-----------------|-------------------|----------------|------------|------------------|--------|--------|--------|
| 1000 | 0.386900 | 0.300006 | 0.833469 | 0.824748 | 0.829086 | 0.912857 | 0.835343 | 0.781625 | 0.867752 |
| 2000 | 0.210200 | 0.251389 | 0.845455 | 0.842052 | 0.843750 | 0.924861 | 0.851711 | 0.790198 | 0.886515 |
| 3000 | 0.140000 | 0.264964 | 0.866952 | 0.856137 | 0.861510 | 0.930141 | 0.874872 | 0.818150 | 0.885203 |
| 4000 | 0.095400 | 0.298542 | 0.860871 | 0.882696 | 0.871647 | 0.935692 | 0.881348 | 0.829285 | 0.899245 |
| 5000 | 0.062200 | 0.296011 | 0.871805 | 0.878471 | 0.875125 | 0.938806 | 0.875434 | 0.850889 | 0.898148 |
| 6000 | 0.042200 | 0.320418 | 0.868416 | 0.879074 | 0.873713 | 0.937497 | 0.877588 | 0.833611 | 0.907737 |
## Example

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained("Ambareeshkumar/BERT-Tamil")
model = AutoModelForTokenClassification.from_pretrained("Ambareeshkumar/BERT-Tamil")

# Build a token-classification pipeline and run it on a Tamil example ("Indian")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "இந்திய"
ner_results = nlp(example)
print(ner_results)
```
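Without an aggregation strategy, the pipeline returns one prediction per subword token, so multi-token entities need to be merged back into spans. A minimal sketch of that grouping, using illustrative predictions (not actual model output) in the pipeline's per-token output format:

```python
# Merge per-token BIO predictions into entity spans.
# The `sample` list below is illustrative, not actual model output.
def group_entities(tokens):
    spans = []
    for tok in tokens:
        tag = tok["entity"]  # e.g. "B-LOC", "I-LOC", "O"
        if tag.startswith("B-"):
            # Begin a new entity span
            spans.append({"type": tag[2:], "text": tok["word"]})
        elif tag.startswith("I-") and spans and spans[-1]["type"] == tag[2:]:
            # Continue the current span; strip WordPiece continuation markers
            spans[-1]["text"] += tok["word"].replace("##", "")
        # "O" tags and stray "I-" tags are ignored
    return spans

sample = [
    {"entity": "B-LOC", "word": "இந்"},
    {"entity": "I-LOC", "word": "##திய"},
]
print(group_entities(sample))  # [{'type': 'LOC', 'text': 'இந்திய'}]
```

In practice, `pipeline("ner", ..., aggregation_strategy="simple")` performs this grouping natively; the sketch just makes the logic explicit.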