---
language: tr
datasets:
- SUNLP-NER-Twitter
---

# bert-loodos-sunlp-ner-turkish

## Introduction
bert-loodos-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the loodos/bert-base-turkish-cased model on the SUNLP-NER-Twitter dataset.

## Training data
The model was trained on the SUNLP-NER-Twitter dataset (5000 tweets). The dataset is available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset

The named entity types are: Person, Location, Organization, Time, Money, Product, and TV-Show.


## How to use bert-loodos-sunlp-ner-turkish with HuggingFace

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("busecarik/bert-loodos-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/bert-loodos-sunlp-ner-turkish")
```
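Once the tokenizer and model load, inference can be run through the `transformers` token-classification pipeline. The following is a minimal sketch, not part of the original card: `format_entities` and `tag_text` are hypothetical helper names, and the example assumes the pipeline is created with `aggregation_strategy="simple"` so that word pieces are merged into whole entities.

```python
def format_entities(entities):
    """Reduce aggregated pipeline output to (text, label, score) tuples."""
    return [
        (e["word"], e["entity_group"], round(float(e["score"]), 2))
        for e in entities
    ]


def tag_text(text, model_id="busecarik/bert-loodos-sunlp-ner-turkish"):
    """Run NER on one string (hypothetical helper; downloads the model on first use)."""
    # Imported lazily so the formatting helper can be used without transformers.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # assumption: merge sub-word pieces per entity
    )
    return format_entities(ner(text))
```

Calling `tag_text("Ahmet yarın İstanbul'a gidiyor.")` should return a list of tuples, one per detected entity. The exact label strings depend on the model's configuration.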

## Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)
Precision|Recall|F1
-|-|-
84.66|84.36|84.51

Classification Report

Entity|Precision|Recall|F1
-|-|-|-
LOCATION|0.74|0.78|0.76
MONEY|0.93|0.82|0.87
ORGANIZATION|0.83|0.81|0.82
PERSON|0.90|0.92|0.91
PRODUCT|0.55|0.50|0.52
TIME|0.91|0.87|0.89
TVSHOW|0.63|0.58|0.54
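
The scores above are entity-level: a prediction counts as correct only when both the span and the type match the gold annotation. The plain-Python sketch below illustrates that idea for a single tag sequence. It assumes BIO-style tags (for example `B-PERSON`, `I-PERSON`, `O`), which the card does not state, and it is not seqeval's actual implementation; `extract_spans` and `entity_prf` are hypothetical names.

```python
def extract_spans(tags):
    """Return the set of (label, start, end) entity spans in a BIO tag sequence."""
    spans, start, label = set(), None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith(("B-", "I-")):
            # A stray I- tag is leniently treated as the start of a new span.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_prf(gold, pred):
    """Entity-level precision, recall, and F1 for one gold/predicted sequence pair."""
    g, p = extract_spans(gold), extract_spans(pred)
    tp = len(g & p)  # exact span and type matches
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For a full test set, spans would need to be collected per sentence before counting, and per-type rows like those in the table follow from filtering the spans by label.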


If you use this model, please cite the following [paper](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.484.pdf):

```bibtex
@InProceedings{ark-yeniterzi:2022:LREC,
  author    = {\c{C}ar\i k, Buse  and  Yeniterzi, Reyyan},
  title     = {A Twitter Corpus for Named Entity Recognition in Turkish},
  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages     = {4546--4551},
  url       = {https://aclanthology.org/2022.lrec-1.484}
}
```