Turkish Named Entity Recognition (NER) Model

This model is a fine-tuned version of dbmdz/convbert-base-turkish-cased (ConvBERTurk), trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).

The ConvBERT architecture is presented in the "ConvBERT: Improving BERT with Span-based Dynamic Convolution" paper.

Fine-tuning parameters:

task = "ner"
model_checkpoint = "dbmdz/convbert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01 
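
The label_list above follows the BIO tagging scheme (B- marks the beginning of an entity, I- a continuation, O a non-entity token). As a minimal sketch, the lookup tables a token classification model needs can be derived from it as shown below. The names label2id, id2label and entity_types are illustrative, and the index order is an assumption taken from the list order; the mapping stored in the published model can be checked via model.config.id2label.

```python
# Label inventory copied from the fine-tuning parameters above.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Assumption: class indices follow the order of label_list.
label2id = {label: index for index, label in enumerate(label_list)}
id2label = {index: label for label, index in label2id.items()}

# Entity types implied by the BIO tags (everything except the 'O' tag).
entity_types = sorted({label.split('-', 1)[1] for label in label_list if label != 'O'})

print(label2id)
print(entity_types)  # ['LOC', 'ORG', 'PER']
```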

How to use:

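from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/convbert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/convbert-base-turkish-cased-ner")
# aggregation_strategy="first" labels each word with the tag of its first sub-token
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("<your text here>")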

# Please refer to "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for details on entity grouping with the aggregation_strategy parameter.
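
To illustrate what the "first" aggregation strategy does conceptually, here is a simplified, library-free sketch. It is not the transformers implementation: the helper names merge_subwords_first and group_entities are hypothetical, the "##" continuation prefix assumes a WordPiece-style tokenizer, and the toy tokens and labels are made up for the example. Each word takes the label of its first sub-token, and consecutive B-/I- words of the same type are then merged into one entity.

```python
from typing import Dict, List, Tuple


def merge_subwords_first(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join '##' continuation pieces into words; each word keeps its first piece's label."""
    words: List[Tuple[str, str]] = []
    for piece, label in tokens:
        if piece.startswith("##") and words:
            word, first_label = words[-1]
            words[-1] = (word + piece[2:], first_label)  # label of later pieces is ignored
        else:
            words.append((piece, label))
    return words


def group_entities(words: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Merge consecutive B-/I- words of the same type into a single entity."""
    entities: List[Dict[str, str]] = []
    current = None
    for word, label in words:
        if label == "O":
            current = None  # a non-entity word closes any open entity
            continue
        prefix, entity_type = label.split("-", 1)
        if current is not None and prefix == "I" and current["entity_group"] == entity_type:
            current["word"] += " " + word
        else:
            current = {"entity_group": entity_type, "word": word}
            entities.append(current)
    return entities


# Toy token-level predictions (made up): sub-tokens disagree with their word's first piece.
toy_tokens = [
    ("Mehmet", "B-PER"), ("Dem", "I-PER"), ("##ir", "O"),
    ("An", "B-LOC"), ("##kara", "O"), ("gitti", "O"),
]
entities = group_entities(merge_subwords_first(toy_tokens))
print(entities)
# [{'entity_group': 'PER', 'word': 'Mehmet Demir'}, {'entity_group': 'LOC', 'word': 'Ankara'}]
```

The real pipeline additionally returns a score and character offsets (start, end) for each grouped entity.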

Reference test results:

  • accuracy: 0.9937648915431506
  • f1: 0.9610945644080416
  • precision: 0.9619899385131359
  • recall: 0.9602008554956295
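
As a quick consistency check, F1 is by definition the harmonic mean of precision and recall, so the reported f1 can be recomputed from the other two figures. The sketch below uses only the numbers listed above; how the original evaluation aggregated scores across entity types is not stated here, so treating the figures as a single precision/recall pair is an assumption.

```python
# Values copied from the reference test results above.
precision = 0.9619899385131359
recall = 0.9602008554956295
reported_f1 = 0.9610945644080416

# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed f1: {f1:.6f}")
print(f"difference from reported f1: {abs(f1 - reported_f1):.2e}")
```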