---
license: mit
base_model: dbmdz/bert-base-turkish-uncased
tags:
  - generated_from_trainer
datasets:
  - turkish-wiki_ner
metrics:
  - f1
model-index:
  - name: bert-base-turkish-uncased-ner
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: turkish-wiki_ner
          type: turkish-wiki_ner
          config: turkish-WikiNER
          split: validation
          args: turkish-WikiNER
        metrics:
          - name: F1
            type: f1
            value: 0.7821495486288537
language:
  - tr
widget:
  - text: Leblebi Mehmet adıyla Galatasaray'ın sembol futbolcularından oldu.
---

# bert-base-turkish-uncased-ner

This model is a fine-tuned version of [dbmdz/bert-base-turkish-uncased](https://huggingface.co/dbmdz/bert-base-turkish-uncased) on the turkish-wiki_ner dataset. It achieves the following results on the evaluation set:

- Loss: 0.2603
- F1: 0.7821

## Model description

This model is a fine-tuned version of dbmdz/bert-base-turkish-uncased on the turkish-wiki_ner dataset, a named-entity recognition corpus derived from Turkish Wikipedia. The training split contains 18,967 samples and the validation split contains 1,000 samples.

For more detailed information about the dataset, see https://huggingface.co/datasets/turkish-nlp-suite/turkish-wikiNER
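If you want to look at the data itself, here is a minimal sketch using the `datasets` library (assuming the Hub dataset ID matches the link above):

```python
from datasets import load_dataset

# Assumption: the dataset ID matches the link above
dataset = load_dataset("turkish-nlp-suite/turkish-wikiNER")
print(dataset)               # split names and sizes
print(dataset["train"][0])   # a single sample with tokens and NER tags
```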

Labels:

- CARDINAL
- DATE
- EVENT
- FAC
- GPE
- LANGUAGE
- LAW
- LOC
- MONEY
- NORP
- ORDINAL
- ORG
- PERCENT
- PERSON
- PRODUCT
- QUANTITY
- TIME
- TITLE
- WORK_OF_ART
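These entity types typically appear in the model's raw output with B-/I- prefixes (e.g. B-PERSON, I-PERSON). As a sanity check, the exact label strings the checkpoint emits can be read from its config; a minimal sketch:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("saribasmetehan/bert-base-turkish-uncased-ner")
# id2label maps class indices to the label strings the model predicts
print(config.id2label)
```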

Fine-tuning process: https://github.com/saribasmetehan/bert-base-turkish-uncased-ner

## Example

```python
from transformers import pipeline
import pandas as pd

text = "Bu toplam sıfır ise, Newton'ın birinci yasası cismin hareket durumunun değişmeyeceğini söyler."
model_id = "saribasmetehan/bert-base-turkish-uncased-ner"

# "simple" aggregation merges wordpiece tokens back into whole entities
ner = pipeline("ner", model=model_id)
preds = ner(text, aggregation_strategy="simple")

pd.DataFrame(preds)
```
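With `aggregation_strategy="simple"`, subword pieces are merged back into whole words, so each row of the resulting DataFrame is one entity span with its label (`entity_group`), confidence score, surface form, and character offsets.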

## Load model directly

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "saribasmetehan/bert-base-turkish-uncased-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
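If you prefer not to use the pipeline, here is a minimal sketch of manual inference with the objects loaded above (the example sentence is hypothetical, and this skips the pipeline's more careful span aggregation):

```python
import torch

text = "Mustafa Kemal Atatürk 1881 yılında Selanik'te doğdu."  # hypothetical example
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring class per token and map it to a label string
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred.item()])
```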

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
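For reference, a minimal sketch of how these settings map onto `TrainingArguments`; the output directory and evaluation strategy are assumptions, not values taken from the original training script (see the fine-tuning repository linked above):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-base-turkish-uncased-ner",  # assumption: placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=4,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch results below
)
```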

### Training results

| Training Loss | Epoch | Step | Validation Loss | F1     |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.4           | 1.0   | 1186 | 0.2502          | 0.7703 |
| 0.2227        | 2.0   | 2372 | 0.2439          | 0.7740 |
| 0.1738        | 3.0   | 3558 | 0.2511          | 0.7783 |
| 0.1474        | 4.0   | 4744 | 0.2603          | 0.7821 |
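The F1 here is presumably the entity-level F1 computed with seqeval, the usual metric for token-classification fine-tuning (the exact evaluation code is in the fine-tuning repository linked above). A minimal sketch of that metric on hypothetical tag sequences:

```python
import evaluate

seqeval = evaluate.load("seqeval")

# Hypothetical gold and predicted tag sequences in BIO format
references = [["B-PERSON", "I-PERSON", "O", "B-GPE"]]
predictions = [["B-PERSON", "I-PERSON", "O", "O"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_f1"])
```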

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1