
bert-base-turkish-uncased-ner

This model is a fine-tuned version of dbmdz/bert-base-turkish-uncased on the turkish-wiki_ner dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2603
  • F1: 0.7821
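The card does not say how F1 is computed. Token-classification fine-tuning setups often report entity-level F1, as in the seqeval library, where a predicted entity counts as correct only if both its type and its span match exactly. The sketch below shows that metric under the assumption that entities are represented as `(type, start, end)` tuples. The function `entity_f1` and the example entities are hypothetical.

```python
def entity_f1(gold: set, pred: set) -> float:
    """Entity-level F1: an entity is a true positive only on an exact
    (type, start, end) match. To pool a whole dataset (micro-average),
    add a sentence index to each tuple and pass all entities at once."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one exact match, one wrong type, one missed entity
gold = {("PERSON", 0, 2), ("DATE", 5, 6), ("GPE", 8, 9)}
pred = {("PERSON", 0, 2), ("ORG", 5, 6)}
print(round(entity_f1(gold, pred), 3))  # 0.4
```

Here precision is 1/2 and recall is 1/3. The wrongly typed `ORG` span therefore counts as both a false positive and a missed gold entity.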

Model description

The model was fine-tuned on the turkish-wiki_ner dataset. Its training set contains 18,967 samples and its validation set contains 1,000 samples, both derived from Wikipedia.

More details on the dataset: https://huggingface.co/datasets/turkish-nlp-suite/turkish-wikiNER

Labels:

  • CARDINAL
  • DATE
  • EVENT
  • FAC
  • GPE
  • LANGUAGE
  • LAW
  • LOC
  • MONEY
  • NORP
  • ORDINAL
  • ORG
  • PERCENT
  • PERSON
  • PRODUCT
  • QUANTITY
  • TIME
  • TITLE
  • WORK_OF_ART
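Token-classification models usually tag each token with a prefixed form of these entity types. The sketch below assumes the common IOB2 scheme: an `O` tag plus a `B-` and an `I-` variant per type. Under that scheme, the 19 types above expand to 39 labels. The card does not confirm the scheme or the label order, so treat the ids below as hypothetical and check `model.config.id2label` on the loaded model for the real mapping.

```python
ENTITY_TYPES = [
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW",
    "LOC", "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON",
    "PRODUCT", "QUANTITY", "TIME", "TITLE", "WORK_OF_ART",
]

# Assumed IOB2 scheme: "O" for non-entity tokens, then B-/I- per entity type.
# The order (and therefore the ids) is illustrative, not the model's own.
labels = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(ENTITY_TYPES), len(labels))  # 19 39
print(labels[:5])  # ['O', 'B-CARDINAL', 'I-CARDINAL', 'B-DATE', 'I-DATE']
```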

Fine-tuning process: https://github.com/saribasmetehan/bert-base-turkish-uncased-ner

Example

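```python
from transformers import pipeline
import pandas as pd

# "If this sum is zero, Newton's first law says that the body's state of motion will not change."
text = "Bu toplam sıfır ise, Newton'ın birinci yasası cismin hareket durumunun değişmeyeceğini söyler."
model_id = "saribasmetehan/bert-base-turkish-uncased-ner"

# aggregation_strategy="simple" merges sub-word tokens into whole-entity spans
ner = pipeline("ner", model=model_id, aggregation_strategy="simple")
preds = ner(text)

# Each prediction is a dict with entity_group, score, word, start and end
print(pd.DataFrame(preds))
```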
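With `aggregation_strategy="simple"`, the pipeline groups consecutive token predictions that belong to the same entity into a single span. The function below illustrates that idea for word-level IOB2 tags. It is a simplified, hypothetical sketch, not the `transformers` implementation, which also handles sub-word tokens, scores and character offsets. The tags in the example are written by hand and are not model output.

```python
def group_entities(tokens, tags):
    """Merge word-level IOB2 tags into (entity_type, text) spans.

    A "B-X" tag starts a new span. An "I-X" tag extends the current span
    only if it has the same type X; otherwise it starts a new span.
    """
    spans = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
            continue
        prefix, entity_type = tag.split("-", 1)
        if prefix == "I" and entity_type == current_type:
            current_words.append(token)
        else:
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = entity_type, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hand-written tags for "Mustafa Kemal landed at Samsun in 1919" (lowercased, as for an uncased model)
tokens = ["mustafa", "kemal", "1919", "yılında", "samsun'a", "çıktı"]
tags = ["B-PERSON", "I-PERSON", "B-DATE", "I-DATE", "B-GPE", "O"]
print(group_entities(tokens, tags))
# [('PERSON', 'mustafa kemal'), ('DATE', '1919 yılında'), ('GPE', "samsun'a")]
```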

Load model directly

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "saribasmetehan/bert-base-turkish-uncased-ner"

# Load the tokenizer and the fine-tuned token-classification (NER) model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 4
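These settings are consistent with the step counts in the training results table. With 18,967 training samples and a batch size of 16, one epoch takes ceil(18,967 / 16) = 1,186 steps, and 4 epochs take 4,744 steps. The card lists no warmup. The sketch below therefore assumes zero warmup steps (the Hugging Face `Trainer` default) and an effective batch size of 16. Under those assumptions, the linear scheduler decays the learning rate from 2e-05 to 0 over training. The function `linear_lr` is hypothetical.

```python
import math

NUM_TRAIN_SAMPLES = 18_967
TRAIN_BATCH_SIZE = 16  # assumed effective batch size (one device, no gradient accumulation)
NUM_EPOCHS = 4
PEAK_LR = 2e-05

# The last batch of each epoch is partial, hence the ceiling
steps_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps)  # 1186 4744
for step in (0, steps_per_epoch, total_steps // 2, total_steps):
    print(step, f"{linear_lr(step):.2e}")
```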

Training results

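| Training Loss | Epoch | Step | Validation Loss | F1     |
|---------------|-------|------|-----------------|--------|
| 0.4           | 1.0   | 1186 | 0.2502          | 0.7703 |
| 0.2227        | 2.0   | 2372 | 0.2439          | 0.7740 |
| 0.1738        | 3.0   | 3558 | 0.2511          | 0.7783 |
| 0.1474        | 4.0   | 4744 | 0.2603          | 0.7821 |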
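In the training results, training loss falls every epoch. Validation loss is lowest after epoch 2, while F1 is highest after epoch 4. The results reported at the top of this card (loss 0.2603, F1 0.7821) match the epoch-4 row. The sketch below makes the comparison explicit, using values copied from the training results table.

```python
# (epoch, step, training_loss, validation_loss, f1) copied from the training results
history = [
    (1, 1186, 0.4000, 0.2502, 0.7703),
    (2, 2372, 0.2227, 0.2439, 0.7740),
    (3, 3558, 0.1738, 0.2511, 0.7783),
    (4, 4744, 0.1474, 0.2603, 0.7821),
]

best_by_loss = min(history, key=lambda row: row[3])
best_by_f1 = max(history, key=lambda row: row[4])

print("lowest validation loss: epoch", best_by_loss[0])  # epoch 2
print("highest F1: epoch", best_by_f1[0])  # epoch 4
```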

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model size

  • Parameters: 110M
  • Tensor type: F32 (Safetensors)