---
library_name: transformers
license: mit
base_model: dbmdz/bert-base-turkish-cased
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: bert-ner-turkish-cased
  results: []
---

# bert-ner-turkish-cased

This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) on a custom Turkish NER dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0987
- Precision: 0.9112
- Recall: 0.9364
- F1: 0.9236
- Accuracy: 0.9600

## Model description

This model identifies named entities in Turkish text. Each token is tagged with one of the following BIO labels:

```python
LABELS = [
    "O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG",
    "B-DATE", "I-DATE", "B-MONEY", "I-MONEY", "B-MISC", "I-MISC"
]
```

The labels cover six entity types:

- PER: Person
- LOC: Location
- ORG: Organization
- DATE: Date
- MONEY: Money
- MISC: Miscellaneous Entities
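
The label list above can be turned into the index mappings that token classification code usually needs. The sketch below assumes the class indices follow the order of `LABELS`; this card does not confirm that order, so treat `model.config.id2label` on the loaded model as the authoritative mapping.

```python
# Label set as listed in this model card.
LABELS = [
    "O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG",
    "B-DATE", "I-DATE", "B-MONEY", "I-MONEY", "B-MISC", "I-MISC"
]

# ASSUMPTION: class index i corresponds to LABELS[i].
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}

# Strip the B-/I- prefix to recover the distinct entity types.
entity_types = sorted({label.split("-", 1)[1] for label in LABELS if label != "O"})

print(len(LABELS), entity_types)
```

Every entity type contributes a `B-` (beginning) and an `I-` (inside) tag, which together with `O` (outside) gives 13 classes.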

## Intended uses & limitations

Use this model to extract named entities from Turkish text in NLP pipelines.

## How to use

```python
from transformers import pipeline

model_name = "yeniguno/bert-ner-turkish-cased"

ner_pipeline = pipeline("ner", model=model_name, tokenizer=model_name, aggregation_strategy="simple")

text = """Selim Parlak, 2023-11-15 tarihinde, CUMHURİYET MAH. DUMAN SOKAK 22500 HAVSA/EDİRNE adresinden, Dünya Varlık Yönetim A.Ş. aracılığıyla 850 TRY değerindeki MP.2386.JPA.IP5.WHT.I İPHONE5 ŞARJLI KILIF "AİR" 1700 MAH (BEYAZ) ürününü satın aldı."""

results = ner_pipeline(text)

for result in results:
    print(result)

"""
{'entity_group': 'PER', 'score': 0.9993254, 'word': 'Selim Parlak', 'start': 0, 'end': 12}
{'entity_group': 'DATE', 'score': 0.9987677, 'word': '2023 - 11 - 15', 'start': 14, 'end': 24}
{'entity_group': 'LOC', 'score': 0.99951524, 'word': 'CUMHURİYET MAH. DUMAN SOKAK 22500 HAVSA / EDİRNE', 'start': 36, 'end': 82}
{'entity_group': 'ORG', 'score': 0.8487069, 'word': 'Dünya Varlık Yönetim A. Ş.', 'start': 95, 'end': 120}
{'entity_group': 'MONEY', 'score': 0.9970985, 'word': '850 TRY', 'start': 134, 'end': 141}
{'entity_group': 'MISC', 'score': 0.97721404, 'word': 'MP. 2386. JPA. IP5. WHT. I İPHONE5 ŞARJLI KILIF " AİR " 1700 MAH ( BEYAZ )', 'start': 154, 'end': 219}
"""
```
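
In the output above, the `word` field is rebuilt from tokens, so punctuation gains extra spaces (for example `2023 - 11 - 15`). A minimal sketch of one way to recover the exact surface text is to slice the input with the `start` and `end` offsets. The helper name `extract_spans` and the `min_score` threshold are illustrative and not part of the pipeline API. The sample uses a shortened prefix of the sentence and the first two result dicts from the output above, so it runs without downloading the model.

```python
def extract_spans(text, results, min_score=0.0):
    """Return (entity_group, surface_text, score) tuples.

    The surface text is sliced from the original input using the character
    offsets, so it keeps the original spacing and punctuation.
    """
    spans = []
    for item in results:
        if item["score"] < min_score:
            continue  # skip low-confidence predictions
        surface = text[item["start"]:item["end"]]
        spans.append((item["entity_group"], surface, item["score"]))
    return spans


# Shortened prefix of the example sentence; offsets are unchanged.
sample_text = "Selim Parlak, 2023-11-15 tarihinde"
sample_results = [
    {"entity_group": "PER", "score": 0.9993254, "word": "Selim Parlak", "start": 0, "end": 12},
    {"entity_group": "DATE", "score": 0.9987677, "word": "2023 - 11 - 15", "start": 14, "end": 24},
]

all_spans = extract_spans(sample_text, sample_results)
confident_spans = extract_spans(sample_text, sample_results, min_score=0.999)

for group, surface, score in all_spans:
    print(f"{group}: {surface!r} ({score:.4f})")
```

A threshold is a judgment call for each application. In the example output, the `ORG` prediction scores about 0.85, noticeably lower than the other entities.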


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
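
To illustrate what a linear scheduler does with these values, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (none are listed here) and a schedule planned over all 20 epochs at 1527 steps per epoch, a figure taken from the training results table. The function `linear_lr` is a hand-written illustration of linear decay, not a call into the training library.

```python
BASE_LR = 2e-6          # learning_rate from the list above
STEPS_PER_EPOCH = 1527  # from the training results table
NUM_EPOCHS = 20         # num_epochs from the list above
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


lr_start = linear_lr(0)
lr_epoch_9 = linear_lr(9 * STEPS_PER_EPOCH)
lr_end = linear_lr(TOTAL_STEPS)

print(f"start={lr_start:.2e} epoch9={lr_epoch_9:.2e} end={lr_end:.2e}")
```

Under these assumptions, the learning rate at the end of epoch 9 is still 55% of its initial value.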

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1351        | 1.0   | 1527  | 0.1158          | 0.8592    | 0.9070 | 0.8825 | 0.9517   |
| 0.1088        | 2.0   | 3054  | 0.1045          | 0.8787    | 0.9336 | 0.9053 | 0.9574   |
| 0.1016        | 3.0   | 4581  | 0.0993          | 0.8901    | 0.9280 | 0.9086 | 0.9576   |
| 0.1102        | 4.0   | 6108  | 0.0963          | 0.8991    | 0.9277 | 0.9132 | 0.9587   |
| 0.0877        | 5.0   | 7635  | 0.0953          | 0.9046    | 0.9292 | 0.9167 | 0.9584   |
| 0.0933        | 6.0   | 9162  | 0.0939          | 0.9036    | 0.9321 | 0.9176 | 0.9593   |
| 0.0827        | 7.0   | 10689 | 0.0967          | 0.8986    | 0.9398 | 0.9188 | 0.9605   |
| 0.0933        | 8.0   | 12216 | 0.0949          | 0.9122    | 0.9292 | 0.9206 | 0.9593   |
| 0.084         | 9.0   | 13743 | 0.0987          | 0.9112    | 0.9364 | 0.9236 | 0.9600   |
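
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the table rows and picks out the epochs with the lowest validation loss and the highest F1. The values are copied from the table, and the tolerance allows for its four-decimal rounding.

```python
# (epoch, validation_loss, precision, recall, f1) copied from the table above.
ROWS = [
    (1, 0.1158, 0.8592, 0.9070, 0.8825),
    (2, 0.1045, 0.8787, 0.9336, 0.9053),
    (3, 0.0993, 0.8901, 0.9280, 0.9086),
    (4, 0.0963, 0.8991, 0.9277, 0.9132),
    (5, 0.0953, 0.9046, 0.9292, 0.9167),
    (6, 0.0939, 0.9036, 0.9321, 0.9176),
    (7, 0.0967, 0.8986, 0.9398, 0.9188),
    (8, 0.0949, 0.9122, 0.9292, 0.9206),
    (9, 0.0987, 0.9112, 0.9364, 0.9236),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest gap between the reported F1 and the recomputed value.
max_f1_gap = max(abs(f1_score(p, r) - f1) for _, _, p, r, f1 in ROWS)

best_loss_epoch = min(ROWS, key=lambda row: row[1])[0]
best_f1_epoch = max(ROWS, key=lambda row: row[4])[0]

print(f"max F1 gap: {max_f1_gap:.6f}")
print(f"lowest validation loss: epoch {best_loss_epoch}")
print(f"highest F1: epoch {best_f1_epoch}")
```

The two criteria disagree here: validation loss is lowest at epoch 6, while F1 peaks at epoch 9, the row that matches the evaluation results reported at the top of this card.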


### Framework versions

- Transformers 4.47.0
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0