
Token Classification

Generated from Trainer

Model card Files Files and versions Metrics Training metrics Community

bert-base-turkish-uncased-ner / README.md

saribasmetehan's picture

Update README.md

1205fa4 verified 11 months ago

|

history blame contribute delete

3.35 kB

---
license: mit
base_model: dbmdz/bert-base-turkish-uncased
tags:
- generated_from_trainer
datasets:
- turkish-wiki_ner
metrics:
- f1
model-index:
- name: bert-base-turkish-uncased-ner
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: turkish-wiki_ner
      type: turkish-wiki_ner
      config: turkish-WikiNER
      split: validation
      args: turkish-WikiNER
    metrics:
    - name: F1
      type: f1
      value: 0.7821495486288537
language:
- tr
widget:
- text: "Leblebi Mehmet adıyla Galatasarayın sembol futbolcularından oldu."
---


# bert-base-turkish-uncased-ner

This model is a fine-tuned version of [dbmdz/bert-base-turkish-uncased](https://huggingface.co/dbmdz/bert-base-turkish-uncased) on the turkish-wiki_ner dataset.
It achieves the following results on the validation set after the final (fourth) epoch:
- Loss: 0.2603
- F1: 0.7821

## Model description

The turkish-wiki_ner training set consists of 18,967 samples and the validation set of 1,000 samples, both derived from Wikipedia.


For more details about the dataset, see https://huggingface.co/datasets/turkish-nlp-suite/turkish-wikiNER.

Entity labels:

<ul>
<li>CARDINAL</li>
<li>DATE</li>
<li>EVENT</li>
<li>FAC</li>
<li>GPE</li>
<li>LANGUAGE</li>
<li>LAW</li>
<li>LOC</li>
<li>MONEY</li>
<li>NORP</li>
<li>ORDINAL</li>
<li>ORG</li>
<li>PERCENT</li>
<li>PERSON</li>
<li>PRODUCT</li>
<li>QUANTITY</li>
<li>TIME</li>
<li>TITLE</li>
<li>WORK_OF_ART</li>
</ul>

The fine-tuning process is documented at https://github.com/saribasmetehan/bert-base-turkish-uncased-ner.

## Example
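```python
from transformers import pipeline
import pandas as pd

text = "Bu toplam sıfır ise, Newton'ın birinci yasası cismin hareket durumunun değişmeyeceğini söyler."
model_id = "saribasmetehan/bert-base-turkish-uncased-ner"

# "ner" is an alias of the "token-classification" task.
# aggregation_strategy="simple" merges sub-word predictions into whole entities.
ner = pipeline("ner", model=model_id, aggregation_strategy="simple")
preds = ner(text)

# Each aggregated prediction is a dict with entity_group, score, word, start and end.
print(pd.DataFrame(preds))
```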
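
With `aggregation_strategy="simple"`, the pipeline returns whole entities rather than one prediction per sub-word token. The sketch below illustrates the grouping idea on plain Python lists: consecutive tokens are merged while their tags continue the same entity type, and a new span starts at each `B-` tag. It assumes BIO-style tags (`B-PERSON`, `I-PERSON`, `O`), which this card does not confirm; `group_bio_tags` is a hypothetical helper, and the tags in the usage example are made up for illustration, not real model output. It is a simplified approximation, not the transformers implementation.

```python
from typing import List, Optional, Tuple


def group_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity_type, text) spans.

    Hypothetical helper; assumes tags of the form "B-TYPE", "I-TYPE" or "O".
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the span currently being built, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            # Continuation of the open span.
            current_tokens.append(token)
            continue
        flush()
        if prefix in ("B", "I"):
            # A B- tag (or an I- tag with no matching open span) starts a new span.
            current_type, current_tokens = entity_type, [token]
        else:
            # "O": the token is outside any entity.
            current_type, current_tokens = None, []
    flush()
    return spans


if __name__ == "__main__":
    # Illustrative tags only, not produced by the model.
    tokens = ["Leblebi", "Mehmet", "adıyla", "Galatasarayın", "sembol", "futbolcularından", "oldu", "."]
    tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "O", "O", "O"]
    print(group_bio_tags(tokens, tags))
    # → [('PERSON', 'Leblebi Mehmet'), ('ORG', 'Galatasarayın')]
```

The real pipeline additionally merges word pieces, tracks character offsets, and reports a score per entity, so it remains the better choice for actual inference.
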

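## Load model directly

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "saribasmetehan/bert-base-turkish-uncased-ner"

# Download (or load from the local cache) the tokenizer and the token-classification model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Mapping from class index to label name used by the classification head.
print(model.config.id2label)
```
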
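Once the tokenizer and model are loaded, a forward pass yields one row of logits per token, and each token's predicted label is the arg-max of its row mapped through `model.config.id2label`. The sketch below shows only that decoding step, on plain Python lists so it runs without downloading the model. `decode_logits` is a hypothetical helper, and the toy tokens, logits, and three-label map are invented; the real label names and their order should be read from `model.config.id2label`. With the objects above, real inputs could be obtained roughly as `inputs = tokenizer(text, return_tensors="pt")`, `tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())`, and `logits = model(**inputs).logits[0].tolist()`.

```python
from typing import Dict, List, Sequence, Tuple

# BERT-style special tokens that carry no entity label.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def decode_logits(
    tokens: Sequence[str],
    logits: Sequence[Sequence[float]],
    id2label: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Return (token, label) pairs using the arg-max class of each logit row."""
    pairs: List[Tuple[str, str]] = []
    for token, row in zip(tokens, logits):
        if token in SPECIAL_TOKENS:
            continue  # skip [CLS], [SEP] and padding
        best_id = max(range(len(row)), key=lambda i: row[i])
        pairs.append((token, id2label[best_id]))
    return pairs


if __name__ == "__main__":
    # Toy inputs; real ones come from the tokenizer and model loaded above.
    toy_id2label = {0: "O", 1: "B-PERSON", 2: "I-PERSON"}
    toy_tokens = ["[CLS]", "leblebi", "mehmet", "adıyla", "[SEP]"]
    toy_logits = [
        [0.1, 0.2, 0.3],
        [0.2, 2.5, 0.1],
        [0.3, 0.4, 1.7],
        [1.9, 0.1, 0.4],
        [0.5, 0.1, 0.2],
    ]
    print(decode_logits(toy_tokens, toy_logits, toy_id2label))
    # → [('leblebi', 'B-PERSON'), ('mehmet', 'I-PERSON'), ('adıyla', 'O')]
```

Because BERT splits words into word pieces, a full decoder would also need to decide how to label multi-piece words; the pipeline's aggregation strategies handle that.
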
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
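
The step counts in the training results follow from these settings: with 18,967 training samples and a batch size of 16, one epoch is ceil(18,967 / 16) = 1,186 optimizer steps, so 4 epochs give 4,744 steps. The sketch below reproduces that arithmetic and the learning rate under a linear schedule, assuming zero warmup steps and no gradient accumulation (neither is listed above), in which case the rate decays linearly from 2e-5 to 0 over the run. `linear_lr` is a hypothetical helper written for illustration.

```python
import math

# Values taken from this model card.
TRAIN_SAMPLES = 18_967
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 4
PEAK_LR = 2e-05
# Assumption: warmup is not stated in the card, so 0 is used here.
WARMUP_STEPS = 0

# The last partial batch still counts as a step, so round up.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)


if __name__ == "__main__":
    print(steps_per_epoch, total_steps)  # → 1186 4744
    for epoch in range(NUM_EPOCHS + 1):
        step = epoch * steps_per_epoch
        print(f"step {step}: lr = {linear_lr(step):.2e}")
```

The per-epoch step values (1186, 2372, 3558, 4744) match the Step column of the results table.
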

### Training results

| Training Loss | Epoch | Step | Validation Loss | F1     |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.4           | 1.0   | 1186 | 0.2502          | 0.7703 |
| 0.2227        | 2.0   | 2372 | 0.2439          | 0.7740 |
| 0.1738        | 3.0   | 3558 | 0.2511          | 0.7783 |
| 0.1474        | 4.0   | 4744 | 0.2603          | 0.7821 |
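
The last row matches the evaluation results reported at the top of this card. Validation loss is lowest after epoch 2 (0.2439) while F1 keeps improving through epoch 4, so selecting a checkpoint by loss or by F1 gives different answers. The snippet below simply copies the table values into Python and checks both criteria; it is an illustration, not part of the original training script.

```python
# (epoch, training loss, validation loss, F1), copied from the results table.
RESULTS = [
    (1, 0.4000, 0.2502, 0.7703),
    (2, 0.2227, 0.2439, 0.7740),
    (3, 0.1738, 0.2511, 0.7783),
    (4, 0.1474, 0.2603, 0.7821),
]

# Epoch with the lowest validation loss vs. the epoch with the highest F1.
best_by_loss = min(RESULTS, key=lambda row: row[2])[0]
best_by_f1 = max(RESULTS, key=lambda row: row[3])[0]

if __name__ == "__main__":
    print(f"lowest validation loss: epoch {best_by_loss}")  # → epoch 2
    print(f"highest F1: epoch {best_by_f1}")  # → epoch 4
```
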


### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1