julien-c (HF staff) committed
Commit a76392a
1 Parent(s): 8eabdf3

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/savasy/bert-base-turkish-ner-cased/README.md

Files changed (1): README.md (+38 -86)
README.md CHANGED
---
language: tr
---

# An easy-to-use NER application for Turkish

An easy-to-use Python NER (BERT + transfer learning) (named-entity recognition) model for Turkish.

## How the model was trained

This model is based on BERTurk: https://huggingface.co/dbmdz/bert-base-turkish-cased
## Dataset

The training dataset is WikiANN (Pan et al., 2017), which provides NER annotations for the PER, ORG and LOC classes, constructed from the linked entities in Wikipedia pages for 282 different languages: https://www.aclweb.org/anthology/P17-1178.pdf

Thanks to @stefan-it, I downloaded the data from the link as follows:
```
mkdir tr-data
cd tr-data
for file in train.txt dev.txt test.txt labels.txt
do
wget https://schweter.eu/storage/turkish-bert-wikiann/$file
done
cd ..
```

This downloads the pre-processed dataset, with training, dev and test splits, into the tr-data folder.
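Not part of the original card: as a sanity check, the same corpus can also be pulled straight from the Hugging Face Hub with the `datasets` library, assuming the `wikiann` dataset id and its `tr` configuration.

```
from datasets import load_dataset

# WikiANN ("wikiann" on the HF Hub), Turkish configuration; it ships
# with train / validation / test splits already prepared.
wikiann_tr = load_dataset("wikiann", "tr")
print(wikiann_tr["train"][0])  # one sentence: tokens plus ner_tags
```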
 
## Fine-tuning the BERT model

The base model is dbmdz/bert-base-turkish-cased. After downloading the dataset, fine-tuning can be started. First set the following environment variables:

```
export MAX_LENGTH=128
export BERT_MODEL=dbmdz/bert-base-turkish-cased
export OUTPUT_DIR=tr-new-model
export BATCH_SIZE=32
export NUM_EPOCHS=3
export SAVE_STEPS=625
export SEED=1
```
Then run the NER training script run_ner.py (you can find it in the transformers GitHub repo, under the examples):

```
python3 run_ner.py --data_dir ./tr-data \
--model_type bert \
--labels ./tr-data/labels.txt \
--model_name_or_path $BERT_MODEL \
--output_dir $OUTPUT_DIR-$SEED \
--max_seq_length $MAX_LENGTH \
--num_train_epochs $NUM_EPOCHS \
--per_gpu_train_batch_size $BATCH_SIZE \
--save_steps $SAVE_STEPS \
--seed $SEED \
--do_train \
--do_eval \
--do_predict \
--fp16
```
If you don't have a GPU-enabled machine, just drop the last --fp16 parameter. After training finishes, you can find the trained model and its performance files under the output folder (tr-new-model-1 here, since the seed is appended to OUTPUT_DIR).
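As a quick sanity check after training (a sketch I added, not from the original card; the folder name follows from OUTPUT_DIR and SEED above), the saved checkpoint loads back like any other pretrained model:

```
from transformers import AutoModelForTokenClassification, AutoTokenizer

# run_ner.py writes the model weights and tokenizer files into the
# output directory, so both load straight from that path.
model = AutoModelForTokenClassification.from_pretrained("tr-new-model-1")
tokenizer = AutoTokenizer.from_pretrained("tr-new-model-1")
```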
## Some results

### Performance on the WikiANN dataset

From tr-new-model-1/eval_results.txt and tr-new-model-1/test_results.txt:

Eval results:

* precision = 0.916400580551524
* recall = 0.9342309684101502
* f1 = 0.9252298787412536
* loss = 0.11335893666411284

Test results:

* precision = 0.9192058759362955
* recall = 0.9303010230367262
* f1 = 0.9247201697271198
* loss = 0.11182546521618497

### Performance on a second dataset

The performance on the data provided by @kemalaraz (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt) is as follows:

Eval results:

* precision = 0.9461980692049029
* recall = 0.959309358847465
* f1 = 0.9527086063783312
* loss = 0.037054269206847804

Test results:

* precision = 0.9458370635631155
* recall = 0.9588201928530913
* f1 = 0.952284378344882
* loss = 0.035431676572445225
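For context on how these numbers are computed (a sketch I added, not part of the original card): the run_ner example scores predictions at the entity level with the seqeval package, so a predicted entity only counts when both its span and its type match the gold annotation. A toy example:

```
from seqeval.metrics import precision_score, recall_score, f1_score

# Gold has two entities (a two-token PER span and a LOC);
# the prediction recovers only the PER span.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

print(precision_score(y_true, y_pred))  # 1.0: 1 of 1 predicted entities correct
print(recall_score(y_true, y_pred))     # 0.5: 1 of 2 gold entities found
print(f1_score(y_true, y_pred))         # 0.667: harmonic mean of the two
```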
 
## Usage

Install the transformers library first:

```
pip install transformers
```

Then, in a Python environment, run the following code:

```
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")
ner = pipeline('ner', model=model, tokenizer=tokenizer)
ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı.")

# [{'word': 'Mustafa', 'score': 0.9938516616821289, 'entity': 'B-PER'},
#  {'word': 'Kemal', 'score': 0.9881671071052551, 'entity': 'I-PER'},
#  {'word': 'Atatürk', 'score': 0.9957979321479797, 'entity': 'I-PER'},
#  {'word': 'Samsun', 'score': 0.9059973359107971, 'entity': 'B-LOC'}]
```
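The output above is per-token (B-PER, I-PER, ...). If you want whole spans such as "Mustafa Kemal Atatürk" returned as a single PER entity, the pipeline can merge tokens for you; a sketch I added, noting that the flag was renamed across transformers versions (grouped_entities in older releases, aggregation_strategy in newer ones):

```
# Older transformers releases; newer ones use aggregation_strategy="simple".
ner = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı.")
# e.g. [{'entity_group': 'PER', 'word': 'Mustafa Kemal Atatürk', ...},
#       {'entity_group': 'LOC', 'word': 'Samsun', ...}]
```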
 