julien-c HF staff committed on
Commit
3b8275c
·
1 Parent(s): 0fb364d

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/savasy/bert-turkish-text-classification/README.md

Files changed (1)
  1. README.md +102 -0
README.md ADDED
@@ -0,0 +1,102 @@
+ ---
+ language: tr
+ ---
+
+ # Turkish Text Classification
+
+ This model is a fine-tuned version of Turkish BERT (https://github.com/stefan-it/turkish-bert), trained on text classification data with the following 7 categories:
+
+ ```
+ code_to_label = {
+     'LABEL_0': 'dunya',
+     'LABEL_1': 'ekonomi',
+     'LABEL_2': 'kultur',
+     'LABEL_3': 'saglik',
+     'LABEL_4': 'siyaset',
+     'LABEL_5': 'spor',
+     'LABEL_6': 'teknoloji',
+ }
+ ```
+
+ ## Data
+
+ The following Turkish benchmark dataset was used for fine-tuning:
+
+ https://www.kaggle.com/savasy/ttc4900
+
+ ## Quick Start
+
+ Begin by installing transformers:
+
+ > pip install transformers
+
+ ```
+ # Import libraries
+ from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load the tokenizer and model; the download may take a while depending on your internet connection
+ tokenizer = AutoTokenizer.from_pretrained("savasy/bert-turkish-text-classification")
+ model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-turkish-text-classification")
+
+ # Build the pipeline ("sentiment-analysis" is an alias of the "text-classification" task)
+ nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
+
+ # Apply the model
+ nlp("bla bla")
+ # [{'label': 'LABEL_2', 'score': 0.4753005802631378}]
+
+ # Map the generic label codes to category names
+ code_to_label = {
+     'LABEL_0': 'dunya',
+     'LABEL_1': 'ekonomi',
+     'LABEL_2': 'kultur',
+     'LABEL_3': 'saglik',
+     'LABEL_4': 'siyaset',
+     'LABEL_5': 'spor',
+     'LABEL_6': 'teknoloji',
+ }
+
+ code_to_label[nlp("bla bla")[0]['label']]
+ # > 'kultur'
+ ```
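
The label lookup in the Quick Start can be wrapped in a small helper that decodes a whole list of predictions at once. The following is a minimal sketch, not part of the original model card: the names `CODE_TO_LABEL`, `decode_predictions`, and `sample_output` are hypothetical, and `sample_output` is a hand-written stand-in for real pipeline output so that the snippet runs without downloading the model.

```python
# Mapping taken from the model card; the helper names are illustrative only.
CODE_TO_LABEL = {
    "LABEL_0": "dunya",
    "LABEL_1": "ekonomi",
    "LABEL_2": "kultur",
    "LABEL_3": "saglik",
    "LABEL_4": "siyaset",
    "LABEL_5": "spor",
    "LABEL_6": "teknoloji",
}


def decode_predictions(predictions, code_to_label=CODE_TO_LABEL):
    """Replace generic LABEL_n codes with category names.

    `predictions` is a list of dicts with "label" and "score" keys,
    the shape returned by the pipeline call in the Quick Start.
    """
    decoded = []
    for pred in predictions:
        # strip() guards against stray whitespace in the mapping values
        name = code_to_label[pred["label"]].strip()
        decoded.append({"label": name, "score": pred["score"]})
    return decoded


# Stand-in for nlp("bla bla"); an assumption used only to keep this runnable offline.
sample_output = [{"label": "LABEL_2", "score": 0.4753005802631378}]
print(decode_predictions(sample_output))
```

With the real pipeline, the call would presumably be `decode_predictions(nlp("bla bla"))`.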
+
+ ## How the model was trained
+
+ ```
+ # Load data for Turkish text classification
+ import pandas as pd
+ # https://www.kaggle.com/savasy/ttc4900
+ df = pd.read_csv("7allV03.csv")
+ df.columns = ["labels", "text"]
+ df.labels = pd.Categorical(df.labels)
+
+ # Train/evaluation split (details omitted)
+ train_df = ...
+ eval_df = ...
+
+ # Model
+ from simpletransformers.classification import ClassificationModel
+ import torch
+ import sklearn.metrics
+
+ # Use the GPU when one is available
+ cuda_available = torch.cuda.is_available()
+
+ model_args = {
+     "use_early_stopping": True,
+     "early_stopping_delta": 0.01,
+     "early_stopping_metric": "mcc",
+     "early_stopping_metric_minimize": False,
+     "early_stopping_patience": 5,
+     "evaluate_during_training_steps": 1000,
+     "fp16": False,
+     "num_train_epochs": 3,
+ }
+
+ model = ClassificationModel(
+     "bert",
+     "dbmdz/bert-base-turkish-cased",
+     use_cuda=cuda_available,
+     args=model_args,
+     num_labels=7,
+ )
+ model.train_model(train_df, acc=sklearn.metrics.accuracy_score)
+ ```
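
The training snippet leaves the train/evaluation split unspecified. One common choice for a 7-class dataset is a stratified split, which keeps the category proportions the same in both parts. The sketch below is only an illustration of that idea using the standard library; it is not necessarily how the author split the data, and `stratified_split`, `eval_fraction`, `seed`, and the toy rows are all hypothetical. scikit-learn's `train_test_split` with its `stratify` argument offers the same behaviour for DataFrames.

```python
import random
from collections import defaultdict


def stratified_split(rows, eval_fraction=0.2, seed=42):
    """Split (label, text) rows into train and eval lists, label by label."""
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_rows, eval_rows = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one example of every label for evaluation
        n_eval = max(1, round(len(group) * eval_fraction))
        eval_rows.extend(group[:n_eval])
        train_rows.extend(group[n_eval:])
    return train_rows, eval_rows


# Toy rows standing in for the real CSV content (made-up texts).
rows = [(label, f"{label} text {i}") for label in ("spor", "ekonomi") for i in range(10)]
train_rows, eval_rows = stratified_split(rows)
print(len(train_rows), len(eval_rows))  # 16 4
```

Each label contributes the same share of its examples to the evaluation set, so a rare category cannot vanish from it by chance.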
+ For other training options, see https://simpletransformers.ai/
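
The early-stopping entries in `model_args` (`early_stopping_patience`, `early_stopping_delta`, `early_stopping_metric_minimize`) follow the usual patience-and-delta pattern: training stops once the monitored metric, here MCC with higher being better, has failed to improve by more than the delta for a given number of consecutive evaluations. The sketch below illustrates that general pattern only. It is not Simple Transformers' actual implementation, and `should_stop` and the score list are made up for the example.

```python
def should_stop(metric_history, patience=5, delta=0.01, minimize=False):
    """Return True once `patience` consecutive evaluations fail to beat
    the best score so far by more than `delta`."""
    best = None
    stale = 0  # evaluations since the last real improvement
    for value in metric_history:
        if best is None:
            best = value
            continue
        improvement = (best - value) if minimize else (value - best)
        if improvement > delta:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


# Hypothetical MCC values from successive evaluations: one real jump,
# then five evaluations that never improve by more than 0.01.
mcc_scores = [0.60, 0.70, 0.705, 0.70, 0.704, 0.70, 0.706]
print(should_stop(mcc_scores))       # True
print(should_stop(mcc_scores[:-1]))  # False, only four stale evaluations
```

With a delta of 0.01, small fluctuations around a plateau count as no improvement, which is why the run above stops even though the last score is the highest.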
+
+ For detailed usage of Turkish text classification, see the [Python notebook](https://github.com/savasy/TurkishTextClassification/blob/master/Bert_base_Text_Classification_for_Turkish.ipynb)