bezir committed · verified
Commit 3c5fc86 · 1 parent: b115e30

Create README.md

Files changed (1): README.md (+204, −168)
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you’re required to review and agree to
  Google’s usage license. To do this, please ensure you’re logged in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
tags:
- conversational
base_model:
- google/gemma-2-9b
language:
- tr
model-index:
- name: gemma-2-9b-it-tr
  results:
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: MMLU_TR_V0.2
    metrics:
    - name: 5-shot
      type: 5-shot
      value: 0.5982
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: Truthful_QA_V0.2
    metrics:
    - name: 0-shot
      type: 0-shot
      value: 0.4991
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: ARC_TR_V0.2
    metrics:
    - name: 25-shot
      type: 25-shot
      value: 0.5367
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: HellaSwag_TR_V0.2
    metrics:
    - name: 10-shot
      type: 10-shot
      value: 0.5701
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: GSM8K_TR_V0.2
    metrics:
    - name: 5-shot
      type: 5-shot
      value: 0.6682
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: Winogrande_TR_V0.2
    metrics:
    - name: 5-shot
      type: 5-shot
      value: 0.6058
      verified: false
---

<img src="https://huggingface.co/WiroAI/gemma-2-9b-it-tr/resolve/main/wiro_logo.png"/>

# 🌟 Meet WiroAI/gemma-2-9b-it-tr! A robust language model with enhanced Turkish language and culture support! 🌟

## 🌟 Key Features

- Fine-tuned with 500,000+ high-quality Turkish instructions
- Adapted to Turkish culture and local context
- Built on Google's cutting-edge Gemma architecture

## 📝 Model Details

gemma-2-9b-it-tr is the Turkish-speaking member of Google's Gemma model family. The model was trained with Supervised Fine-Tuning (SFT) on carefully curated, high-quality Turkish instructions. Leveraging the foundations of Gemini technology, it demonstrates strong performance on Turkish language processing tasks.

## 🔧 Technical Specifications

- **Architecture:** Decoder-only transformer
- **Base Model:** Google Gemma 2 9B
- **Training Data:** 500,000+ specially selected Turkish instructions
- **Language Support:** Turkish (with comprehensive local context understanding) and other common languages
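
As a back-of-the-envelope check on hardware requirements (the parameter count and dtype size here are illustrative assumptions, not measured figures), the weights of a 9B-parameter model in bfloat16 occupy roughly:

```python
# Rough VRAM estimate for the model weights alone.
# Assumes ~9 billion parameters stored in bfloat16 (2 bytes each);
# activations, KV cache, and framework overhead come on top.
params = 9_000_000_000
bytes_per_param = 2  # bfloat16
weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.1f} GB for weights")
```

This is why `device_map="auto"` in the usage example below is convenient: it spreads the weights across whatever accelerators (and, if needed, CPU memory) are available.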

## 💡 Use Cases

- Text generation and editing
- Question answering
- Summarization
- Analysis and reasoning
- Content transformation
- Turkish natural language processing tasks
- Turkish culture and local context

## 🚀 Advantages

- **Local Understanding:** Comprehends Turkish culture, idioms, and current events
- **Resource Efficiency:** Operates effectively even with limited hardware resources
- **Flexible Deployment:** Usable on desktop, laptop, or custom cloud infrastructure
- **Open Model:** Transparent and customizable architecture

## 🌍 About Google Gemma 2

Gemma is Google's family of lightweight, state-of-the-art open models, developed using the same research and technology behind the Gemini models. These models are designed to be deployable in resource-constrained environments, making AI technology accessible to everyone.

## 📈 Performance and Limitations

While the model performs well on Turkish language tasks, users should consider the following:

- Use clear and structured instructions for best results
- Verify model outputs for critical applications
- Evaluate resource requirements before deployment
- Note that the benchmark scores below are self-reported and have not been independently verified

### Benchmark Scores

| Models | MMLU TR | TruthfulQA TR | ARC TR | HellaSwag TR | GSM8K TR | WinoGrande TR | Average |
|-----------------------------------------------------------|:-------:|:-------------:|:------:|:------------:|:--------:|:-------------:|:-------:|
| WiroAI/gemma-2-9b-it-tr | 59.8 | 49.9 | 53.7 | 57.0 | 66.8 | 60.6 | 58.0 |
| selimc/OrpoGemma-2-9B-TR | 53.0 | 54.3 | 52.4 | 52.0 | 64.8 | 58.9 | 55.9 |
| Metin/Gemma-2-9b-it-TR-DPO-V1 | 51.3 | 54.7 | 52.6 | 51.2 | 67.1 | 55.2 | 55.4 |
| CohereForAI/aya-expanse-8b | 52.3 | 52.8 | 49.3 | 56.7 | 61.3 | 59.2 | 55.3 |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 52.0 | 57.6 | 51.0 | 53.0 | 59.8 | 58.0 | 55.2 |
| google/gemma-2-9b-it | 51.8 | 53.0 | 52.2 | 51.5 | 63.0 | 56.2 | 54.6 |
| Eurdem/Defne-llama3.1-8B | 52.9 | 51.2 | 47.1 | 51.6 | 59.9 | 57.5 | 53.4 |
| meta-llama/Meta-Llama-3-8B-Instruct | 52.2 | 49.2 | 44.2 | 49.2 | 56.0 | 56.7 | 51.3 |
| WiroAI/Llama-3.1-8b-instruct-tr | 52.2 | 49.2 | 44.2 | 49.2 | 56.0 | 56.7 | 51.3 |
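
The Average column is the arithmetic mean of the six per-task scores. As a quick sanity check for the first row:

```python
# Per-task scores for WiroAI/gemma-2-9b-it-tr, in table order:
# MMLU TR, TruthfulQA TR, ARC TR, HellaSwag TR, GSM8K TR, WinoGrande TR
scores = [59.8, 49.9, 53.7, 57.0, 66.8, 60.6]
average = round(sum(scores) / len(scores), 1)
print(average)  # 58.0, matching the Average column
```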

Benchmarks were run with:

```shell
lm_eval --model_args pretrained=<model_path> --tasks mmlu_tr_v0.2,arc_tr-v0.2,gsm8k_tr-v0.2,hellaswag_tr-v0.2,truthfulqa_v0.2,winogrande_tr-v0.2
```

Please see https://github.com/malhajar17/lm-evaluation-harness_turkish. Note that we use the default language-inference setting, which is the same approach as OpenLLMLeaderboard v2.0.
 
## Usage

### Transformers Pipeline

```python
import transformers
import torch

model_id = "WiroAI/gemma-2-9b-it-tr"

# Load the model in bfloat16 and spread it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()
# "Can you prepare a social media post about Istanbul for me?"
instruction = "Bana İstanbul ile alakalı bir sosyal medya postu hazırlar mısın?"

messages = [
    {"role": "user", "content": instruction}
]

# Render the conversation with Gemma's chat template.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop on either the EOS token or Gemma's end-of-turn marker.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.9,
)

# Print only the newly generated text, excluding the prompt.
print(outputs[0]["generated_text"][len(prompt):])
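
For reference, `apply_chat_template` renders the conversation into Gemma's turn-based format. A minimal sketch of the resulting string (the `gemma_prompt` helper is illustrative only; the real tokenizer also prepends a `<bos>` token, so prefer `apply_chat_template` in practice):

```python
def gemma_prompt(user_message: str) -> str:
    """Illustrative re-creation of Gemma's single-turn chat format."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# The model generates after the final "<start_of_turn>model" marker and
# finishes with "<end_of_turn>", which is why it is used as a terminator above.
print(gemma_prompt("Merhaba!"))
```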

Example output (in Turkish):

```markdown
İstanbul'un büyüsüne kapılın! :city_sunset:
Halk arasında "dünyanın masalı şehri" olarak bilinen İstanbul, her köşesinde tarih, kültür ve modern yaşamın bir araya geldiği eşsiz bir şehir.
Yüzyıllardır farklı medeniyetlerin izlerini taşıyan İstanbul, tarihi mekanlarından, müzelerinden, çarşılarından ve restoranlarından oluşan zengin kültürel mirasa sahiptir.
Boğaz'ın eşsiz manzarasında tekne turu yapmak, Topkapı Sarayı'nı ziyaret etmek, Grand Bazaar'da alışveriş yapmak, Mısır Çarşısı'nın canlı atmosferinde kaybolmak, Galata Kulesi'nden muhteşem bir manzara deneyimlemek veya Beyoğlu'nun hareketli sokaklarında yürüyüş yapmak İstanbul'da unutulmaz anılar yaratmak için fırsatlar sunar.
İstanbul'un büyülü atmosferini kendiniz yaşamak için hemen planınızı yapın! :flag-tr: #İstanbul #Türkiye #Seyahat #Tarih #Kültür #Gezi
```

## 🤝 License and Usage

This model is provided under Google's Gemma license. Please review and accept the license terms before use.

## 📫 Contact and Support

For questions, suggestions, and feedback, please open an issue on Hugging Face or contact us directly through our website.

## Citation

```bibtex
@article{WiroAI,
  title={gemma-2-9b-it-tr},
  author={Abdullah Bezir, Furkan Burhan Türkay, Cengiz Asmazoğlu},
  year={2024},
  url={https://huggingface.co/WiroAI/gemma-2-9b-it-tr}
}
```

```bibtex
@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}
```