RichardErkhov committed on
Commit a5d2cf7
1 Parent(s): edbd328

uploaded readme

Files changed (1)
  1. README.md +320 -0
README.md ADDED
@@ -0,0 +1,320 @@
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


gemma-mling-7b - GGUF
- Model creator: https://huggingface.co/beomi/
- Original model: https://huggingface.co/beomi/gemma-mling-7b/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [gemma-mling-7b.Q2_K.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q2_K.gguf) | Q2_K | 3.24GB |
| [gemma-mling-7b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.IQ3_XS.gguf) | IQ3_XS | 3.54GB |
| [gemma-mling-7b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.IQ3_S.gguf) | IQ3_S | 3.71GB |
| [gemma-mling-7b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q3_K_S.gguf) | Q3_K_S | 3.71GB |
| [gemma-mling-7b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.IQ3_M.gguf) | IQ3_M | 3.82GB |
| [gemma-mling-7b.Q3_K.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q3_K.gguf) | Q3_K | 4.07GB |
| [gemma-mling-7b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q3_K_M.gguf) | Q3_K_M | 4.07GB |
| [gemma-mling-7b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q3_K_L.gguf) | Q3_K_L | 4.39GB |
| [gemma-mling-7b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.IQ4_XS.gguf) | IQ4_XS | 4.48GB |
| [gemma-mling-7b.Q4_0.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q4_0.gguf) | Q4_0 | 4.67GB |
| [gemma-mling-7b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.IQ4_NL.gguf) | IQ4_NL | 4.69GB |
| [gemma-mling-7b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q4_K_S.gguf) | Q4_K_S | 4.7GB |
| [gemma-mling-7b.Q4_K.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q4_K.gguf) | Q4_K | 4.96GB |
| [gemma-mling-7b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q4_K_M.gguf) | Q4_K_M | 4.96GB |
| [gemma-mling-7b.Q4_1.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q4_1.gguf) | Q4_1 | 5.12GB |
| [gemma-mling-7b.Q5_0.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q5_0.gguf) | Q5_0 | 5.57GB |
| [gemma-mling-7b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q5_K_S.gguf) | Q5_K_S | 5.57GB |
| [gemma-mling-7b.Q5_K.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q5_K.gguf) | Q5_K | 5.72GB |
| [gemma-mling-7b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q5_K_M.gguf) | Q5_K_M | 5.72GB |
| [gemma-mling-7b.Q5_1.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q5_1.gguf) | Q5_1 | 6.02GB |
| [gemma-mling-7b.Q6_K.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q6_K.gguf) | Q6_K | 6.53GB |
| [gemma-mling-7b.Q8_0.gguf](https://huggingface.co/RichardErkhov/beomi_-_gemma-mling-7b-gguf/blob/main/gemma-mling-7b.Q8_0.gguf) | Q8_0 | 8.45GB |
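
The links above point to individual quantized files in this repository. As a minimal, hedged sketch (not part of the original README), one way to download and run such a file is with the `huggingface_hub` and `llama-cpp-python` packages; the chosen quant, context size, and prompt below are only illustrative:

```python
# pip install llama-cpp-python huggingface_hub
# Minimal sketch: download one quantized file and run a short completion.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download a single GGUF file from this repo (pick any quant from the table above).
model_path = hf_hub_download(
    repo_id="RichardErkhov/beomi_-_gemma-mling-7b-gguf",
    filename="gemma-mling-7b.Q4_K_M.gguf",
)

# Load the model; n_ctx is an illustrative context length, not a recommendation.
llm = Llama(model_path=model_path, n_ctx=2048)

output = llm("머신러닝과 딥러닝의 차이는", max_tokens=128)
print(output["choices"][0]["text"])
```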


Original model description:
---
language:
- ko
- en
- zh
- ja
license: other
library_name: transformers
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
- pytorch
---

# Gemma-Mling: Multilingual Gemma

> Update @ 2024.04.15: First release of the Gemma-Mling 7B model

**Original Gemma Model Page**: [Gemma](https://ai.google.dev/gemma/docs)

This model card corresponds to the 7B base version of the **Gemma-Mling** model,
continually pretrained mainly on Korean/English/Chinese/Japanese text plus a 500-language multilingual corpus.

**Resources and Technical Documentation**:

* [Original Google's Gemma-7B](https://huggingface.co/google/gemma-7b)
* [Training Code @ Github: Gemma-EasyLM](https://github.com/Beomi/Gemma-EasyLM)

**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)

**Citation**

```bibtex
@misc{gemma_mling_7b,
  author    = { {Junbum Lee, Taekyoon Choi} },
  title     = { gemma-mling-7b },
  year      = 2024,
  url       = { https://huggingface.co/beomi/gemma-mling-7b },
  publisher = { Hugging Face }
}
```

**Model Developers**: Junbum Lee (Beomi) & Taekyoon Choi (Taekyoon)

## Model Information

### Usage

Below we share some code snippets on how to quickly get started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your use case.

#### Running the model on a CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("beomi/gemma-mling-7b")
model = AutoModelForCausalLM.from_pretrained("beomi/gemma-mling-7b")

input_text = "머신러닝과 딥러닝의 차이는"  # "The difference between machine learning and deep learning is"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
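
Note (not from the original card): depending on the model's generation config, `generate()` may stop after only a few tokens; passing a limit explicitly, for example `model.generate(**input_ids, max_new_tokens=128)`, gives longer completions.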


#### Running the model on a single / multi GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("beomi/gemma-mling-7b")
model = AutoModelForCausalLM.from_pretrained("beomi/gemma-mling-7b", device_map="auto")

input_text = "머신러닝과 딥러닝의 차이는"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
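
As a variation not shown in the original card, loading the weights in half precision can reduce GPU memory; `torch_dtype` is a standard `from_pretrained` argument, and the values below are only a suggestion:

```python
# Hedged variant of the snippet above: load in bfloat16 to roughly halve GPU memory.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("beomi/gemma-mling-7b")
model = AutoModelForCausalLM.from_pretrained(
    "beomi/gemma-mling-7b",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bfloat16 support
)

input_text = "머신러닝과 딥러닝의 차이는"
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```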

### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be
  summarized.
* **Output:** Generated multilingual text in response to the input, such
  as an answer to a question or a summary of a document.

## Implementation Information

Details about the model internals.

### Software

Training was done using [beomi/Gemma-EasyLM](https://github.com/Beomi/Gemma-EasyLM).

### Dataset

We trained on a mixture of multilingual datasets for up to 100B tokens.
The released model is the best-performing checkpoint according to the evaluation below.

For the Korean and English portions, we used a sampled llama2ko training dataset, combining the two languages at a 1:1 ratio.

| Dataset                  | Jsonl (GB) | Sampled |
|--------------------------|------------|---------|
| range3/cc100-ja          | 96.39      | No      |
| Skywork/SkyPile-150B     | 100.57     | Yes     |
| llama2ko dataset (ko/en) | 108.5      | Yes     |
| cis-lmu/Glot500          | 181.24     | No      |
| Total                    | 486.7      |         |

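For illustration only, here is a hypothetical sketch of 1:1 language interleaving using the Hugging Face `datasets` library; this is not the authors' actual pipeline, and the file names are placeholders:

```python
# Hypothetical sketch of 1:1 ko/en mixing; the jsonl paths below are placeholders.
from datasets import load_dataset, interleave_datasets

ko = load_dataset("json", data_files="llama2ko_ko.jsonl", split="train", streaming=True)
en = load_dataset("json", data_files="llama2ko_en.jsonl", split="train", streaming=True)

# Draw from each stream with equal probability to approximate a 1:1 ratio.
mixed = interleave_datasets([ko, en], probabilities=[0.5, 0.5], seed=42)
```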

## Training Progress

- Report Link: https://api.wandb.ai/links/tgchoi/6lt0ce3s

## Evaluation

Model evaluation metrics and results.

### Evaluation Scripts

- For Knowledge / KoBest / XCOPA / XWinograd
  - [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) v0.4.2
    ```bash
    !git clone https://github.com/EleutherAI/lm-evaluation-harness.git
    !cd lm-evaluation-harness && pip install -r requirements.txt && pip install -e .

    !lm_eval --model hf \
        --model_args pretrained=beomi/gemma-mling-7b,dtype="float16" \
        --tasks "haerae,kobest,kmmlu_direct,cmmlu,ceval-valid,mmlu,xwinograd,xcopa" \
        --num_fewshot "0,5,5,5,5,5,0,5" \
        --device cuda
    ```
- For JP Eval Harness
  - [Stability-AI/lm-evaluation-harness (`jp-stable` branch)](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable)
    ```bash
    !git clone -b jp-stable https://github.com/Stability-AI/lm-evaluation-harness.git
    !cd lm-evaluation-harness && pip install -e ".[ja]"
    !pip install 'fugashi[unidic]' && python -m unidic download

    !cd lm-evaluation-harness && python main.py \
        --model hf-causal \
        --model_args pretrained=beomi/gemma-mling-7b,torch_dtype='auto' \
        --tasks "jcommonsenseqa-1.1-0.3,jnli-1.3-0.3,marc_ja-1.1-0.3,jsquad-1.1-0.3,jaqket_v2-0.2-0.3,xlsum_ja,mgsm" \
        --num_fewshot "3,3,3,2,1,1,5"
    ```

### Benchmark Results

| Category                             | Metric               | Shots  | Score |
|--------------------------------------|----------------------|--------|-------|
| **Default Metric**                   | **ACC**              |        |       |
| **Knowledge (5-shot)**               | MMLU                 |        | 61.76 |
|                                      | KMMLU (Exact Match)  |        | 42.75 |
|                                      | CMMLU                |        | 50.93 |
|                                      | JMLU                 |        |       |
|                                      | C-EVAL               |        | 50.07 |
|                                      | HAERAE               | 0-shot | 63.89 |
| **KoBest (5-shot)**                  | BoolQ                |        | 85.47 |
|                                      | COPA                 |        | 83.5  |
|                                      | Hellaswag (acc-norm) |        | 63.2  |
|                                      | Sentineg             |        | 97.98 |
|                                      | WiC                  |        | 70.95 |
| **XCOPA (5-shot)**                   | IT                   |        | 72.8  |
|                                      | ID                   |        | 76.4  |
|                                      | TH                   |        | 60.2  |
|                                      | TR                   |        | 65.6  |
|                                      | VI                   |        | 77.2  |
|                                      | ZH                   |        | 80.2  |
| **JP Eval Harness (Prompt ver 0.3)** | JcommonsenseQA       | 3-shot | 85.97 |
|                                      | JNLI                 | 3-shot | 39.11 |
|                                      | Marc_ja              | 3-shot | 96.48 |
|                                      | JSquad (Exact Match) | 2-shot | 70.69 |
|                                      | Jaqket (Exact Match) | 1-shot | 81.53 |
|                                      | MGSM                 | 5-shot | 28.8  |
| **XWinograd (0-shot)**               | EN                   |        | 89.03 |
|                                      | FR                   |        | 72.29 |
|                                      | JP                   |        | 82.69 |
|                                      | PT                   |        | 73.38 |
|                                      | RU                   |        | 68.57 |
|                                      | ZH                   |        | 79.17 |

## Usage and Limitations

These models have certain limitations that users should be aware of.

### Intended Usage

Open Large Language Models (LLMs) have a wide range of applications across
various industries and domains. The following list of potential uses is not
comprehensive. The purpose of this list is to provide contextual information
about the possible use-cases that the model creators considered as part of model
training and development.

* Content Creation and Communication
  * Text Generation: These models can be used to generate creative text formats
    such as poems, scripts, code, marketing copy, and email drafts.
* Research and Education
  * Natural Language Processing (NLP) Research: These models can serve as a
    foundation for researchers to experiment with NLP techniques, develop
    algorithms, and contribute to the advancement of the field.
  * Language Learning Tools: Support interactive language learning experiences,
    aiding in grammar correction or providing writing practice.
  * Knowledge Exploration: Assist researchers in exploring large bodies of text
    by generating summaries or answering questions about specific topics.

### Limitations

* Training Data
  * The quality and diversity of the training data significantly influence the
    model's capabilities. Biases or gaps in the training data can lead to
    limitations in the model's responses.
  * The scope of the training dataset determines the subject areas the model can
    handle effectively.
* Context and Task Complexity
  * LLMs are better at tasks that can be framed with clear prompts and
    instructions. Open-ended or highly complex tasks might be challenging.
  * A model's performance can be influenced by the amount of context provided
    (longer context generally leads to better outputs, up to a certain point).
* Language Ambiguity and Nuance
  * Natural language is inherently complex. LLMs might struggle to grasp subtle
    nuances, sarcasm, or figurative language.
* Factual Accuracy
  * LLMs generate responses based on information they learned from their
    training datasets, but they are not knowledge bases. They may generate
    incorrect or outdated factual statements.
* Common Sense
  * LLMs rely on statistical patterns in language. They might lack the ability
    to apply common sense reasoning in certain situations.

### Ethical Considerations and Risks

The development of large language models (LLMs) raises several ethical concerns.
In creating an open model, we have carefully considered the following:

* Bias and Fairness
  * LLMs trained on large-scale, real-world text data can reflect socio-cultural
    biases embedded in the training material. These models underwent careful
    scrutiny, with input data pre-processing and posterior evaluations reported
    in this card.
* Misinformation and Misuse
  * LLMs can be misused to generate text that is false, misleading, or harmful.
  * Guidelines are provided for responsible use with the model; see the
    [Responsible Generative AI Toolkit](http://ai.google.dev/gemma/responsible).
* Transparency and Accountability
  * This model card summarizes details on the models' architecture,
    capabilities, limitations, and evaluation processes.
  * A responsibly developed open model offers the opportunity to share
    innovation by making LLM technology accessible to developers and researchers
    across the AI ecosystem.

Risks identified and mitigations:

* Perpetuation of biases: It's encouraged to perform continuous monitoring
  (using evaluation metrics, human review) and the exploration of de-biasing
  techniques during model training, fine-tuning, and other use cases.
* Generation of harmful content: Mechanisms and guidelines for content safety
  are essential. Developers are encouraged to exercise caution and implement
  appropriate content safety safeguards based on their specific product policies
  and application use cases.
* Misuse for malicious purposes: Technical limitations and developer and
  end-user education can help mitigate against malicious applications of LLMs.
  Educational resources and reporting mechanisms for users to flag misuse are
  provided. Prohibited uses of Gemma models are outlined in the
  [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
* Privacy violations: Models were trained on data filtered for removal of PII
  (Personally Identifiable Information). Developers are encouraged to adhere to
  privacy regulations with privacy-preserving techniques.

## Acknowledgement

Training was supported by the [TPU Research Cloud](https://sites.research.google/trc/) program.