---
license: gemma
library_name: transformers
extra_gated_heading: Access RecurrentGemma on Hugging Face
extra_gated_prompt: To access RecurrentGemma on Hugging Face, you’re required to review
  and agree to Google’s usage license. To do this, please ensure you’re logged in
  to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
---

[google/recurrentgemma-9b-it](https://huggingface.co/google/recurrentgemma-9b-it) quantized to 4-bit using bitsandbytes.

Quantization settings:
```
BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  "bnb_4bit_compute_dtype": "float16",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": false,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
}
```
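
For reference, an equivalent 4-bit load can be reproduced from the base checkpoint with `BitsAndBytesConfig`. A minimal sketch, assuming a CUDA GPU and the `bitsandbytes` and `accelerate` packages are installed:

```python
# Sketch: reproducing the 4-bit quantization settings listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/recurrentgemma-9b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
```
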
Original card below.

---

# RecurrentGemma Model Card

**Model Page**: [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma/model_card)

This model card corresponds to the 9B instruction-tuned version of the RecurrentGemma model. You can also visit the model card of the [9B base model](https://huggingface.co/google/recurrentgemma-9b).

**Resources and technical documentation:**

* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
* [RecurrentGemma on Kaggle](https://www.kaggle.com/models/google/recurrentgemma)

**Terms of Use:** [Terms](https://www.kaggle.com/models/google/gemma/license/consent)

**Authors:** Google

## Model information

### Usage

55
+ Below we share some code snippets on how to get quickly started with running the model.
56
+
57
+ First, make sure to `pip install transformers`, then copy the snippet from the section that is relevant for your usecase.
58
+
59
+ ### Running the model on a single / multi GPU
60
+
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-9b-it")
model = AutoModelForCausalLM.from_pretrained("google/recurrentgemma-9b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
# Move the tokenized inputs to the device the model was loaded onto.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

dtype = torch.bfloat16  # dtype used to load the model weights

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/recurrentgemma-9b-it",
    device_map="auto",
    torch_dtype=dtype,
)
chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
the `<end_of_turn>` token.

You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
chat template, as in the sketch below.
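
A minimal sketch of building the same prompt by hand; the `build_prompt` helper is hypothetical, and the turn markers are the ones shown above:

```py
# Hypothetical helper: assembles the prompt manually from the turn markers above.
# The leading <bos> is included so the string matches the template output exactly.
def build_prompt(user_message: str) -> str:
    return (
        f"<bos><start_of_turn>user\n{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_prompt("Write a hello world program")
```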

After the prompt is ready, generation can be performed like this:

```py
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
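
To continue the conversation, one option is to append the decoded reply and the next user turn to `chat`, then re-apply the template. A sketch, reusing the objects above and assuming the template follows the usual Gemma convention of accepting the `assistant` role and rendering it as `model`:

```py
# Sketch: decode only the newly generated tokens, append them as the model's
# turn, then rebuild the prompt for the next round.
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
chat += [
    { "role": "assistant", "content": reply },  # rendered as `model` by the template
    { "role": "user", "content": "Now explain what the program does" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```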

### Model summary

#### Description

RecurrentGemma is a family of open language models built on a [novel recurrent
architecture](https://arxiv.org/abs/2402.19427) developed at Google. Both
pre-trained and instruction-tuned versions are available in English.

Like Gemma, RecurrentGemma models are well-suited for a variety of text
generation tasks, including question answering, summarization, and reasoning.
Because of its novel architecture, RecurrentGemma requires less memory than
Gemma and achieves faster inference when generating long sequences.

#### Inputs and outputs

* **Input:** Text string (e.g., a question, a prompt, or a document to be
  summarized).
* **Output:** Generated English-language text in response to the input (e.g.,
  an answer to the question, a summary of the document).

#### Citation

```none
@article{recurrentgemma_2024,
    title={RecurrentGemma},
    url={},
    DOI={},
    publisher={Kaggle},
    author={Griffin Team, Soham De, Samuel L Smith, Anushan Fernando, Alex Botev, George-Christian Muraru, Ruba Haroun, Leonard Berrada et al.},
    year={2024}
}
```

### Model data

#### Training dataset and data processing

RecurrentGemma uses the same training data and data processing as used by the
Gemma model family. A full description can be found on the [Gemma model
card](https://ai.google.dev/gemma/docs/model_card#model_data).

## Implementation information

### Hardware and frameworks used during training

Like
[Gemma](https://ai.google.dev/gemma/docs/model_card#implementation_information),
RecurrentGemma was trained on
[TPUv5e](https://cloud.google.com/tpu/docs/intro-to-tpu),
using [JAX](https://github.com/google/jax) and [ML
Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/).

## Evaluation information

### Benchmark results

#### Evaluation approach

These models were evaluated against a large collection of different datasets and
metrics to cover different aspects of text generation.

#### Evaluation results

Benchmark | Metric | RecurrentGemma 9B
------------------- | ------------- | -----------------
[MMLU] | 5-shot, top-1 | 60.5
[HellaSwag] | 0-shot | 80.4
[PIQA] | 0-shot | 81.3
[SocialIQA] | 0-shot | 52.3
[BoolQ] | 0-shot | 80.3
[WinoGrande] | partial score | 73.6
[CommonsenseQA] | 7-shot | 73.2
[OpenBookQA] | | 51.8
[ARC-e][ARC-c] | | 78.8
[ARC-c] | | 52.0
[TriviaQA] | 5-shot | 70.5
[Natural Questions] | 5-shot | 21.7
[HumanEval] | pass@1 | 31.1
[MBPP] | 3-shot | 42.0
[GSM8K] | maj@1 | 42.6
[MATH] | 4-shot | 23.8
[AGIEval] | | 39.3
[BIG-Bench] | | 55.2
**Average** | | 56.1

### Inference speed results

RecurrentGemma provides improved sampling speeds, particularly for long sequences or large batch sizes. We compared the sampling speeds of RecurrentGemma-9B to Gemma-7B. Note that Gemma-7B uses Multi-Head Attention, and the speed improvements would be smaller when comparing against a transformer using Multi-Query Attention.

#### Throughput

We evaluated the throughput of RecurrentGemma-9B compared to Gemma-7B, i.e., the maximum number of tokens produced per second as the batch size is increased, using a prefill of 2K tokens.

<img src="max_throughput.png" width="400" alt="Maximum Throughput comparison of RecurrentGemma-9B and Gemma-7B">
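
As a rough illustration of what such a measurement involves, here is a sketch of timing sampling throughput at a single batch size, reusing the `model` and `tokenizer` from the usage section; the prompt, batch size, and generation settings are illustrative, not the harness used for the figure:

```py
# Illustrative sketch: tokens generated per second at one batch size.
import time

batch = ["Write me a poem about Machine Learning."] * 8  # batch size under test
inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=True)
elapsed = time.perf_counter() - start

generated = (out.shape[-1] - inputs["input_ids"].shape[-1]) * out.shape[0]
print(f"{generated / elapsed:.1f} tokens/sec")
```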

#### Latency

We also compared end-to-end speedups achieved by RecurrentGemma-9B over Gemma-7B when sampling a long sequence after a prefill of 4K tokens and using a batch size of 1.

\# Tokens Sampled | Gemma-7B (sec) | RecurrentGemma-9B (sec) | Improvement (%)
----------------- | -------------- | ----------------------- | ---------------
128 | 3.1 | 2.8 | 9.2%
256 | 5.9 | 5.4 | 9.7%
512 | 11.6 | 10.5 | 10.7%
1024 | 23.5 | 20.6 | 14.2%
2048 | 48.2 | 40.9 | 17.7%
4096 | 101.9 | 81.5 | 25.0%
8192 | OOM | 162.8 | -
16384 | OOM | 325.2 | -

## Ethics and safety

### Ethics and safety evaluations

#### Evaluation approach

Our evaluation methods include structured evaluations and internal red-teaming
testing of relevant content policies. Red-teaming was conducted by a number of
different teams, each with different goals and human evaluation metrics. These
models were evaluated against a number of different categories relevant to
ethics and safety, including:

* **Text-to-text content safety:** Human evaluation on prompts covering safety
  policies including child sexual abuse and exploitation, harassment, violence
  and gore, and hate speech.
* **Text-to-text representational harms:** Benchmark against relevant academic
  datasets such as WinoBias and the BBQ dataset.
* **Memorization:** Automated evaluation of memorization of training data,
  including the risk of personally identifiable information exposure.
* **Large-scale harm:** Tests for “dangerous capabilities,” such as chemical,
  biological, radiological, and nuclear (CBRN) risks, as well as tests for
  persuasion and deception, cybersecurity, and autonomous replication.

#### Evaluation results

The results of ethics and safety evaluations are within acceptable thresholds
for meeting [internal
policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11)
for categories such as child safety, content safety, representational harms,
memorization, and large-scale harms. On top of robust internal evaluations, the
results of well-known safety benchmarks like BBQ, Winogender, Winobias,
RealToxicity, and TruthfulQA are shown here.

Benchmark | Metric | RecurrentGemma 9B | RecurrentGemma 9B IT
------------------------ | ------ | ----------------- | --------------------
[RealToxicity] | avg | 10.3 | 8.8
[BOLD] | | 39.8 | 47.9
[CrowS-Pairs] | top-1 | 38.7 | 39.5
[BBQ Ambig][BBQ] | top-1 | 95.9 | 67.1
[BBQ Disambig][BBQ] | top-1 | 78.6 | 78.9
[Winogender] | top-1 | 59.0 | 64.0
[TruthfulQA] | | 38.6 | 47.7
[Winobias 1_2][Winobias] | | 61.5 | 60.6
[Winobias 2_2][Winobias] | | 90.2 | 90.3
[Toxigen] | | 58.8 | 64.5

## Model usage and limitations

### Known limitations

These models have certain limitations that users should be aware of:

* **Training data**
  * The quality and diversity of the training data significantly influence
    the model's capabilities. Biases or gaps in the training data can lead
    to limitations in the model's responses.
  * The scope of the training dataset determines the subject areas the model
    can handle effectively.
* **Context and task complexity**
  * LLMs are better at tasks that can be framed with clear prompts and
    instructions. Open-ended or highly complex tasks might be challenging.
  * A model's performance can be influenced by the amount of context
    provided (longer context generally leads to better outputs, up to a
    certain point).
* **Language ambiguity and nuance**
  * Natural language is inherently complex. LLMs might struggle to grasp
    subtle nuances, sarcasm, or figurative language.
* **Factual accuracy**
  * LLMs generate responses based on information they learned from their
    training datasets, but they are not knowledge bases. They may generate
    incorrect or outdated factual statements.
* **Common sense**
  * LLMs rely on statistical patterns in language. They might lack the
    ability to apply common sense reasoning in certain situations.

### Ethical considerations and risks

The development of large language models (LLMs) raises several ethical concerns.
In creating an open model, we have carefully considered the following:

* **Bias and fairness**
  * LLMs trained on large-scale, real-world text data can reflect
    socio-cultural biases embedded in the training material. These models
    underwent careful scrutiny; the input data pre-processing is described
    and posterior evaluations are reported in this card.
* **Misinformation and misuse**
  * LLMs can be misused to generate text that is false, misleading, or
    harmful.
  * Guidelines for responsible use of the model are provided; see the
    [Responsible Generative AI
    Toolkit](https://ai.google.dev/gemma/responsible).
* **Transparency and accountability**
  * This model card summarizes details on the models' architecture,
    capabilities, limitations, and evaluation processes.
  * A responsibly developed open model offers the opportunity to share
    innovation by making LLM technology accessible to developers and
    researchers across the AI ecosystem.

Risks identified and mitigations:

* **Perpetuation of biases:** Continuous monitoring (using evaluation metrics
  and human review) and the exploration of de-biasing techniques are encouraged
  during model training, fine-tuning, and other use cases.
* **Generation of harmful content:** Mechanisms and guidelines for content
  safety are essential. Developers are encouraged to exercise caution and
  implement appropriate content safety safeguards based on their specific
  product policies and application use cases.
* **Misuse for malicious purposes:** Technical limitations and developer and
  end-user education can help mitigate malicious applications of LLMs.
  Educational resources and reporting mechanisms for users to flag misuse are
  provided. Prohibited uses of Gemma models are outlined in our [terms of
  use](https://www.kaggle.com/models/google/gemma/license/consent).
* **Privacy violations:** Models were trained on data filtered to remove
  personally identifiable information (PII). Developers are encouraged to
  adhere to privacy regulations with privacy-preserving techniques.

## Intended usage

### Application

Open Large Language Models (LLMs) have a wide range of applications across
various industries and domains. The following list of potential uses is not
comprehensive. The purpose of this list is to provide contextual information
about the possible use cases that the model creators considered as part of model
training and development.

* **Content creation and communication**
  * **Text generation:** These models can be used to generate creative text
    formats like poems, scripts, code, marketing copy, email drafts, etc.
  * **Chatbots and conversational AI:** Power conversational interfaces for
    customer service, virtual assistants, or interactive applications.
  * **Text summarization:** Generate concise summaries of a text corpus,
    research papers, or reports.
* **Research and education**
  * **Natural Language Processing (NLP) research:** These models can serve
    as a foundation for researchers to experiment with NLP techniques,
    develop algorithms, and contribute to the advancement of the field.
  * **Language Learning Tools:** Support interactive language learning
    experiences, aiding in grammar correction or providing writing practice.
  * **Knowledge Exploration:** Assist researchers in exploring large bodies
    of text by generating summaries or answering questions about specific
    topics.

### Benefits

At the time of release, this family of models provides high-performance open
large language model implementations designed from the ground up for Responsible
AI development, relative to similarly sized models.

Using the benchmark evaluation metrics described in this document, these models
have been shown to provide superior performance to other, comparably sized open
model alternatives.

In particular, RecurrentGemma models achieve comparable performance to Gemma
models but are faster during inference and require less memory, especially on
long sequences.

[MMLU]: https://arxiv.org/abs/2009.03300
[HellaSwag]: https://arxiv.org/abs/1905.07830
[PIQA]: https://arxiv.org/abs/1911.11641
[SocialIQA]: https://arxiv.org/abs/1904.09728
[BoolQ]: https://arxiv.org/abs/1905.10044
[WinoGrande]: https://arxiv.org/abs/1907.10641
[CommonsenseQA]: https://arxiv.org/abs/1811.00937
[OpenBookQA]: https://arxiv.org/abs/1809.02789
[ARC-c]: https://arxiv.org/abs/1911.01547
[TriviaQA]: https://arxiv.org/abs/1705.03551
[Natural Questions]: https://github.com/google-research-datasets/natural-questions
[HumanEval]: https://arxiv.org/abs/2107.03374
[MBPP]: https://arxiv.org/abs/2108.07732
[GSM8K]: https://arxiv.org/abs/2110.14168
[MATH]: https://arxiv.org/abs/2103.03874
[AGIEval]: https://arxiv.org/abs/2304.06364
[BIG-Bench]: https://arxiv.org/abs/2206.04615
[RealToxicity]: https://arxiv.org/abs/2009.11462
[BOLD]: https://arxiv.org/abs/2101.11718
[CrowS-Pairs]: https://aclanthology.org/2020.emnlp-main.154/
[BBQ]: https://arxiv.org/abs/2110.08193v2
[Winogender]: https://arxiv.org/abs/1804.09301
[TruthfulQA]: https://arxiv.org/abs/2109.07958
[Winobias]: https://arxiv.org/abs/1804.06876
[Toxigen]: https://arxiv.org/abs/2203.09509