Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


recurrentgemma-2b - bnb 4bits
- Model creator: https://huggingface.co/google/
- Original model: https://huggingface.co/google/recurrentgemma-2b/

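For context on why a 4-bit build is useful, here is a rough back-of-the-envelope estimate of weight-storage memory at different precisions. The nominal 2B parameter count and the 10% overhead factor are illustrative assumptions, not measured numbers:

```python
def approx_weight_memory_gb(n_params: float, bits_per_param: int, overhead: float = 1.1) -> float:
    """Rough weight-storage estimate: params * bits / 8 bytes, plus a ~10%
    allowance for tensors kept in higher precision (e.g., embeddings, norms)."""
    return n_params * bits_per_param / 8 / 1e9 * overhead

N = 2e9  # nominal "2B" parameter count (an approximation, not an exact figure)
print(f"bf16 : {approx_weight_memory_gb(N, 16):.1f} GB")  # ~4.4 GB
print(f"4-bit: {approx_weight_memory_gb(N, 4):.1f} GB")   # ~1.1 GB
```

In short, 4-bit weights need roughly a quarter of the memory of a bf16 checkpoint, which is what makes this build practical on smaller GPUs.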


Original model description:
---
license: gemma
library_name: transformers
extra_gated_heading: Access RecurrentGemma on Hugging Face
extra_gated_prompt: To access RecurrentGemma on Hugging Face, you’re required to review
  and agree to Google’s usage license. To do this, please ensure you’re logged-in
  to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
---

# RecurrentGemma Model Card

**Model Page**: [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma/model_card)

This model card corresponds to the 2B base version of the RecurrentGemma model. You can also visit the model card of the [2B instruct model](https://huggingface.co/google/recurrentgemma-2b-it).

**Resources and technical documentation:**

* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
* [RecurrentGemma on Kaggle](https://www.kaggle.com/models/google/recurrentgemma)

**Terms of Use:** [Terms](https://www.kaggle.com/models/google/gemma/license/consent)

**Authors:** Google

## Usage

Below are some code snippets to help you quickly get started running the model. First, make sure to `pip install --upgrade git+https://github.com/huggingface/transformers.git`, then copy the snippet from the section that is relevant for your use case.

### Running the model on a single / multi GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/recurrentgemma-2b", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
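
Since this repository hosts a bitsandbytes 4-bit build, the model can also be loaded in 4-bit directly via transformers' `BitsAndBytesConfig`. This is a configuration sketch, not an official snippet from the original card; the NF4 quant type and bf16 compute dtype are common defaults chosen here as assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 with bf16 compute is a widely used default for 4-bit inference;
# adjust to your hardware and accuracy needs.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/recurrentgemma-2b",
    quantization_config=quant_config,
    device_map="auto",
)
```

This requires the `bitsandbytes` package and a CUDA-capable GPU.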

## Model information

### Model summary

#### Description

RecurrentGemma is a family of open language models built on a [novel recurrent architecture](https://arxiv.org/abs/2402.19427) developed at Google. Both pre-trained and instruction-tuned versions are available in English.

Like Gemma, RecurrentGemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Because of its novel architecture, RecurrentGemma requires less memory than Gemma and achieves faster inference when generating long sequences.

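To give intuition for the memory claim: a recurrent block carries a fixed-size state from token to token, whereas attention's key-value cache grows with sequence length. The toy scalar recurrence below is purely illustrative (it is not RecurrentGemma's actual RG-LRU layer), but it shows the constant-memory property:

```python
def run_recurrence(inputs, alpha=0.9):
    """Toy gated linear recurrence: h_t = alpha * h_{t-1} + (1 - alpha) * x_t.
    The state `h` is a single number regardless of sequence length, which is
    the key memory property of recurrent blocks: O(1) state, not O(T) cache."""
    h = 0.0
    for x in inputs:
        h = alpha * h + (1 - alpha) * x
    return h

# Processing 10 tokens or 10,000 uses the same state memory.
print(run_recurrence([1.0] * 10))
```

For a constant input `c`, the state after `n` steps is `(1 - alpha**n) * c`, so long sequences converge toward the input value while memory use stays flat.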
#### Inputs and outputs

* **Input:** Text string (e.g., a question, a prompt, or a document to be summarized).
* **Output:** Generated English-language text in response to the input (e.g., an answer to the question, a summary of the document).

#### Citation

```none
@article{recurrentgemma_2024,
    title={RecurrentGemma},
    url={},
    DOI={},
    publisher={Kaggle},
    author={Griffin Team, Aleksandar Botev and Soham De and Samuel L Smith and Anushan Fernando and George-Christian Muraru and Ruba Haroun and Leonard Berrada et al.},
    year={2024}
}
```

### Model data

#### Training dataset and data processing

RecurrentGemma uses the same training data and data processing as used by the Gemma model family. A full description can be found on the [Gemma model card](https://ai.google.dev/gemma/docs/model_card#model_data).

## Implementation information

### Hardware and frameworks used during training

Like [Gemma](https://ai.google.dev/gemma/docs/model_card#implementation_information), RecurrentGemma was trained on [TPUv5e](https://cloud.google.com/tpu/docs/intro-to-tpu), using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/).

## Evaluation information

### Benchmark results

#### Evaluation approach

These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation.

#### Evaluation results

Benchmark           | Metric        | RecurrentGemma 2B
------------------- | ------------- | -----------------
[MMLU]              | 5-shot, top-1 | 38.4
[HellaSwag]         | 0-shot        | 71.0
[PIQA]              | 0-shot        | 78.5
[SocialIQA]         | 0-shot        | 51.8
[BoolQ]             | 0-shot        | 71.3
[WinoGrande]        | partial score | 67.8
[CommonsenseQA]     | 7-shot        | 63.7
[OpenBookQA]        |               | 47.2
[ARC-e][ARC-c]      |               | 72.9
[ARC-c]             |               | 42.3
[TriviaQA]          | 5-shot        | 52.5
[Natural Questions] | 5-shot        | 11.5
[HumanEval]         | pass@1        | 21.3
[MBPP]              | 3-shot        | 28.8
[GSM8K]             | maj@1         | 13.4
[MATH]              | 4-shot        | 11.0
[AGIEval]           |               | 23.8
[BIG-Bench]         |               | 35.3
**Average**         |               | 44.6

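As a quick sanity check, the reported average is the unweighted mean of the 18 benchmark scores in the table above, reproduced here:

```python
# Scores copied from the benchmark table, in table order.
scores = [38.4, 71.0, 78.5, 51.8, 71.3, 67.8, 63.7, 47.2, 72.9,
          42.3, 52.5, 11.5, 21.3, 28.8, 13.4, 11.0, 23.8, 35.3]
average = sum(scores) / len(scores)
print(round(average, 1))  # 44.6
```
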
## Ethics and safety

### Ethics and safety evaluations

#### Evaluations approach

Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:

* **Text-to-text content safety:** Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech.
* **Text-to-text representational harms:** Benchmark against relevant academic datasets such as WinoBias and the BBQ dataset.
* **Memorization:** Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure.
* **Large-scale harm:** Tests for “dangerous capabilities,” such as chemical, biological, radiological, and nuclear (CBRN) risks, as well as tests for persuasion and deception, cybersecurity, and autonomous replication.

#### Evaluation results

The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11) for categories such as child safety, content safety, representational harms, memorization, and large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, Winogender, WinoBias, RealToxicity, and TruthfulQA are shown here.

Benchmark                | Metric | RecurrentGemma 2B | RecurrentGemma 2B IT
------------------------ | ------ | ----------------- | --------------------
[RealToxicity]           | avg    | 9.8               | 7.6
[BOLD]                   |        | 39.3              | 52.4
[CrowS-Pairs]            | top-1  | 41.1              | 43.4
[BBQ Ambig][BBQ]         | top-1  | 62.6              | 71.1
[BBQ Disambig][BBQ]      | top-1  | 58.4              | 50.8
[Winogender]             | top-1  | 55.1              | 54.7
[TruthfulQA]             |        | 35.1              | 42.7
[Winobias 1_2][Winobias] |        | 58.4              | 56.4
[Winobias 2_2][Winobias] |        | 90.0              | 75.4
[Toxigen]                |        | 56.7              | 50.0

## Model usage and limitations

### Known limitations

These models have certain limitations that users should be aware of:

* **Training data**
    * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
    * The scope of the training dataset determines the subject areas the model can handle effectively.
* **Context and task complexity**
    * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
    * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
* **Language ambiguity and nuance**
    * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language.
* **Factual accuracy**
    * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
* **Common sense**
    * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.

### Ethical considerations and risks

The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:

* **Bias and fairness**
    * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card.
* **Misinformation and misuse**
    * LLMs can be misused to generate text that is false, misleading, or harmful.
    * Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit](https://ai.google.dev/gemma/responsible).
* **Transparency and accountability**
    * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
    * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem.

Risks identified and mitigations:

* **Perpetuation of biases:** Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques are encouraged during model training, fine-tuning, and other use cases.
* **Generation of harmful content:** Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
* **Misuse for malicious purposes:** Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in our [terms of use](https://www.kaggle.com/models/google/gemma/license/consent).
* **Privacy violations:** Models were trained on data filtered to remove PII (personally identifiable information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.

## Intended usage

### Application

Open large language models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive; its purpose is to provide contextual information about the possible use cases that the model creators considered as part of model training and development.

* **Content creation and communication**
    * **Text generation:** These models can be used to generate creative text formats like poems, scripts, code, marketing copy, email drafts, etc.
    * **Chatbots and conversational AI:** Power conversational interfaces for customer service, virtual assistants, or interactive applications.
    * **Text summarization:** Generate concise summaries of a text corpus, research papers, or reports.
* **Research and education**
    * **Natural Language Processing (NLP) research:** These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field.
    * **Language learning tools:** Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
    * **Knowledge exploration:** Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics.

### Benefits

At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for responsible AI development, compared to similarly sized models.

Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.

In particular, RecurrentGemma models achieve comparable performance to Gemma models but are faster during inference and require less memory, especially on long sequences.

[MMLU]: https://arxiv.org/abs/2009.03300
[HellaSwag]: https://arxiv.org/abs/1905.07830
[PIQA]: https://arxiv.org/abs/1911.11641
[SocialIQA]: https://arxiv.org/abs/1904.09728
[BoolQ]: https://arxiv.org/abs/1905.10044
[WinoGrande]: https://arxiv.org/abs/1907.10641
[CommonsenseQA]: https://arxiv.org/abs/1811.00937
[OpenBookQA]: https://arxiv.org/abs/1809.02789
[ARC-c]: https://arxiv.org/abs/1911.01547
[TriviaQA]: https://arxiv.org/abs/1705.03551
[Natural Questions]: https://github.com/google-research-datasets/natural-questions
[HumanEval]: https://arxiv.org/abs/2107.03374
[MBPP]: https://arxiv.org/abs/2108.07732
[GSM8K]: https://arxiv.org/abs/2110.14168
[MATH]: https://arxiv.org/abs/2103.03874
[AGIEval]: https://arxiv.org/abs/2304.06364
[BIG-Bench]: https://arxiv.org/abs/2206.04615
[RealToxicity]: https://arxiv.org/abs/2009.11462
[BOLD]: https://arxiv.org/abs/2101.11718
[CrowS-Pairs]: https://aclanthology.org/2020.emnlp-main.154/
[BBQ]: https://arxiv.org/abs/2110.08193v2
[Winogender]: https://arxiv.org/abs/1804.09301
[TruthfulQA]: https://arxiv.org/abs/2109.07958
[Winobias]: https://arxiv.org/abs/1804.06876
[Toxigen]: https://arxiv.org/abs/2203.09509