rahuldshetty committed
Commit eb9a94e
1 Parent(s): 0e62e79

Update README.md

Files changed (1)
  1. README.md +66 -18
README.md CHANGED
@@ -20,13 +20,13 @@ GGUF Quantized version of [gemma-7b-it](https://huggingface.co/google/gemma-7b-i
 
 **Model Page**: [Gemma](https://ai.google.dev/gemma/docs)
 
- This model card corresponds to the 2B base version of the Gemma model. You can also visit the model card of the [7B base model](https://huggingface.co/google/gemma-7b), [7B instruct model](https://huggingface.co/google/gemma-7b-it), and [2B instruct model](https://huggingface.co/google/gemma-2b-it).
+ This model card corresponds to the 7B instruct version of the Gemma model. You can also visit the model cards of the [2B base model](https://huggingface.co/google/gemma-2b), [7B base model](https://huggingface.co/google/gemma-7b), and [2B instruct model](https://huggingface.co/google/gemma-2b-it).
 
 **Resources and Technical Documentation**:
 
 * [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
 * [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma)
- * [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-2b-gg-hf)
+ * [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-7b-it-gg-hf)
 
 **Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)
 
@@ -52,10 +52,9 @@ state of the art AI models and helping foster innovation for everyone.
 
 Below we share some code snippets on how to quickly get started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your use case.
 
-
 #### Fine-tuning the model
 
- You can find fine-tuning scripts and notebook under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt it to this model, simply change the model-id to `google/gemma-2b`.
+ You can find fine-tuning scripts and notebooks under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of the [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt them to this model, simply change the model id to `google/gemma-7b-it`.
 In that repository, we provide:
 
 * A script to perform Supervised Fine-Tuning (SFT) on UltraChat dataset using QLoRA
@@ -63,15 +62,14 @@ In that repository, we provide:
 * A notebook that you can run on a free-tier Google Colab instance to perform SFT on English quotes dataset
 
 
-
 #### Running the model on a CPU
 
 
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")
 
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt")
@@ -88,8 +86,8 @@ print(tokenizer.decode(outputs[0]))
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto")
 
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -107,8 +105,8 @@ print(tokenizer.decode(outputs[0]))
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.float16)
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.float16)
 
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -123,8 +121,8 @@ print(tokenizer.decode(outputs[0]))
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.bfloat16)
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.bfloat16)
 
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -143,8 +141,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
 
 quantization_config = BitsAndBytesConfig(load_in_8bit=True)
 
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", quantization_config=quantization_config)
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", quantization_config=quantization_config)
 
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -161,8 +159,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
 
 quantization_config = BitsAndBytesConfig(load_in_4bit=True)
 
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", quantization_config=quantization_config)
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", quantization_config=quantization_config)
 
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -186,6 +184,56 @@ model = AutoModelForCausalLM.from_pretrained(
 ).to(0)
 ```
 
+ ### Chat Template
+
+ The instruction-tuned models use a chat template that must be adhered to for conversational use.
+ The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.
+
+ Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
+
+ ```py
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import transformers
+ import torch
+
+ model_id = "google/gemma-7b-it"
+ dtype = torch.bfloat16
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="cuda",
+     torch_dtype=dtype,
+ )
+
+ chat = [
+     { "role": "user", "content": "Write a hello world program" },
+ ]
+ prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ ```
+
+ At this point, the prompt contains the following text:
+
+ ```
+ <start_of_turn>user
+ Write a hello world program<end_of_turn>
+ <start_of_turn>model
+ ```
+
+ As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
+ (either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
+ the `<end_of_turn>` token.
+
+ You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
+ chat template.
+
+ After the prompt is ready, generation can be performed like this:
+
+ ```py
+ inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
+ ```
+
 ### Inputs and outputs
 
 * **Input:** Text string, such as a question, a prompt, or a document to be
@@ -260,7 +308,7 @@ several advantages in this domain:
 
 ### Software
 
- Training was done using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ml-pathways).
+ Training was done using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture).
 
 JAX allows researchers to take advantage of the latest generation of hardware,
 including TPUs, for faster and more efficient training of large models.
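The chat template section added above notes that the prompt can also be built by hand when `apply_chat_template` is not used. Below is a minimal sketch of doing that with `transformers`, assuming the `google/gemma-7b-it` checkpoint and the `<start_of_turn>`/`<end_of_turn>` markers shown in the diff; the variable names are illustrative and not part of the README.

```py
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", torch_dtype=torch.bfloat16)

# Build the prompt manually using the turn markers described in the chat template section.
user_message = "Write a hello world program"
prompt = (
    "<start_of_turn>user\n"
    f"{user_message}<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# tokenizer() adds special tokens by default, which typically supplies the leading <bos>
# that the built-in chat template would otherwise prepend.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```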
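Since this repository distributes GGUF quantizations of gemma-7b-it rather than the original weights, the quantized files can also be run with a GGUF-compatible runtime such as `llama-cpp-python`. The sketch below is only an illustration under that assumption; the file name `gemma-7b-it.Q4_K_M.gguf` is a placeholder for whichever quantized file you download from this repo.

```py
# Sketch only: requires `pip install llama-cpp-python`; the GGUF file name is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="gemma-7b-it.Q4_K_M.gguf", n_ctx=2048)

# Reuse the Gemma turn format described in the chat template section above.
prompt = (
    "<start_of_turn>user\n"
    "Write a hello world program<end_of_turn>\n"
    "<start_of_turn>model\n"
)

output = llm(prompt, max_tokens=150, stop=["<end_of_turn>"])
print(output["choices"][0]["text"])
```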