rahuldshetty committed
Commit: eb9a94e
1 Parent(s): 0e62e79
Update README.md

README.md CHANGED
@@ -20,13 +20,13 @@ GGUF Quantized version of [gemma-7b-it](https://huggingface.co/google/gemma-7b-i

**Model Page**: [Gemma](https://ai.google.dev/gemma/docs)

-This model card corresponds to the
+This model card corresponds to the 7B instruct version of the Gemma model. You can also visit the model card of the [2B base model](https://huggingface.co/google/gemma-2b), [7B base model](https://huggingface.co/google/gemma-7b), and [2B instruct model](https://huggingface.co/google/gemma-2b-it).

**Resources and Technical Documentation**:

* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
* [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma)
-* [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-
+* [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-7b-it-gg-hf)

**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)

@@ -52,10 +52,9 @@ state of the art AI models and helping foster innovation for everyone.

Below we share some code snippets on how to quickly get started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your use case.

-
#### Fine-tuning the model

-You can find fine-tuning scripts and notebook under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt it to this model, simply change the model-id to `google/gemma-
+You can find fine-tuning scripts and notebook under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt it to this model, simply change the model-id to `google/gemma-7b-it`.
In that repository, we provide:

* A script to perform Supervised Fine-Tuning (SFT) on UltraChat dataset using QLoRA
@@ -63,15 +62,14 @@ In that repository, we provide:
* A notebook that you can run on a free-tier Google Colab instance to perform SFT on English quotes dataset


-
#### Running the model on a CPU


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-
-model = AutoModelForCausalLM.from_pretrained("google/gemma-
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")
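For orientation alongside the fine-tuning section above: the QLoRA-style setup that those scripts build on generally comes down to loading the base model in 4-bit and attaching LoRA adapters. The following is a minimal sketch only, assuming the `peft` and `bitsandbytes` packages are installed; the target modules and hyperparameters are illustrative guesses, and the linked `examples/` directory remains the reference implementation.

```python
# Minimal, illustrative QLoRA-style setup (not the repository's example scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-7b-it"

# Load the base model in 4-bit so that only the LoRA adapters are trained.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```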
@@ -88,8 +86,8 @@ print(tokenizer.decode(outputs[0]))
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-
-model = AutoModelForCausalLM.from_pretrained("google/gemma-
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -107,8 +105,8 @@ print(tokenizer.decode(outputs[0]))
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-
-model = AutoModelForCausalLM.from_pretrained("google/gemma-
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.float16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -123,8 +121,8 @@ print(tokenizer.decode(outputs[0]))
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-
-model = AutoModelForCausalLM.from_pretrained("google/gemma-
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -143,8 +141,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-
-model = AutoModelForCausalLM.from_pretrained("google/gemma-
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -161,8 +159,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-
-model = AutoModelForCausalLM.from_pretrained("google/gemma-
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -186,6 +184,56 @@ model = AutoModelForCausalLM.from_pretrained(
).to(0)
```

+### Chat Template
+
+The instruction-tuned models use a chat template that must be adhered to for conversational use.
+The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.
+
+Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
+
+```py
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import transformers
+import torch
+
+model_id = "gg-hf/gemma-7b-it"
+dtype = torch.bfloat16
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="cuda",
+    torch_dtype=dtype,
+)
+
+chat = [
+    { "role": "user", "content": "Write a hello world program" },
+]
+prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+```
+
+At this point, the prompt contains the following text:
+
+```
+<start_of_turn>user
+Write a hello world program<end_of_turn>
+<start_of_turn>model
+```
+
+As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
+(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
+the `<end_of_turn>` token.
+
+You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
+chat template.
+
+After the prompt is ready, generation can be performed like this:
+
+```py
+inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
+outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
+```
+
### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be
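As noted above, the prompt can also be assembled by hand when the tokenizer's chat template is not available. A minimal sketch that mirrors the `<start_of_turn>` / `<end_of_turn>` format shown above, reusing the `tokenizer` and `model` objects from the previous snippet (the final decoding line is added here for illustration):

```python
# Build the same single-turn prompt manually, mirroring the template above.
# `tokenizer` and `model` are the objects loaded in the chat-template snippet.
user_message = "Write a hello world program"
prompt = (
    "<start_of_turn>user\n"
    f"{user_message}<end_of_turn>\n"
    "<start_of_turn>model\n"
)

inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```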
@@ -260,7 +308,7 @@ several advantages in this domain:

### Software

-Training was done using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture
+Training was done using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture).

JAX allows researchers to take advantage of the latest generation of hardware,
including TPUs, for faster and more efficient training of large models.