Update README.md

README.md CHANGED

@@ -105,20 +105,7 @@ input_ids = tokenizer.apply_chat_template(
     return_tensors="pt",
 )
 print(tokenizer.decode(input_ids[0]))
-```
-
-<details>
-
-<summary>Prompt</summary>
-
-```python
-"<s> [INST] Tell me a recipe for a mai tai. [/INST]"
-```
-
-</details>

-
-```python
 print("\n\n*** Generate:")
 with torch.autocast("cuda", dtype=torch.bfloat16):
     output = model.generate(
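For reference, the prompt string this hunk moves around is exactly what `apply_chat_template` renders for a single user turn. A minimal sketch of that step, assuming a stock Mistral-instruct tokenizer (the model id is a placeholder, not taken from this commit):

```python
from transformers import AutoTokenizer

# Placeholder id for illustration -- the README belongs to a fine-tune of a
# Mistral-instruct-style model whose exact repo id is not shown in this hunk.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Tell me a recipe for a mai tai."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Round-tripping through decode shows the templated prompt, e.g.:
# "<s> [INST] Tell me a recipe for a mai tai. [/INST]"
print(tokenizer.decode(input_ids[0]))
```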
@@ -138,12 +125,19 @@ response = tokenizer.decode(
     skip_special_tokens=True
 )
 print(response)
-
 ```

 <details>

-<summary>
+<summary>Outputs</summary>
+
+**Prompt**
+
+```python
+"<s> [INST] Tell me a recipe for a mai tai. [/INST]"
+```
+
+**Generation**

 ```python
 """1. Combine the following ingredients in a cocktail shaker:
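Pieced together, the usage snippet this hunk finishes editing decodes only the newly generated tokens and prints them. A self-contained sketch under the same assumptions as above (placeholder model id, CUDA device available; the `generate` kwargs are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder, not from this commit
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Tell me a recipe for a mai tai."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

print("\n\n*** Generate:")
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(input_ids, max_new_tokens=256)  # kwargs are illustrative

# Slice off the prompt so only the model's continuation is decoded.
response = tokenizer.decode(
    output[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```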
@@ -171,7 +165,7 @@ Ice cubes to fill the shaker

 ## Training

-It took ~
+It took ~5 hours to train 3 epochs on 1x A100 (40 GB SXM).

 ### Prompt Format

@@ -218,9 +212,9 @@ See [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/mistral/
 The following `TrainingArguments` config was used:

 - output_dir = "./results"
-- num_train_epochs =
+- num_train_epochs = 2
 - auto_find_batch_size = True
-- gradient_accumulation_steps =
+- gradient_accumulation_steps = 2
 - optim = "paged_adamw_32bit"
 - save_strategy = "epoch"
 - learning_rate = 3e-4
@@ -228,8 +222,7 @@ The following `TrainingArguments` config was used:
 - warmup_ratio = 0.03
 - logging_strategy = "steps"
 - logging_steps = 25
-- evaluation_strategy = "
-- prediction_loss_only = True
+- evaluation_strategy = "no"
 - bf16 = True

 The following `bitsandbytes` quantization config was used:
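The two hunks above fill in the values that were missing from the `TrainingArguments` bullets (`num_train_epochs`, `gradient_accumulation_steps`, `evaluation_strategy`) and drop `prediction_loss_only`. Assembled into code, the bullets correspond to roughly this config; it is a sketch built only from the listed values, with every unlisted argument left at its default:

```python
from transformers import TrainingArguments

# Values transcribed from the bullet list in the diff above.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=2,
    auto_find_batch_size=True,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    evaluation_strategy="no",
    bf16=True,
)
```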
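The excerpt stops right before the `bitsandbytes` values themselves, so they are not reproduced here. Purely for orientation, a common 4-bit QLoRA-style `BitsAndBytesConfig` looks like the sketch below; every value in it is an assumption, not something read from this README:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed, typical QLoRA-style values -- the README's actual quantization
# config follows the excerpt and may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
```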