Text Generation
PEFT
Safetensors
mistral
conversational
Eval Results
dfurman committed on
Commit 9557d5c
1 Parent(s): bb92879

Update README.md

Files changed (1)
  1. README.md +13 -20
README.md CHANGED
@@ -105,20 +105,7 @@ input_ids = tokenizer.apply_chat_template(
  return_tensors="pt",
  )
  print(tokenizer.decode(input_ids[0]))
- ```
-
- <details>
-
- <summary>Prompt</summary>
-
- ```python
- "<s> [INST] Tell me a recipe for a mai tai. [/INST]"
- ```
-
- </details>
 
-
- ```python
  print("\n\n*** Generate:")
  with torch.autocast("cuda", dtype=torch.bfloat16):
  output = model.generate(
@@ -138,12 +125,19 @@ response = tokenizer.decode(
  skip_special_tokens=True
  )
  print(response)
-
  ```
 
  <details>
 
- <summary>Generation</summary>
+ <summary>Outputs</summary>
+
+ **Prompt**
+
+ ```python
+ "<s> [INST] Tell me a recipe for a mai tai. [/INST]"
+ ```
+
+ **Generation**
 
  ```python
  """1. Combine the following ingredients in a cocktail shaker:
@@ -171,7 +165,7 @@ Ice cubes to fill the shaker
 
  ## Training
 
- It took ~2 hours to train 2 epochs on 1x A100 (40 GB SXM).
+ It took ~5 hours to train 3 epochs on 1x A100 (40 GB SXM).
 
  ### Prompt Format
 
@@ -218,9 +212,9 @@ See [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/mistral/
  The following `TrainingArguments` config was used:
 
  - output_dir = "./results"
- - num_train_epochs = 3
+ - num_train_epochs = 2
  - auto_find_batch_size = True
- - gradient_accumulation_steps = 1
+ - gradient_accumulation_steps = 2
  - optim = "paged_adamw_32bit"
  - save_strategy = "epoch"
  - learning_rate = 3e-4
@@ -228,8 +222,7 @@ The following `TrainingArguments` config was used:
  - warmup_ratio = 0.03
  - logging_strategy = "steps"
  - logging_steps = 25
- - evaluation_strategy = "epoch"
- - prediction_loss_only = True
+ - evaluation_strategy = "no"
  - bf16 = True
 
  The following `bitsandbytes` quantization config was used:
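Pulled out of diff form, the post-change hyperparameters map onto roughly this `TrainingArguments` call. A sketch with two caveats: the list item on old line 227 (between `learning_rate` and `warmup_ratio`) falls outside the hunks' context and is omitted here, and anything the card doesn't list stays at the library defaults.

```python
# Sketch of the card's post-diff training config; any argument not listed
# on the card is left at its transformers default.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=2,
    auto_find_batch_size=True,       # requires accelerate
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",       # paged 32-bit AdamW from bitsandbytes
    save_strategy="epoch",
    learning_rate=3e-4,
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    evaluation_strategy="no",        # eval disabled in this commit
    bf16=True,
)
```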
 