Text Generation
PEFT
Safetensors
mistral
conversational
Eval Results
dfurman committed on
Commit 9557d5c
1 Parent(s): bb92879

Update README.md

Files changed (1)
  1. README.md +13 -20
README.md CHANGED
@@ -105,20 +105,7 @@ input_ids = tokenizer.apply_chat_template(
  return_tensors="pt",
  )
  print(tokenizer.decode(input_ids[0]))
- ```
-
- <details>
-
- <summary>Prompt</summary>
-
- ```python
- "<s> [INST] Tell me a recipe for a mai tai. [/INST]"
- ```
-
- </details>
 
-
- ```python
  print("\n\n*** Generate:")
  with torch.autocast("cuda", dtype=torch.bfloat16):
  output = model.generate(
@@ -138,12 +125,19 @@ response = tokenizer.decode(
  skip_special_tokens=True
  )
  print(response)
-
  ```
 
  <details>
 
- <summary>Generation</summary>
+ <summary>Outputs</summary>
+
+ **Prompt**
+
+ ```python
+ "<s> [INST] Tell me a recipe for a mai tai. [/INST]"
+ ```
+
+ **Generation**
 
  ```python
  """1. Combine the following ingredients in a cocktail shaker:
@@ -171,7 +165,7 @@ Ice cubes to fill the shaker
 
  ## Training
 
- It took ~2 hours to train 2 epochs on 1x A100 (40 GB SXM).
+ It took ~5 hours to train 3 epochs on 1x A100 (40 GB SXM).
 
  ### Prompt Format
 
@@ -218,9 +212,9 @@ See [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/mistral/
  The following `TrainingArguments` config was used:
 
  - output_dir = "./results"
- - num_train_epochs = 3
+ - num_train_epochs = 2
  - auto_find_batch_size = True
- - gradient_accumulation_steps = 1
+ - gradient_accumulation_steps = 2
  - optim = "paged_adamw_32bit"
  - save_strategy = "epoch"
  - learning_rate = 3e-4
@@ -228,8 +222,7 @@ The following `TrainingArguments` config was used:
  - warmup_ratio = 0.03
  - logging_strategy = "steps"
  - logging_steps = 25
- - evaluation_strategy = "epoch"
- - prediction_loss_only = True
+ - evaluation_strategy = "no"
  - bf16 = True
 
  The following `bitsandbytes` quantization config was used:
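Pulled out of diff form, the post-change hyperparameters map onto roughly this `TrainingArguments` call. A sketch with two caveats: the list item on old line 227 (between `learning_rate` and `warmup_ratio`) falls outside the hunks' context and is omitted here, and anything the card doesn't list stays at the library defaults.

```python
# Sketch of the card's post-diff training config; any argument not listed
# on the card is left at its transformers default.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=2,
    auto_find_batch_size=True,       # requires accelerate
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",       # paged 32-bit AdamW from bitsandbytes
    save_strategy="epoch",
    learning_rate=3e-4,
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    evaluation_strategy="no",        # eval disabled in this commit
    bf16=True,
)
```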
 