Update README.md
README.md
CHANGED
@@ -12,11 +12,27 @@ base_model: mistralai/Mistral-7B-v0.1
---

# mistral-7b-instruct-

## Evaluation results

| Metric | Value |
|-----------------------|-------|

@@ -28,178 +44,81 @@ This instruction model was built via parameter-efficient QLoRA finetuning of [Mi
We use Eleuther.AI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

## Helpful links

* Model license: Apache 2.0
* Basic usage: [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-basic-usage.ipynb)
* Finetuning code: [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb)
* Runtime stats: [here](https://huggingface.co/dfurman/mistral-7b-instruct-peft#runtime-tests)

## Loss curve

![loss curve](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/sep_27_23_mistral-7b-instruct.png)

The above loss curve was generated from the run's private wandb.ai log.

## Example prompts and responses

Example 1:

**User**:

> Write me a numbered list of things to do in New York City.

**mistral-7b-instruct-peft**:

> 1. Visit the Statue of Liberty - This iconic symbol of freedom stands tall on Liberty Island in New York Harbor. Visitors can take a ferry ride over to see it up close and learn about its history.
>
> 2. Take a Walk Through Central Park - One of the most beautiful parks in the world, Central Park offers plenty of activities for visitors, including boating, ice skating, and picnicking.
>
> 3. Catch a Broadway Show - The Great White Way is home to some of the best theater productions in the world. From musicals to plays, there's something for everyone.
>
> 4. Explore Chinatown - A vibrant neighborhood filled with authentic Chinese restaurants, shops, and cultural landmarks, Chinatown is a must-visit destination for anyone interested in experiencing another culture within the city.
>
> 5. Shop at Popular Stores and Markets - New York City has countless shopping opportunities, ranging from high-end designer boutiques to bustling street vendors selling everything imaginable. Some notable places to check out include Macy's Herald Square, Fifth Avenue, and Chelsea Market.
>
> 6. Try Delicious Local Food Options - No visit to NYC is complete without sampling some of their signature dishes. Pizza from Lombardi's (the oldest pizzeria in America), bagels from Ess-a-Bagel, and cheesecake from Junior's are just a few examples of what you should try while in the Big Apple!
>
> In conclusion, these six items make up an excellent starting point for your adventure in New York City.

<br>

>
> Please join me for an evening of delicious food and great company at my home this Friday night! Bring your favorite dish or beverage to share with everyone else. I can't wait to see you all there!
>
> Best regards,
> Your Name
>
> P.S.: Don't forget to RSVP by Wednesday so we know how much food to prepare! 😋

>
> Note: If you prefer a sweeter taste, feel free to add chocolate chips or nuts like walnuts or pecans to the batter before baking. Just remember not to overdo it as too many additions can affect how the bread rises during baking. Happy baking! 😊

This model was trained on various public datasets.
While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

```
!pip install -q -U huggingface_hub peft transformers torch accelerate
```

```
from huggingface_hub import notebook_login
import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)
```

```
peft_model_id = "dfurman/mistral-7b-instruct-peft"
config = PeftConfig.from_pretrained(peft_model_id)

# 4-bit quantization config for loading the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    use_auth_token=True,
    device_map="auto",
)

# attach the finetuned peft adapter and set up the tokenizer
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

format_template = "You are a helpful assistant. Write a response that appropriately completes the request. {query}\n"
```

```
# First, format the prompt
query = "Tell me a recipe for vegan banana bread."
prompt = format_template.format(query=query)

print("\n\n*** Generate:")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
    )

# decode only the newly generated tokens
print(tokenizer.decode(output["sequences"][0][len(input_ids[0]):], skip_special_tokens=True))
```

| runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
|:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
| 3.1 | 1x A100 (40 GB SXM) | torch | fp16 | 13 |

## Acknowledgements

This model was finetuned by Daniel Furman on Sep 27, 2023 and is for research applications only.

## mistralai/Mistral-7B-v0.1 citation

```
coming
```

## Training procedure

The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
@@ -211,6 +130,18 @@ The following `bitsandbytes` quantization config was used during training:
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16

## Framework versions

- PEFT 0.6.0.dev0

---

# mistral-7b-instruct-v0.1

A general instruction-following LLM finetuned from [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).

## Model Details

### Model Description

This instruction-following LLM was built via parameter-efficient QLoRA finetuning of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the first 200k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin). Finetuning was executed on 1x A100 (40 GB SXM) for roughly 20 hours on Google Colab. **Only** the `peft` adapter weights are included in this model repo, alongside the tokenizer.
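For orientation, a QLoRA adapter of this kind is typically set up with `peft.LoraConfig` on top of a 4-bit base model. The sketch below is illustrative only: the exact LoRA rank, alpha, dropout, and target modules used for this adapter are not documented in this card and are assumptions here.

```python
# Illustrative QLoRA adapter setup; LoRA hyperparameters below are assumptions,
# not values taken from this model card.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,                    # assumed rank
    lora_alpha=32,           # assumed scaling
    lora_dropout=0.05,       # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter is trainable, hence "parameter-efficient"
```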

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

### Model Sources

- **Repository:** [github.com/daniel-furman/sft-demos](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb)

### Evaluation Results

| Metric | Value |
|-----------------------|-------|

We use Eleuther.AI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
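As a rough illustration (not the exact command used for this card), the harness exposes a Python entry point along the following lines. The model type string, task names, and few-shot settings vary between harness versions and are assumptions here.

```python
# Rough sketch of scoring a model with lm-evaluation-harness (argument names
# and task identifiers differ across harness versions; treat values as assumptions).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",                             # assumed model type string
    model_args="pretrained=dfurman/mistral-7b-instruct-peft",   # assumed repo id; adapters may need a peft=... arg
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc"],      # assumed subset of leaderboard tasks
    num_fewshot=0,                                              # the leaderboard uses task-specific few-shot counts
)
print(results["results"])
```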

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
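Until this section is filled in, the following is a minimal sketch of loading and querying the model, assuming the published artifact is a `peft` adapter plus tokenizer on top of `mistralai/Mistral-7B-v0.1` as described above. The repo id and generation settings are assumptions, not confirmed values.

```python
# Minimal sketch: load the 4-bit base model, attach the adapter, and generate.
# The repo id below is an assumption; the prompt follows the template shown earlier in this card.
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = "dfurman/mistral-7b-instruct-peft"  # assumed repo id
config = PeftConfig.from_pretrained(peft_model_id)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)  # tokenizer ships with the adapter repo

prompt = "You are a helpful assistant. Write a response that appropriately completes the request. Tell me a recipe for vegan banana bread.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```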

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model was finetuned on the first 200k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin).

### Preprocessing

[More Information Needed]

### Training Hyperparameters

We used the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), which wraps the `transformers` `Trainer` to make it easy to finetune models on instruction-based datasets; an illustrative setup sketch follows the config lists below.

The following `TrainingArguments` config was used:

- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True

The following `bitsandbytes` quantization config was used:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
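For orientation, a setup along the following lines would map the configs above onto `trl`'s `SFTTrainer`. It is a sketch under stated assumptions: the output path, dataset loading and text field, sequence length, and LoRA settings are not specified in this card.

```python
# Illustrative sketch only: wires the configs listed above into TRL's SFTTrainer.
# Dataset handling, text field, output path, and LoRA settings are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("ehartford/dolphin", split="train[:200000]")  # first 200k rows (loading details assumed)

training_args = TrainingArguments(
    output_dir="./outputs",        # assumed output path
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed LoRA settings

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",  # assumed field; dolphin rows need formatting into a single text column
    max_seq_length=1024,        # assumed
)
trainer.train()
```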

### Speeds, Sizes, Times

| runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
|:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
| 3.1 | 1x A100 (40 GB SXM) | torch | fp16 | 13 |
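The exact benchmarking script is not included in this card; a measurement in the spirit of the table above might look like the sketch below, reusing `model` and `tokenizer` from the earlier snippet. The prompt, sampling settings, and timing method are assumptions.

```python
# Rough sketch of how a "runtime / 50 tokens" figure could be measured
# (prompt and settings are assumptions; assumes model/tokenizer from the snippet above).
import time
import torch

prompt = "You are a helpful assistant. Write a response that appropriately completes the request. Write me a short note.\n"  # assumed prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

torch.cuda.synchronize()
start = time.time()
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=50, do_sample=True)
torch.cuda.synchronize()

print(f"runtime for 50 new tokens: {time.time() - start:.1f} s")
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```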

## Model Card Contact

dryanfurman at gmail

## Framework versions

- PEFT 0.6.0.dev0