---
license: apache-2.0
library_name: peft
tags:
- mistral
datasets:
- jondurbin/airoboros-2.2.1
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---

<div align="center">

<img src="./logo.png" width="100px">

</div>

# Mistral-7B-Instruct-v0.1

Mistral-7B-Instruct-v0.1 is a 7-billion-parameter generative text model finetuned for instruction following.

## Model Details

This model was built via parameter-efficient finetuning of the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) base model on the [jondurbin/airoboros-2.2.1](https://huggingface.co/datasets/jondurbin/airoboros-2.2.1) dataset. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 3 hours.

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Model Sources 

- **Repository:** [github.com/daniel-furman/sft-demos](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb)

## Evaluation Results

| Metric                | Value |
|-----------------------|-------|
| MMLU (5-shot)         | Coming |
| ARC (25-shot)         | Coming |
| HellaSwag (10-shot)   | Coming |
| TruthfulQA (0-shot)   | Coming |
| Avg.                  | Coming |

We use EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmarks above, on the same version used by Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
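
For reference, here is a hedged sketch of scoring one of the benchmarks above with the harness's Python API. The model type string and argument names differ between harness versions, and the `peft=` model argument assumes a recent release, so treat this as illustrative rather than the exact leaderboard invocation:

```python
# Illustrative only: evaluate this adapter on ARC (25-shot) with the LM Evaluation Harness.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,peft=dfurman/Mistral-7B-Instruct-v0.1",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"])
```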

## Basic Usage

<details>

<summary>Setup</summary>

```python
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
```

```python
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

# Tokenizer from the adapter repo
tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)
# 4-bit NF4 quantization config for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the finetuned PEFT adapter
model = PeftModel.from_pretrained(
    model,
    peft_model_id,
)
```

</details>


```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

print("\n\n*** Prompt:")
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(prompt)

print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):], 
    skip_special_tokens=True
)
print(response)
```

<details>

<summary>Output</summary>

**Prompt**: 
```python
coming
```

**Generation**:
```python
coming
```

</details>


## Speeds, Sizes, Times 

| runtime / 50 tokens (sec) | GPU             | attn | torch dtype | VRAM (GB) |
|:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
| 3.1                        | 1x A100 (40 GB SXM)  | torch               | fp16    | 13                    |
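
As a rough guide, a timing like the one above can be reproduced along the following lines, assuming `model` and `tokenizer` are loaded as in Basic Usage; the prompt and dtype here are illustrative, not the exact benchmarking script:

```python
import time

import torch

prompt = "Tell me a recipe for a mai tai."  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Time a single 50-token greedy generation
torch.cuda.synchronize()
start = time.time()
with torch.autocast("cuda", dtype=torch.float16):
    _ = model.generate(input_ids=input_ids, max_new_tokens=50, do_sample=False)
torch.cuda.synchronize()

print(f"runtime / 50 tokens: {time.time() - start:.1f} sec")
```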

## Training

It took ~3 hours to train 3 epochs on 1x A100 (40 GB SXM).

### Prompt Format

This model was finetuned with the following format:

```python
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST] ' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"
```


This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method. Here's an illustrative example:

```python
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```

<details>

<summary>Output</summary>

```python
coming
```
</details>

### Training Hyperparameters


We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune LLMs on instruction-following datasets.

The following `TrainingArguments` config was used (a minimal trainer sketch follows the list):

- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
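
A minimal sketch of how these arguments plug into the `SFTTrainer` is shown below. The `output_dir`, dataset text field, and `peft_config` are illustrative placeholders (the LoRA hyperparameters are not listed in this card), so this is a sketch of the setup rather than the verbatim training script:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="./sft-mistral-7b-instruct",  # assumed path
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,                  # base model loaded with the quantization config below
    args=training_args,
    train_dataset=train_dataset,  # jondurbin/airoboros-2.2.1, formatted with the chat template above
    dataset_text_field="text",    # assumed field name
    tokenizer=tokenizer,
    peft_config=peft_config,      # LoRA config (hyperparameters not listed here)
)
trainer.train()
```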

The following `bitsandbytes` quantization config was used (see the `BitsAndBytesConfig` sketch after the list):

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
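
Expressed as a `BitsAndBytesConfig`, the settings above map one-to-one onto the following (a sketch mirroring the list, not the verbatim training script):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```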


## Model Card Contact

dryanfurman at gmail


## Framework versions

- PEFT 0.6.0.dev0