---
license: apache-2.0
library_name: peft
tags:
- mistral
datasets:
- jondurbin/airoboros-2.2.1
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---

<div align="center">

<img src="./logo.png" width="100px">

</div>

# Mistral-7B-Instruct-v0.1

Mistral-7B-Instruct-v0.1 is a 7-billion-parameter generative text model finetuned for instruction following.

## Model Details

This model was built via parameter-efficient finetuning of the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) base model on the [jondurbin/airoboros-2.2.1](https://huggingface.co/datasets/jondurbin/airoboros-2.2.1) dataset. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 3 hours.

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Model Sources 

- **Repository:** [github.com/daniel-furman/sft-demos](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb)

## Evaluation Results

| Metric                | Value |
|-----------------------|-------|
| MMLU (5-shot)         | Coming |
| ARC (25-shot)         | Coming |
| HellaSwag (10-shot)   | Coming |
| TruthfulQA (0-shot)   | Coming |
| Avg.                  | Coming |

We use EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmarks above, on the same version used by Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
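
For reference, here is a hedged sketch of scoring one of the benchmarks above with the harness's Python API. The model type string and argument names differ between harness versions, and the `peft=` model argument assumes a recent release, so treat this as illustrative rather than the exact leaderboard invocation:

```python
# Illustrative only: evaluate this adapter on ARC (25-shot) with the LM Evaluation Harness.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,peft=dfurman/Mistral-7B-Instruct-v0.1",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"])
```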

## Basic Usage

<details>

<summary>Setup</summary>

```python
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
```

```python
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

# Tokenizer from the adapter repo
tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)
# 4-bit NF4 quantization config for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the finetuned PEFT adapter
model = PeftModel.from_pretrained(
    model,
    peft_model_id,
)
```

</details>


```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

print("\n\n*** Prompt:")
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(prompt)

print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):], 
    skip_special_tokens=True
)
print(response)
```

<details>

<summary>Output</summary>

**Prompt**: 
```python
coming
```

**Generation**:
```python
coming
```

</details>


## Speeds, Sizes, Times 

| runtime / 50 tokens (sec) | GPU             | attn | torch dtype | VRAM (GB) |
|:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
| 3.1                        | 1x A100 (40 GB SXM)  | torch               | fp16    | 13                    |
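
As a rough guide, a timing like the one above can be reproduced along the following lines, assuming `model` and `tokenizer` are loaded as in Basic Usage; the prompt and dtype here are illustrative, not the exact benchmarking script:

```python
import time

import torch

prompt = "Tell me a recipe for a mai tai."  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Time a single 50-token greedy generation
torch.cuda.synchronize()
start = time.time()
with torch.autocast("cuda", dtype=torch.float16):
    _ = model.generate(input_ids=input_ids, max_new_tokens=50, do_sample=False)
torch.cuda.synchronize()

print(f"runtime / 50 tokens: {time.time() - start:.1f} sec")
```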

## Training

It took ~3 hours to train 3 epochs on 1x A100 (40 GB SXM).

### Prompt Format

This model was finetuned with the following format:

```python
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST] ' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"
```


This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method. Here's an illustrative example:

```python
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```

<details>

<summary>Output</summary>

```python
coming
```
</details>

### Training Hyperparameters


We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune LLMs on instruction-following datasets.

The following `TrainingArguments` config was used (a minimal trainer sketch follows the list):

- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
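
A minimal sketch of how these arguments plug into the `SFTTrainer` is shown below. The `output_dir`, dataset text field, and `peft_config` are illustrative placeholders (the LoRA hyperparameters are not listed in this card), so this is a sketch of the setup rather than the verbatim training script:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="./sft-mistral-7b-instruct",  # assumed path
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,                  # base model loaded with the quantization config below
    args=training_args,
    train_dataset=train_dataset,  # jondurbin/airoboros-2.2.1, formatted with the chat template above
    dataset_text_field="text",    # assumed field name
    tokenizer=tokenizer,
    peft_config=peft_config,      # LoRA config (hyperparameters not listed here)
)
trainer.train()
```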

The following `bitsandbytes` quantization config was used (see the `BitsAndBytesConfig` sketch after the list):

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
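
Expressed as a `BitsAndBytesConfig`, the settings above map one-to-one onto the following (a sketch mirroring the list, not the verbatim training script):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```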


## Model Card Contact

dryanfurman at gmail


## Framework versions

- PEFT 0.6.0.dev0