|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- mistral |
|
datasets: |
|
- jondurbin/airoboros-2.2.1 |
|
inference: false |
|
pipeline_tag: text-generation |
|
base_model: mistralai/Mistral-7B-v0.1 |
|
--- |
|
|
|
<div align="center"> |
|
|
|
<img src="./logo.png" width="100px"> |
|
|
|
</div> |
|
|
|
# Mistral-7B-Instruct-v0.1 |
|
|
|
The Mistral-7B-Instruct-v0.1 LLM is a 7-billion-parameter generative text model finetuned for instruction following.
|
|
|
## Model Details |
|
|
|
This model was built via parameter-efficient finetuning of the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) base model on the [jondurbin/airoboros-2.2.1](https://huggingface.co/datasets/jondurbin/airoboros-2.2.1) dataset. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 3 hours. |
|
|
|
- **Developed by:** Daniel Furman |
|
- **Model type:** Decoder-only |
|
- **Language(s) (NLP):** English |
|
- **License:** Apache 2.0 |
|
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) |
|
|
|
## Model Sources |
|
|
|
- **Repository:** [github.com/daniel-furman/sft-demos](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb) |
|
|
|
## Evaluation Results |
|
|
|
| Metric              | Value  |
|---------------------|--------|
| MMLU (5-shot)       | Coming |
| ARC (25-shot)       | Coming |
| HellaSwag (10-shot) | Coming |
| TruthfulQA (0-shot) | Coming |
| Avg.                | Coming |
|
|
|
We use EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmarks above, on the same version used by Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
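
For reference, an individual task can be run locally along the lines of the sketch below. This is a hedged example: the flags follow a recent `lm-eval` release and may differ from the pinned harness version the leaderboard uses, and the `peft=` model argument assumes harness support for loading adapters on top of the base model.

```python
# Hedged sketch: run one task locally with the lm-evaluation-harness CLI.
# Flags follow a recent lm-eval release; the leaderboard pins its own version.
!pip install -q lm-eval

!lm_eval --model hf --model_args pretrained=mistralai/Mistral-7B-v0.1,peft=dfurman/Mistral-7B-Instruct-v0.1 --tasks hellaswag --num_fewshot 10 --batch_size auto
```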
|
|
|
## Basic Usage |
|
|
|
<details> |
|
|
|
<summary>Setup</summary> |
|
|
|
```python
# Install dependencies (run once).
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
```
|
|
|
```python
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)
# Mistral's tokenizer ships without a pad token; fall back to EOS so that
# the pad_token_id passed to generate() below is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 4-bit NF4 with bf16 compute to fit on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the finetuned LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, peft_model_id)
```
|
|
|
</details> |
|
|
|
|
|
```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

# Render the conversation with the model's chat template.
print("\n\n*** Prompt:")
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)

print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

# Decode only the newly generated tokens (skip the prompt).
response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):],
    skip_special_tokens=True,
)
print(response)
```
|
|
|
<details> |
|
|
|
<summary>Output</summary> |
|
|
|
**Prompt**: |
|
```python |
|
<s>[INST] Tell me a recipe for a mai tai. [/INST]
|
``` |
|
|
|
**Generation**: |
|
```python |
|
coming |
|
``` |
|
|
|
</details> |
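
To watch tokens print as they are produced, the same call can optionally be wired to `transformers`' built-in `TextStreamer`. This is a minimal variant of the example above and reuses `input_ids` from that cell:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.autocast("cuda", dtype=torch.bfloat16):
    _ = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )
```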
|
|
|
|
|
## Speeds, Sizes, Times |
|
|
|
| runtime / 50 tokens (sec) | GPU                 | attention | torch dtype | VRAM (GB) |
|:-------------------------:|:-------------------:|:---------:|:-----------:|:---------:|
| 3.1                       | 1x A100 (40 GB SXM) | torch     | fp16        | 13        |
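
The exact benchmarking script is not included in this card; a measurement like the row above can be approximated with a sketch along these lines (the prompt text is illustrative, and `model` and `tokenizer` come from the Setup section above):

```python
import time

import torch

# Rough timing sketch for a 50-token greedy generation.
input_ids = tokenizer("Tell me a recipe for a mai tai.", return_tensors="pt").input_ids.cuda()

torch.cuda.synchronize()
start = time.time()
_ = model.generate(input_ids=input_ids, max_new_tokens=50, do_sample=False)
torch.cuda.synchronize()
print(f"runtime / 50 tokens: {time.time() - start:.1f} sec")
```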
|
|
|
## Training |
|
|
|
Finetuning took roughly 3 hours on 1x A100 (40 GB SXM).
|
|
|
### Prompt Format |
|
|
|
This model was finetuned with the following format: |
|
|
|
```python |
|
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST] ' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}" |
|
``` |
|
|
|
|
|
This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method. Here's an illustrative example: |
|
|
|
```python
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
|
|
|
<details> |
|
|
|
<summary>Output</summary> |
|
|
|
```python |
|
<s>[INST] What is your favourite condiment? [/INST] Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]
|
``` |
|
</details> |
|
|
|
### Training Hyperparameters |
|
|
|
|
|
We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune LLMs on instruction-following datasets. |
|
|
|
The following `TrainingArguments` config was used; a sketch assembling the full trainer setup follows the lists below:
|
|
|
- num_train_epochs = 1 |
|
- auto_find_batch_size = True |
|
- gradient_accumulation_steps = 1 |
|
- optim = "paged_adamw_32bit" |
|
- save_strategy = "epoch" |
|
- learning_rate = 3e-4 |
|
- lr_scheduler_type = "cosine" |
|
- warmup_ratio = 0.03 |
|
- logging_strategy = "steps" |
|
- logging_steps = 25 |
|
- bf16 = True |
|
|
|
The following `bitsandbytes` quantization config was used: |
|
|
|
- quant_method: bitsandbytes |
|
- load_in_8bit: False |
|
- load_in_4bit: True |
|
- llm_int8_threshold: 6.0 |
|
- llm_int8_skip_modules: None |
|
- llm_int8_enable_fp32_cpu_offload: False |
|
- llm_int8_has_fp16_weight: False |
|
- bnb_4bit_quant_type: nf4 |
|
- bnb_4bit_use_double_quant: False |
|
- bnb_4bit_compute_dtype: bfloat16 |
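
Assembled, the trainer setup looked roughly like the sketch below. The `TrainingArguments` values come from the list above; the LoRA hyperparameters, dataset text field, and sequence length are illustrative assumptions rather than the exact training script (see the linked repository for the real notebook).

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Assumed LoRA hyperparameters; the exact values are not listed in this card.
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("jondurbin/airoboros-2.2.1", split="train")

# TrainingArguments values as listed above.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,                # base model loaded in 4-bit, as in Setup above
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # assumed field name
    tokenizer=tokenizer,
    max_seq_length=4096,        # assumed
)
trainer.train()
```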
|
|
|
|
|
## Model Card Contact |
|
|
|
dryanfurman at gmail |
|
|
|
|
|
## Framework versions |
|
|
|
- PEFT 0.6.0.dev0 |
|
|