|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- mistral |
|
datasets: |
|
- jondurbin/airoboros-2.2.1 |
|
inference: false |
|
pipeline_tag: text-generation |
|
base_model: mistralai/Mistral-7B-v0.1 |
|
--- |
|
|
|
<div align="center"> |
|
|
|
<img src="./logo.png" width="100px"> |
|
|
|
</div> |
|
|
|
# Mistral-7B-Instruct-v0.1 |
|
|
|
The Mistral-7B-Instruct-v0.1 LLM is a 7-billion-parameter generative text model finetuned for instruction following.
|
|
|
## Model Details |
|
|
|
This model was built via parameter-efficient finetuning of the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) base model on the [jondurbin/airoboros-2.2.1](https://huggingface.co/datasets/jondurbin/airoboros-2.2.1) dataset. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 3 hours. |
|
|
|
- **Developed by:** Daniel Furman |
|
- **Model type:** Decoder-only |
|
- **Language(s) (NLP):** English |
|
- **License:** Apache 2.0 |
|
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) |
|
|
|
## Model Sources |
|
|
|
- **Repository:** [github.com/daniel-furman/sft-demos](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/one_gpu/mistral/sft-mistral-7b-instruct-peft.ipynb) |
|
|
|
## Evaluation Results |
|
|
|
| Metric              | Value  |
|---------------------|--------|
| MMLU (5-shot)       | Coming |
| ARC (25-shot)       | Coming |
| HellaSwag (10-shot) | Coming |
| TruthfulQA (0-shot) | Coming |
| Avg.                | Coming |
|
|
|
We use EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmarks above, on the same version used by Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
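
For reference, an individual task can be run locally along the lines of the sketch below. This is a hedged example: the flags follow a recent `lm-eval` release and may differ from the pinned harness version the leaderboard uses, and the `peft=` model argument assumes harness support for loading adapters on top of the base model.

```python
# Hedged sketch: run one task locally with the lm-evaluation-harness CLI.
# Flags follow a recent lm-eval release; the leaderboard pins its own version.
!pip install -q lm-eval

!lm_eval --model hf --model_args pretrained=mistralai/Mistral-7B-v0.1,peft=dfurman/Mistral-7B-Instruct-v0.1 --tasks hellaswag --num_fewshot 10 --batch_size auto
```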
|
|
|
## Basic Usage |
|
|
|
<details> |
|
|
|
<summary>Setup</summary> |
|
|
|
```python
# Install dependencies (run once).
!pip install -q -U transformers peft torch accelerate bitsandbytes einops sentencepiece

import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
```
|
|
|
```python
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)
# Mistral's tokenizer ships without a pad token; fall back to EOS so that
# the pad_token_id passed to generate() below is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 4-bit NF4 with bf16 compute to fit on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the finetuned LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, peft_model_id)
```
|
|
|
</details> |
|
|
|
|
|
```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

# Render the conversation with the model's chat template.
print("\n\n*** Prompt:")
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)

print("\n\n*** Generate:")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

# Decode only the newly generated tokens (skip the prompt).
response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):],
    skip_special_tokens=True,
)
print(response)
```
|
|
|
<details> |
|
|
|
<summary>Output</summary> |
|
|
|
**Prompt**: |
|
```python |
|
<s>[INST] Tell me a recipe for a mai tai. [/INST]
|
``` |
|
|
|
**Generation**: |
|
```python |
|
coming |
|
``` |
|
|
|
</details> |
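
To watch tokens print as they are produced, the same call can optionally be wired to `transformers`' built-in `TextStreamer`. This is a minimal variant of the example above and reuses `input_ids` from that cell:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.autocast("cuda", dtype=torch.bfloat16):
    _ = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )
```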
|
|
|
|
|
## Speeds, Sizes, Times |
|
|
|
| runtime / 50 tokens (sec) | GPU                 | attention | torch dtype | VRAM (GB) |
|:-------------------------:|:-------------------:|:---------:|:-----------:|:---------:|
| 3.1                       | 1x A100 (40 GB SXM) | torch     | fp16        | 13        |
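
The exact benchmarking script is not included in this card; a measurement like the row above can be approximated with a sketch along these lines (the prompt text is illustrative, and `model` and `tokenizer` come from the Setup section above):

```python
import time

import torch

# Rough timing sketch for a 50-token greedy generation.
input_ids = tokenizer("Tell me a recipe for a mai tai.", return_tensors="pt").input_ids.cuda()

torch.cuda.synchronize()
start = time.time()
_ = model.generate(input_ids=input_ids, max_new_tokens=50, do_sample=False)
torch.cuda.synchronize()
print(f"runtime / 50 tokens: {time.time() - start:.1f} sec")
```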
|
|
|
## Training |
|
|
|
Finetuning took roughly 3 hours on 1x A100 (40 GB SXM).
|
|
|
### Prompt Format |
|
|
|
This model was finetuned with the following format: |
|
|
|
```python |
|
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST] ' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}" |
|
``` |
|
|
|
|
|
This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method. Here's an illustrative example: |
|
|
|
```python
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
|
|
|
<details> |
|
|
|
<summary>Output</summary> |
|
|
|
```python |
|
<s>[INST] What is your favourite condiment? [/INST] Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]
|
``` |
|
</details> |
|
|
|
### Training Hyperparameters |
|
|
|
|
|
We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune LLMs on instruction-following datasets. |
|
|
|
The following `TrainingArguments` config was used; a sketch assembling the full trainer setup follows the lists below:
|
|
|
- num_train_epochs = 1 |
|
- auto_find_batch_size = True |
|
- gradient_accumulation_steps = 1 |
|
- optim = "paged_adamw_32bit" |
|
- save_strategy = "epoch" |
|
- learning_rate = 3e-4 |
|
- lr_scheduler_type = "cosine" |
|
- warmup_ratio = 0.03 |
|
- logging_strategy = "steps" |
|
- logging_steps = 25 |
|
- bf16 = True |
|
|
|
The following `bitsandbytes` quantization config was used: |
|
|
|
- quant_method: bitsandbytes |
|
- load_in_8bit: False |
|
- load_in_4bit: True |
|
- llm_int8_threshold: 6.0 |
|
- llm_int8_skip_modules: None |
|
- llm_int8_enable_fp32_cpu_offload: False |
|
- llm_int8_has_fp16_weight: False |
|
- bnb_4bit_quant_type: nf4 |
|
- bnb_4bit_use_double_quant: False |
|
- bnb_4bit_compute_dtype: bfloat16 |
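
Assembled, the trainer setup looked roughly like the sketch below. The `TrainingArguments` values come from the list above; the LoRA hyperparameters, dataset text field, and sequence length are illustrative assumptions rather than the exact training script (see the linked repository for the real notebook).

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Assumed LoRA hyperparameters; the exact values are not listed in this card.
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("jondurbin/airoboros-2.2.1", split="train")

# TrainingArguments values as listed above.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,                # base model loaded in 4-bit, as in Setup above
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # assumed field name
    tokenizer=tokenizer,
    max_seq_length=4096,        # assumed
)
trainer.train()
```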
|
|
|
|
|
## Model Card Contact |
|
|
|
dryanfurman at gmail |
|
|
|
|
|
## Framework versions |
|
|
|
- PEFT 0.6.0.dev0 |
|
|