---
language:
- en
license: apache-2.0
base_model: mistralai/Mixtral-8x7B-v0.1
inference:
  parameters:
    temperature: 0.5
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
---

This model is a compressed version of Mixtral-8x7B. Using low-rank approximation, I removed 10 billion parameters from the MLP experts' weight matrices, enough to run the model on a single A100 80GB GPU in half precision.

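To give an intuition for the technique, here is a minimal sketch of low-rank approximation via truncated SVD in PyTorch. The matrix shape matches a Mixtral expert projection, but the rank, the helper name, and the factorization shown are illustrative assumptions, not the exact procedure used to build this model:

```python
import torch

def low_rank_factorize(W: torch.Tensor, rank: int):
    """Approximate W (out x in) as A @ B, with A (out x rank) and B (rank x in)."""
    # Truncated SVD keeps only the `rank` largest singular values.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vh[:rank, :]
    return A, B

# Illustrative example: one expert up-projection (hidden size 4096, FFN size 14336).
W = torch.randn(14336, 4096)
A, B = low_rank_factorize(W, rank=1024)  # rank chosen arbitrarily for the demo
print(f"compression: {W.numel() / (A.numel() + B.numel()):.2f}x")
```

Replacing each expert matrix `W` with the pair `(A, B)` trades a small approximation error for a large drop in parameter count, which is where the memory savings come from.
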
The model still retains its core performance.

# Model Card for minixtral

## Instruction format

This format must be strictly respected; otherwise, the model will generate sub-optimal outputs.

The template used to build a prompt for the Instruct model is defined as follows:

```
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
```

Note that `<s>` and `</s>` are special tokens for beginning of string (BOS) and end of string (EOS), while `[INST]` and `[/INST]` are regular strings.

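As a string-level illustration of this template (using a hypothetical two-turn exchange), the prompt for a conversation can be assembled like this; note that in practice the BOS/EOS markers are added as token IDs rather than text, as the pseudo-code below makes explicit:

```python
# Sketch only: "[INST]"/"[/INST]" are plain text, while <s>/</s> stand in for
# the BOS/EOS special tokens that a tokenizer would add as IDs, not strings.
turns = [
    ("What is your favorite condiment?", "I'm partial to fresh lemon juice."),
    ("Do you have mayonnaise recipes?", None),  # awaiting the model's answer
]

prompt = "<s>"
for user_msg, bot_msg in turns:
    prompt += f" [INST] {user_msg} [/INST]"
    if bot_msg is not None:
        prompt += f" {bot_msg}</s>"

print(prompt)
# <s> [INST] What is your favorite condiment? [/INST] I'm partial to fresh
# lemon juice.</s> [INST] Do you have mayonnaise recipes? [/INST]
```
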
For reference, here is the pseudo-code used to tokenize instructions during fine-tuning:

```python
def tokenize(text):
    return tok.encode(text, add_special_tokens=False)

[BOS_ID] +
tokenize("[INST]") + tokenize(USER_MESSAGE_1) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_1) + [EOS_ID] +
…
tokenize("[INST]") + tokenize(USER_MESSAGE_N) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_N) + [EOS_ID]
```

In the pseudo-code above, note that the `tokenize` method should not add a BOS or EOS token automatically, but should add a prefix space.

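A minimal runnable version of that pseudo-code might look as follows, assuming the official Mixtral-8x7B-Instruct tokenizer and a hypothetical list of (user, assistant) turns:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

def tokenize(text):
    # No automatic special tokens: BOS/EOS are added manually below.
    return tok.encode(text, add_special_tokens=False)

# Hypothetical fine-tuning sample with two (user, assistant) turns.
turns = [
    ("What is your favorite condiment?", "Fresh lemon juice."),
    ("Do you have mayonnaise recipes?", "Yes: whisk egg yolk, mustard, and oil."),
]

token_ids = [tok.bos_token_id]
for user_msg, bot_msg in turns:
    token_ids += tokenize("[INST]") + tokenize(user_msg) + tokenize("[/INST]")
    token_ids += tokenize(bot_msg) + [tok.eos_token_id]
```
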
In the Transformers library, one can use [chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating), which make sure the right format is applied.

<details>
<summary> Click to expand </summary>

```diff
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

</details>