---
language:
- en
license: apache-2.0
base_model: mistralai/Mixtral-8x7B-v0.1
inference:
  parameters:
    temperature: 0.5
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
---

This model is a compressed version of Mixtral-8x7B. Using low-rank approximation, I removed 10 billion parameters from the MLP experts' weight matrices, enough to run the model on a single A100 80GB GPU in half precision.

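To give an intuition for the technique, here is a minimal sketch of low-rank approximation via truncated SVD in PyTorch. The matrix shape matches a Mixtral expert projection, but the rank, the helper name, and the factorization shown are illustrative assumptions, not the exact procedure used to build this model:

```python
import torch

def low_rank_factorize(W: torch.Tensor, rank: int):
    """Approximate W (out x in) as A @ B, with A (out x rank) and B (rank x in)."""
    # Truncated SVD keeps only the `rank` largest singular values.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vh[:rank, :]
    return A, B

# Illustrative example: one expert up-projection (hidden size 4096, FFN size 14336).
W = torch.randn(14336, 4096)
A, B = low_rank_factorize(W, rank=1024)  # rank chosen arbitrarily for the demo
print(f"compression: {W.numel() / (A.numel() + B.numel()):.2f}x")
```

Replacing each expert matrix `W` with the pair `(A, B)` trades a small approximation error for a large drop in parameter count, which is where the memory savings come from.
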
The model still retains its core performance.

# Model Card for minixtral

## Instruction format

This format must be strictly respected; otherwise, the model will generate sub-optimal outputs.

The template used to build a prompt for the Instruct model is defined as follows:

```
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
```

Note that `<s>` and `</s>` are special tokens for beginning of string (BOS) and end of string (EOS), while `[INST]` and `[/INST]` are regular strings.

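As a string-level illustration of this template (using a hypothetical two-turn exchange), the prompt for a conversation can be assembled like this; note that in practice the BOS/EOS markers are added as token IDs rather than text, as the pseudo-code below makes explicit:

```python
# Sketch only: "[INST]"/"[/INST]" are plain text, while <s>/</s> stand in for
# the BOS/EOS special tokens that a tokenizer would add as IDs, not strings.
turns = [
    ("What is your favorite condiment?", "I'm partial to fresh lemon juice."),
    ("Do you have mayonnaise recipes?", None),  # awaiting the model's answer
]

prompt = "<s>"
for user_msg, bot_msg in turns:
    prompt += f" [INST] {user_msg} [/INST]"
    if bot_msg is not None:
        prompt += f" {bot_msg}</s>"

print(prompt)
# <s> [INST] What is your favorite condiment? [/INST] I'm partial to fresh
# lemon juice.</s> [INST] Do you have mayonnaise recipes? [/INST]
```
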
For reference, here is the pseudo-code used to tokenize instructions during fine-tuning:

```python
def tokenize(text):
    return tok.encode(text, add_special_tokens=False)

[BOS_ID] +
tokenize("[INST]") + tokenize(USER_MESSAGE_1) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_1) + [EOS_ID] +
…
tokenize("[INST]") + tokenize(USER_MESSAGE_N) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_N) + [EOS_ID]
```

In the pseudo-code above, note that the `tokenize` method should not add a BOS or EOS token automatically, but should add a prefix space.

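A minimal runnable version of that pseudo-code might look as follows, assuming the official Mixtral-8x7B-Instruct tokenizer and a hypothetical list of (user, assistant) turns:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

def tokenize(text):
    # No automatic special tokens: BOS/EOS are added manually below.
    return tok.encode(text, add_special_tokens=False)

# Hypothetical fine-tuning sample with two (user, assistant) turns.
turns = [
    ("What is your favorite condiment?", "Fresh lemon juice."),
    ("Do you have mayonnaise recipes?", "Yes: whisk egg yolk, mustard, and oil."),
]

token_ids = [tok.bos_token_id]
for user_msg, bot_msg in turns:
    token_ids += tokenize("[INST]") + tokenize(user_msg) + tokenize("[/INST]")
    token_ids += tokenize(bot_msg) + [tok.eos_token_id]
```
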
In the Transformers library, one can use [chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating), which make sure the right format is applied.

<details>
<summary> Click to expand </summary>

```diff
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

</details>