prince-canuma committed
Commit 94fc4ca

Duplicate from prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,169 @@
+ ---
+ license: apache-2.0
+ language:
+ - fr
+ - it
+ - de
+ - es
+ - en
+ inference:
+   parameters:
+     temperature: 0.5
+ widget:
+   - messages:
+       - role: user
+         content: What is your favorite condiment?
+ ---
+ # Model Card for Mixtral-8x22B-Instruct-v0.1-4bit
+ The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts, instruction-tuned and, in this repository, quantized to 4 bits with bitsandbytes (NF4). Its smaller sibling, Mixtral-8x7B, outperforms Llama 2 70B on most benchmarks we tested.
+
+ Model added by [Prince Canuma](https://twitter.com/Prince_Canuma).
+
+ For full details of this model, please read our [release blog post](https://mistral.ai/news/mixtral-of-experts/).
+
+ ## Warning
+ This repo contains weights that are compatible with [vLLM](https://github.com/vllm-project/vllm) serving of the model as well as the Hugging Face [transformers](https://github.com/huggingface/transformers) library. It is based on the original Mixtral [torrent release](magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce), but the file format and parameter names are different. Please note that the original torrent-format weights cannot be instantiated with HF directly.
+
+ ## Instruction format
+
+ This format must be strictly respected; otherwise the model will generate sub-optimal outputs.
+
+ The template used to build a prompt for the Instruct model is defined as follows:
+ ```
+ <s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
+ ```
+ Note that `<s>` and `</s>` are special tokens for beginning of string (BOS) and end of string (EOS), while `[INST]` and `[/INST]` are regular strings.
+
+ For reference, here is the pseudo-code used to tokenize instructions during fine-tuning:
+ ```python
+ def tokenize(text):
+     # The tokenizer must not add BOS/EOS itself; special tokens are placed manually below.
+     return tok.encode(text, add_special_tokens=False)
+
+ token_ids = (
+     [BOS_ID]
+     + tokenize("[INST]") + tokenize(USER_MESSAGE_1) + tokenize("[/INST]")
+     + tokenize(BOT_MESSAGE_1) + [EOS_ID]
+     # ... one [INST] ... [/INST] answer [EOS_ID] block per conversation turn ...
+     + tokenize("[INST]") + tokenize(USER_MESSAGE_N) + tokenize("[/INST]")
+     + tokenize(BOT_MESSAGE_N) + [EOS_ID]
+ )
+ ```
+
+ In the pseudo-code above, note that the `tokenize` method should not add a BOS or EOS token automatically, but should add a prefix space.
+
+ In the Transformers library, one can use [chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating), which make sure the right format is applied, as shown in the sketch below.
+
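+ A quick way to sanity-check the template is to render it to a string rather than token ids; a minimal sketch (the repo id mirrors the examples below and is an assumption about where you load the tokenizer from):
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit")
+
+ messages = [{"role": "user", "content": "What is your favorite condiment?"}]
+
+ # tokenize=False returns the formatted prompt string instead of token ids,
+ # so the <s> [INST] ... [/INST] structure can be inspected directly.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False)
+ print(prompt)  # <s> [INST] What is your favorite condiment? [/INST]
+ ```
+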
+ ## Run the model
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(inputs, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
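+ The frontmatter above pins `temperature: 0.5` for the hosted inference widget; a minimal sketch of sampling with the same setting, reusing `model`, `inputs`, and `tokenizer` from the snippet above (the value comes from this card's YAML, not from an official recommendation):
+ ```python
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=256,
+     do_sample=True,    # enable sampling so temperature takes effect
+     temperature=0.5,   # matches the widget default in the frontmatter
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+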
+ By default, transformers loads the model in full precision. You may therefore want to further reduce the memory requirements for running the model through the optimizations offered in the HF ecosystem:
+
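+ As a rough guide, here is a back-of-envelope sketch of the weight footprint per dtype (the ~141B total parameter count is derived from `config.json` below; activations, KV cache, and quantization constants are ignored):
+ ```python
+ # Approximate weight memory for Mixtral-8x22B at various precisions.
+ TOTAL_PARAMS = 141e9  # assumption: see the parameter-count sketch under config.json below
+
+ for name, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1), ("nf4", 0.5)]:
+     print(f"{name:>8}: ~{TOTAL_PARAMS * bytes_per_param / 1e9:.0f} GB")
+ # float32: ~564 GB, float16: ~282 GB, int8: ~141 GB, nf4: ~71 GB
+ ```
+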
+ ### In half-precision
+
+ Note that `float16` precision only works on GPU devices.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```diff
+ + import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ + model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(input_ids, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ </details>
+
+ ### Lower precision (8-bit & 4-bit) using `bitsandbytes`
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```diff
+ + import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ + model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(input_ids, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ </details>
+
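+ For finer control, the same settings can be spelled out with `BitsAndBytesConfig`; a minimal sketch mirroring the `quantization_config` shipped in this repo's `config.json` (NF4, bfloat16 compute, no double quantization):
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ # Values copied from the quantization_config block in config.json below.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=False,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ ```
+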
+ ### Load the model with Flash Attention 2
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```diff
+ + import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ + model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, use_flash_attention_2=True, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is your favourite condiment?"},
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(input_ids, max_new_tokens=20)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ </details>
+
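+ In recent `transformers` releases, `use_flash_attention_2=True` is deprecated in favor of `attn_implementation="flash_attention_2"`; a minimal sketch under that newer API (assumes `flash-attn` is installed and a half-precision dtype, which Flash Attention 2 requires):
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "prince-canuma/Mixtral-8x22B-Instruct-v0.1-4bit",
+     torch_dtype=torch.float16,                # FA2 only supports fp16/bf16
+     attn_implementation="flash_attention_2",
+     device_map="auto",
+ )
+ ```
+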
+ ## Limitations
+
+ The Mixtral-8x22B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
+ It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
+ make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
+
+ # The Mistral AI Team
+ Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "_name_or_path": "mistralai/Mixtral-8x22B-Instruct-v0.1",
+   "architectures": [
+     "MixtralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 6144,
+   "initializer_range": 0.02,
+   "intermediate_size": 16384,
+   "max_position_embeddings": 65536,
+   "model_type": "mixtral",
+   "num_attention_heads": 48,
+   "num_experts_per_tok": 2,
+   "num_hidden_layers": 56,
+   "num_key_value_heads": 8,
+   "num_local_experts": 8,
+   "output_router_logits": false,
+   "quantization_config": {
+     "_load_in_4bit": true,
+     "_load_in_8bit": false,
+     "bnb_4bit_compute_dtype": "bfloat16",
+     "bnb_4bit_quant_storage": "uint8",
+     "bnb_4bit_quant_type": "nf4",
+     "bnb_4bit_use_double_quant": false,
+     "llm_int8_enable_fp32_cpu_offload": false,
+     "llm_int8_has_fp16_weight": false,
+     "llm_int8_skip_modules": null,
+     "llm_int8_threshold": 6.0,
+     "load_in_4bit": true,
+     "load_in_8bit": false,
+     "quant_method": "bitsandbytes"
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "router_aux_loss_coef": 0.001,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.39.3",
+   "use_cache": true,
+   "vocab_size": 32768
+ }
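
A minimal sketch estimating the total vs. per-token-active parameter counts implied by the config above (assumes the standard Mixtral layout: GQA attention plus `num_local_experts` SwiGLU experts per layer; layer norms and biases omitted as negligible):
```python
# Values copied from config.json above.
hidden, inter, layers, vocab = 6144, 16384, 56, 32768
heads, kv_heads, experts, active = 48, 8, 8, 2
head_dim = hidden // heads  # 128

attn = 2 * hidden * heads * head_dim + 2 * hidden * kv_heads * head_dim  # q,o + k,v projections
expert = 3 * hidden * inter        # w1, w2, w3 of one SwiGLU expert
router = hidden * experts          # per-layer gating matrix
embeds = 2 * vocab * hidden        # input embeddings + untied lm_head

total = layers * (attn + router + experts * expert) + embeds
active_total = layers * (attn + router + active * expert) + embeds
print(f"total ≈ {total / 1e9:.0f}B, active per token ≈ {active_total / 1e9:.0f}B")
# total ≈ 141B, active per token ≈ 39B
```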
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.39.3"
+ }
model-00001-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3effb856ab4deaf8ca3a44ee2cda14516a83f404e75229b4da96cf9bd61504f7
+ size 4961063802
model-00002-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae72c1942d663758e4d362985a5d3a291fd60d4a91f0781a55fde1a0a18c90e4
+ size 4961824784
model-00003-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81b42f291b751e088ec72b7309752b8cd072efbd041834f8bc7366b2a556efa0
+ size 4954801840
model-00004-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a089eb9fed013793f48d910515f16e3194ed805e6ac583012a434fab258eb6e8
+ size 4961825192
model-00005-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cebf63fc90bfc0b69a35e099174131b320933ea90c68dae8dd0fc03ab6017129
+ size 4954802256
model-00006-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c879068ab39e0e5c6211faddb44ebadeb924a8cb2f6a0843db469d846fd3ed93
+ size 4961825192
model-00007-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ea69035056a913d1780cb9df15613755014ac2cd633e4471c099aa9afcdfbd8
+ size 4954802256
model-00008-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:788c8277664eb9c8d9daae4fe10442180ebdf8cf63e2a0f555bace15b8b328d8
+ size 4961825192
model-00009-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dad40002d277c55138f9222512be85ead54aa221a9c7672e1aa6a6f98a29e800
+ size 4954802256
model-00010-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e59c525c287e23bbb843f35370f18dedf9f89943032bb7b512dd908227e9b27
+ size 4961825192
model-00011-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9822ae1968915087d143aa05afaea583431de741300dbe8897279991e4db01e7
+ size 4954802256
model-00012-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b5a764f0d5840de84dfbfb00937624a89b4a3517192ce1ab2b3262c1763cc08
+ size 4961825192
model-00013-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ac38d6a2d7d934ae220e7ae1c49accee57d27ff9bb2ddb7498a6987eeeb6e14
+ size 4954802256
model-00014-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:228a62309613fb5edd15dccb268d757489cba974f979b16c6b64ad3aa52c1a89
+ size 4990163459
model-00015-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bc2b651c1ec9f93ffa1ca0d9bc0ce51620bc748d00ed820d0c9818912f3f55c
+ size 4983087791
model-00016-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4a4e953b46a31db63c3505c11aaab420932753fba639833a9255076e12e2125
+ size 4848614772
model-00017-of-00017.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4ecccfca8d27805cf6d41efb530aabcf41b7c5c1db9c557d5e07c1754e05cf4
+ size 402653312
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,99 @@
+ {
+   "add_bos_token": false,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[INST]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[/INST]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "[TOOL_CALLS]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "6": {
+       "content": "[AVAILABLE_TOOLS]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "7": {
+       "content": "[/AVAILABLE_TOOLS]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "9": {
+       "content": "[/TOOL_RESULTS]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32768": {
+       "content": "[TOOL_RESULT]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "chat_template": "{{bos_token}}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ ' [INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
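
The `chat_template` above is a plain Jinja template that enforces strict user/assistant alternation. A minimal sketch rendering it outside `transformers` (the `raise_exception` shim is an assumption mirroring the helper the library injects; plain `jinja2` is used here for illustration):
```python
from jinja2 import Environment

# chat_template copied verbatim from tokenizer_config.json above.
CHAT_TEMPLATE = "{{bos_token}}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ ' [INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"

def raise_exception(message):
    # transformers injects a helper like this into its template environment;
    # this shim is an assumption for standalone rendering.
    raise ValueError(message)

env = Environment()
env.globals["raise_exception"] = raise_exception
template = env.from_string(CHAT_TEMPLATE)

print(template.render(
    messages=[{"role": "user", "content": "What is your favorite condiment?"}],
    bos_token="<s>",
    eos_token="</s>",
))
# <s> [INST] What is your favorite condiment? [/INST]
```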