fakezeta committed
Commit 13637ca (1 parent: a1429a1)

Upload folder using huggingface_hub

.ipynb_checkpoints/README-checkpoint.md ADDED (identical to README.md below)

.ipynb_checkpoints/model-checkpoint.yaml ADDED (identical to model.yaml below)
README.md ADDED
@@ -0,0 +1,213 @@
---
license: apache-2.0

extra_gated_description: If you want to learn more about how we process your personal data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
---

# OpenVINO IR model with int4 AWQ quantization and scale estimation on the wikitext2 dataset

Model definition for LocalAI:
```yaml
name: mistral-v0.3
backend: transformers
parameters:
  model: fakezeta/Mistral-7B-Instruct-v0.3-ov-awq
context_size: 32768
type: OVModelForCausalLM
template:
  use_tokenizer_template: true
```

To run the model directly with LocalAI:
```
local-ai run huggingface://fakezeta/Mistral-7B-Instruct-v0.3-ov-awq/model.yaml
```
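
The `OVModelForCausalLM` type in the definition above comes from [optimum-intel](https://github.com/huggingface/optimum-intel), so the same OpenVINO IR can also be loaded without LocalAI. A minimal sketch, assuming `optimum[openvino]` is installed (package extras and device defaults may vary by version):

```py
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "fakezeta/Mistral-7B-Instruct-v0.3-ov-awq"

# The repository already contains OpenVINO IR files, so no export step is needed.
model = OVModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# use_tokenizer_template: true in the LocalAI config corresponds to apply_chat_template here.
messages = [{"role": "user", "content": "Explain Machine Learning to me in a nutshell."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```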


# Model Card for Mistral-7B-Instruct-v0.3

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2):
- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling

## Installation

It is recommended to use `mistralai/Mistral-7B-Instruct-v0.3` with [mistral-inference](https://github.com/mistralai/mistral-inference). For Hugging Face `transformers` code snippets, see further below.

```
pip install mistral_inference
```

## Download

```py
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Mistral-7B-Instruct-v0.3", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
```

### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using

```
mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256
```

### Instruct following

```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```

### Function calling

```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the user's location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```

## Generate with `transformers`

If you want to use Hugging Face `transformers` to generate text, you can do something like this.

```py
from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
chatbot(messages)
```
+
155
+
156
+ ## Function calling with `transformers`
157
+
158
+ To use this example, you'll need `transformers` version 4.42.0 or higher. Please see the
159
+ [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling)
160
+ in the `transformers` docs for more information.
161
+
162
+ ```python
163
+ from transformers import AutoModelForCausalLM, AutoTokenizer
164
+ import torch
165
+
166
+ model_id = "mistralai/Mistral-7B-Instruct-v0.3"
167
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
168
+
169
+ def get_current_weather(location: str, format: str):
170
+ """
171
+ Get the current weather
172
+
173
+ Args:
174
+ location: The city and state, e.g. San Francisco, CA
175
+ format: The temperature unit to use. Infer this from the users location. (choices: ["celsius", "fahrenheit"])
176
+ """
177
+ pass
178
+
179
+ conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
180
+ tools = [get_current_weather]
181
+
182
+ # render the tool use prompt as a string:
183
+ tool_use_prompt = tokenizer.apply_chat_template(
184
+ conversation,
185
+ tools=tools,
186
+ tokenize=False,
187
+ add_generation_prompt=True,
188
+ )
189
+
190
+ inputs = tokenizer(tool_use_prompt, return_tensors="pt")
191
+
192
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
193
+
194
+ outputs = model.generate(**inputs, max_new_tokens=1000)
195
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
196
+ ```
197
+
198
+ Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool
199
+ results to the chat history so that the model can use them in its next generation. For a full tool calling example, please
200
+ see the [function calling guide](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling),
201
+ and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be
202
+ exactly 9 alphanumeric characters.
203
+
204
+
205
+ ## Limitations
206
+
207
+ The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
208
+ It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
209
+ make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
210
+
211
+ ## The Mistral AI Team
212
+
213
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall
config.json ADDED
@@ -0,0 +1,28 @@
{
  "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "is_decoder": true,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "vocab_size": 32768
}
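
The config above describes a grouped-query attention layout. As a quick, optional sanity check, here is a sketch that re-reads this same config.json via `transformers` (it needs network access or a local copy of the repository; the numbers in the comments come from the dump above):

```py
from transformers import AutoConfig

config = AutoConfig.from_pretrained("fakezeta/Mistral-7B-Instruct-v0.3-ov-awq")

# 32 query heads share 8 key/value heads: grouped-query attention with 4 queries per KV head.
assert config.num_attention_heads // config.num_key_value_heads == 4
# Per-head width times head count matches the hidden size: 128 * 32 == 4096.
assert config.head_dim * config.num_attention_heads == config.hidden_size
print(config.model_type, config.num_hidden_layers, config.max_position_embeddings)
```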
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.43.4"
}
model.yaml ADDED
@@ -0,0 +1,8 @@
name: mistral-v0.3
backend: transformers
parameters:
  model: fakezeta/Mistral-7B-Instruct-v0.3-ov-awq
context_size: 32768
type: OVModelForCausalLM
template:
  use_tokenizer_template: true
openvino_config.json ADDED
@@ -0,0 +1,22 @@
{
  "compression": null,
  "dtype": "int4",
  "input_info": null,
  "optimum_version": "1.21.3",
  "quantization_config": {
    "all_layers": null,
    "bits": 4,
    "dataset": "wikitext2",
    "group_size": 128,
    "ignored_scope": null,
    "num_samples": null,
    "quant_method": "awq",
    "ratio": 1.0,
    "scale_estimation": true,
    "sensitivity_metric": null,
    "sym": false,
    "tokenizer": null
  },
  "save_onnx_model": false,
  "transformers_version": "4.43.4"
}
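
The `quantization_config` above records how the weights were compressed: 4-bit asymmetric AWQ with group size 128 applied to all layers (ratio 1.0), calibrated on wikitext2 with scale estimation. Below is a rough sketch of an equivalent export via optimum-intel; the kwarg names mirror the keys above but may differ across optimum-intel versions, so treat this as an assumption rather than the exact command used:

```py
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Mirrors the quantization_config recorded above (int4, asymmetric, AWQ + scale estimation).
q_config = OVWeightQuantizationConfig(
    bits=4,
    sym=False,
    group_size=128,
    ratio=1.0,
    dataset="wikitext2",
    quant_method="awq",
    scale_estimation=True,
)

model = OVModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    export=True,  # convert the original checkpoint to OpenVINO IR while quantizing
    quantization_config=q_config,
)
model.save_pretrained("Mistral-7B-Instruct-v0.3-ov-awq")
```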
openvino_detokenizer.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36eac3f6c3b8b88e65de7da6129ef16cfc505ba722e8270aac03faae8c34c0f2
size 587416
openvino_detokenizer.xml ADDED
@@ -0,0 +1,152 @@
<?xml version="1.0"?>
<net name="detokenizer" version="11">
  <layers>
    <layer id="0" name="Parameter_391611" type="Parameter" version="opset1">
      <data shape="?,?" element_type="i64" />
      <output>
        <port id="0" precision="I64" names="Parameter_391611">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="1" name="Constant_391580" type="Const" version="opset1">
      <data element_type="u8" shape="587404" offset="0" size="587404" />
      <output>
        <port id="0" precision="U8">
          <dim>587404</dim>
        </port>
      </output>
    </layer>
    <layer id="2" name="Convert_391626" type="Convert" version="opset1">
      <data destination_type="i32" />
      <input>
        <port id="0" precision="I64">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="1" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="3" name="SentencepieceDetokenizer_391612" type="SentencepieceDetokenizer" version="extension">
      <input>
        <port id="0" precision="U8">
          <dim>587404</dim>
        </port>
        <port id="1" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="3" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="4" precision="U8">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="4" name="Constant_391614" type="Const" version="opset1">
      <data element_type="u8" shape="10" offset="587404" size="10" />
      <output>
        <port id="0" precision="U8">
          <dim>10</dim>
        </port>
      </output>
    </layer>
    <layer id="5" name="Constant_391616" type="Const" version="opset1">
      <data element_type="u8" shape="2" offset="587414" size="2" />
      <output>
        <port id="0" precision="U8">
          <dim>2</dim>
        </port>
      </output>
    </layer>
    <layer id="6" name="RegexNormalization_391617" type="RegexNormalization" version="extension">
      <data global_replace="true" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="1" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="2" precision="U8">
          <dim>-1</dim>
        </port>
        <port id="3" precision="U8">
          <dim>10</dim>
        </port>
        <port id="4" precision="U8">
          <dim>2</dim>
        </port>
      </input>
      <output>
        <port id="5" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="6" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="7" precision="U8">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="7" name="StringTensorPack_391618" type="StringTensorPack" version="extension">
      <data mode="begins_ends" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="1" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="2" precision="U8">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="3" precision="STRING" names="string_output">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="8" name="Result_391619" type="Result" version="opset1">
      <input>
        <port id="0" precision="STRING">
          <dim>-1</dim>
        </port>
      </input>
    </layer>
  </layers>
  <edges>
    <edge from-layer="0" from-port="0" to-layer="2" to-port="0" />
    <edge from-layer="1" from-port="0" to-layer="3" to-port="0" />
    <edge from-layer="2" from-port="1" to-layer="3" to-port="1" />
    <edge from-layer="3" from-port="2" to-layer="6" to-port="0" />
    <edge from-layer="3" from-port="3" to-layer="6" to-port="1" />
    <edge from-layer="3" from-port="4" to-layer="6" to-port="2" />
    <edge from-layer="4" from-port="0" to-layer="6" to-port="3" />
    <edge from-layer="5" from-port="0" to-layer="6" to-port="4" />
    <edge from-layer="6" from-port="5" to-layer="7" to-port="0" />
    <edge from-layer="6" from-port="6" to-layer="7" to-port="1" />
    <edge from-layer="6" from-port="7" to-layer="7" to-port="2" />
    <edge from-layer="7" from-port="3" to-layer="8" to-port="0" />
  </edges>
  <rt_info>
    <bos_token_id value="1" />
    <chat_template value="{%- if messages[0][&quot;role&quot;] == &quot;system&quot; %}&#10; {%- set system_message = messages[0][&quot;content&quot;] %}&#10; {%- set loop_messages = messages[1:] %}&#10;{%- else %}&#10; {%- set loop_messages = messages %}&#10;{%- endif %}&#10;{%- if not tools is defined %}&#10; {%- set tools = none %}&#10;{%- endif %}&#10;{%- set user_messages = loop_messages | selectattr(&quot;role&quot;, &quot;equalto&quot;, &quot;user&quot;) | list %}&#10;&#10;{%- for message in loop_messages | rejectattr(&quot;role&quot;, &quot;equalto&quot;, &quot;tool&quot;) | rejectattr(&quot;role&quot;, &quot;equalto&quot;, &quot;tool_results&quot;) | selectattr(&quot;tool_calls&quot;, &quot;undefined&quot;) %}&#10; {%- if (message[&quot;role&quot;] == &quot;user&quot;) != (loop.index0 % 2 == 0) %}&#10; {{- raise_exception(&quot;After the optional system message, conversation roles must alternate user/assistant/user/assistant/...&quot;) }}&#10; {%- endif %}&#10;{%- endfor %}&#10;&#10;{{- bos_token }}&#10;{%- for message in loop_messages %}&#10; {%- if message[&quot;role&quot;] == &quot;user&quot; %}&#10; {%- if tools is not none and (message == user_messages[-1]) %}&#10; {{- &quot;[AVAILABLE_TOOLS] [&quot; }}&#10; {%- for tool in tools %}&#10; {%- set tool = tool.function %}&#10; {{- '{&quot;type&quot;: &quot;function&quot;, &quot;function&quot;: {' }}&#10; {%- for key, val in tool.items() if key != &quot;return&quot; %}&#10; {%- if val is string %}&#10; {{- '&quot;' + key + '&quot;: &quot;' + val + '&quot;' }}&#10; {%- else %}&#10; {{- '&quot;' + key + '&quot;: ' + val|tojson }}&#10; {%- endif %}&#10; {%- if not loop.last %}&#10; {{- &quot;, &quot; }}&#10; {%- endif %}&#10; {%- endfor %}&#10; {{- &quot;}}&quot; }}&#10; {%- if not loop.last %}&#10; {{- &quot;, &quot; }}&#10; {%- else %}&#10; {{- &quot;]&quot; }}&#10; {%- endif %}&#10; {%- endfor %}&#10; {{- &quot;[/AVAILABLE_TOOLS]&quot; }}&#10; {%- endif %}&#10; {%- if loop.last and system_message is defined %}&#10; {{- &quot;[INST] &quot; + system_message + &quot;\n\n&quot; + message[&quot;content&quot;] + &quot;[/INST]&quot; }}&#10; {%- else %}&#10; {{- &quot;[INST] &quot; + message[&quot;content&quot;] + &quot;[/INST]&quot; }}&#10; {%- endif %}&#10; {%- elif message[&quot;role&quot;] == &quot;tool_calls&quot; or message.tool_calls is defined %}&#10; {%- if message.tool_calls is defined %}&#10; {%- set tool_calls = message.tool_calls %}&#10; {%- else %}&#10; {%- set tool_calls = message.content %}&#10; {%- endif %}&#10; {{- &quot;[TOOL_CALLS] [&quot; }}&#10; {%- for tool_call in tool_calls %}&#10; {%- set out = tool_call.function|tojson %}&#10; {{- out[:-1] }}&#10; {%- if not tool_call.id is defined or tool_call.id|length != 9 %}&#10; {{- raise_exception(&quot;Tool call IDs should be alphanumeric strings with length 9!&quot;) }}&#10; {%- endif %}&#10; {{- ', &quot;id&quot;: &quot;' + tool_call.id + '&quot;}' }}&#10; {%- if not loop.last %}&#10; {{- &quot;, &quot; }}&#10; {%- else %}&#10; {{- &quot;]&quot; + eos_token }}&#10; {%- endif %}&#10; {%- endfor %}&#10; {%- elif message[&quot;role&quot;] == &quot;assistant&quot; %}&#10; {{- &quot; &quot; + message[&quot;content&quot;] + eos_token}}&#10; {%- elif message[&quot;role&quot;] == &quot;tool_results&quot; or message[&quot;role&quot;] == &quot;tool&quot; %}&#10; {%- if message.content is defined and message.content.content is defined %}&#10; {%- set content = message.content.content %}&#10; {%- else %}&#10; {%- set content = message.content %}&#10; {%- endif %}&#10; {{- '[TOOL_RESULTS] {&quot;content&quot;: ' + content|string + &quot;, &quot; }}&#10; {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}&#10; {{- raise_exception(&quot;Tool call IDs should be alphanumeric strings with length 9!&quot;) }}&#10; {%- endif %}&#10; {{- '&quot;call_id&quot;: &quot;' + message.tool_call_id + '&quot;}[/TOOL_RESULTS]' }}&#10; {%- else %}&#10; {{- raise_exception(&quot;Only user and assistant roles are supported, with the exception of an initial optional system message!&quot;) }}&#10; {%- endif %}&#10;{%- endfor %}&#10;" />
    <eos_token_id value="2" />
    <original_tokenizer_class value="&lt;class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>" />
  </rt_info>
</net>
openvino_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:732cfef2f48103a16dfef9ea95847cdf7b5b45094af9a43bf3e5d3d5de028bc8
size 3895673424
openvino_model.xml ADDED
The diff for this file is too large to render. See raw diff
 
openvino_tokenizer.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d12dec2c9d0e5dcb0c8e6f6ee8e3f1fcacb2141aba3898d316c3daca980a6a3
size 587430
openvino_tokenizer.xml ADDED
@@ -0,0 +1,384 @@
<?xml version="1.0"?>
<net name="tokenizer" version="11">
  <layers>
    <layer id="0" name="string_input" type="Parameter" version="opset1">
      <data shape="?" element_type="string" />
      <output>
        <port id="0" precision="STRING" names="string_input">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="1" name="Constant_391590" type="Const" version="opset1">
      <data element_type="i32" shape="" offset="0" size="4" />
      <output>
        <port id="0" precision="I32" />
      </output>
    </layer>
    <layer id="2" name="Constant_391579" type="Const" version="opset1">
      <data element_type="u8" shape="587404" offset="4" size="587404" />
      <output>
        <port id="0" precision="U8">
          <dim>587404</dim>
        </port>
      </output>
    </layer>
    <layer id="3" name="StringTensorUnpack_391582" type="StringTensorUnpack" version="extension">
      <data mode="begins_ends" />
      <input>
        <port id="0" precision="STRING">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="1" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="2" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="3" precision="U8">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="4" name="Constant_391584" type="Const" version="opset1">
      <data element_type="u8" shape="7" offset="587408" size="7" />
      <output>
        <port id="0" precision="U8">
          <dim>7</dim>
        </port>
      </output>
    </layer>
    <layer id="5" name="Constant_391586" type="Const" version="opset1">
      <data element_type="u8" shape="3" offset="587415" size="3" />
      <output>
        <port id="0" precision="U8">
          <dim>3</dim>
        </port>
      </output>
    </layer>
    <layer id="6" name="RegexNormalization_391587" type="RegexNormalization" version="extension">
      <data global_replace="true" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="1" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="2" precision="U8">
          <dim>-1</dim>
        </port>
        <port id="3" precision="U8">
          <dim>7</dim>
        </port>
        <port id="4" precision="U8">
          <dim>3</dim>
        </port>
      </input>
      <output>
        <port id="5" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="6" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="7" precision="U8">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="7" name="StringTensorPack_391588" type="StringTensorPack" version="extension">
      <data mode="begins_ends" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="1" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="2" precision="U8">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="3" precision="STRING">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="8" name="SentencepieceTokenizer_391589" type="SentencepieceTokenizer" version="extension">
      <data nbest_size="0" alpha="0" add_bos="true" add_eos="false" reverse="true" />
      <input>
        <port id="0" precision="U8">
          <dim>587404</dim>
        </port>
        <port id="1" precision="STRING">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I64">
          <dim>-1</dim>
          <dim>2</dim>
        </port>
        <port id="3" precision="I32">
          <dim>-1</dim>
        </port>
        <port id="4" precision="I64">
          <dim>2</dim>
        </port>
      </output>
    </layer>
    <layer id="9" name="Broadcast_391591" type="Broadcast" version="opset3">
      <data mode="numpy" />
      <input>
        <port id="0" precision="I32" />
        <port id="1" precision="I64">
          <dim>2</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="10" name="Constant_391592" type="Const" version="opset1">
      <data element_type="i32" shape="" offset="587418" size="4" />
      <output>
        <port id="0" precision="I32" />
      </output>
    </layer>
    <layer id="11" name="ShapeOf_391593" type="ShapeOf" version="opset3">
      <data output_type="i64" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="1" precision="I64">
          <dim>1</dim>
        </port>
      </output>
    </layer>
    <layer id="12" name="Broadcast_391594" type="Broadcast" version="opset3">
      <data mode="numpy" />
      <input>
        <port id="0" precision="I32" />
        <port id="1" precision="I64">
          <dim>1</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I32">
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="13" name="ScatterNDUpdate_391598" type="ScatterNDUpdate" version="opset4">
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
        <port id="1" precision="I64">
          <dim>-1</dim>
          <dim>2</dim>
        </port>
        <port id="2" precision="I32">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="3" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="14" name="Constant_391602" type="Const" version="opset1">
      <data element_type="i64" shape="1" offset="587422" size="8" />
      <output>
        <port id="0" precision="I64">
          <dim>1</dim>
        </port>
      </output>
    </layer>
    <layer id="15" name="Reverse_391603" type="Reverse" version="opset1">
      <data mode="index" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
        <port id="1" precision="I64">
          <dim>1</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="16" name="Reverse_391603" type="Convert" version="opset1">
      <data destination_type="i64" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="1" precision="I64" names="attention_mask">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="18" name="Constant_391599" type="Const" version="opset1">
      <data element_type="i32" shape="" offset="0" size="4" />
      <output>
        <port id="0" precision="I32" />
      </output>
    </layer>
    <layer id="19" name="Broadcast_391600" type="Broadcast" version="opset3">
      <data mode="bidirectional" />
      <input>
        <port id="0" precision="I32" />
        <port id="1" precision="I64">
          <dim>2</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="20" name="ScatterNDUpdate_391601" type="ScatterNDUpdate" version="opset4">
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
        <port id="1" precision="I64">
          <dim>-1</dim>
          <dim>2</dim>
        </port>
        <port id="2" precision="I32">
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="3" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="21" name="Constant_391604" type="Const" version="opset1">
      <data element_type="i64" shape="1" offset="587422" size="8" />
      <output>
        <port id="0" precision="I64">
          <dim>1</dim>
        </port>
      </output>
    </layer>
    <layer id="22" name="Reverse_391605" type="Reverse" version="opset1">
      <data mode="index" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
        <port id="1" precision="I64">
          <dim>1</dim>
        </port>
      </input>
      <output>
        <port id="2" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="23" name="Reverse_391605" type="Convert" version="opset1">
      <data destination_type="i64" />
      <input>
        <port id="0" precision="I32">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </input>
      <output>
        <port id="1" precision="I64" names="input_ids">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </output>
    </layer>
    <layer id="24" name="Result_391606" type="Result" version="opset1">
      <input>
        <port id="0" precision="I64">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </input>
    </layer>
    <layer id="17" name="Result_391607" type="Result" version="opset1">
      <input>
        <port id="0" precision="I64">
          <dim>-1</dim>
          <dim>-1</dim>
        </port>
      </input>
    </layer>
  </layers>
  <edges>
    <edge from-layer="0" from-port="0" to-layer="3" to-port="0" />
    <edge from-layer="1" from-port="0" to-layer="9" to-port="0" />
    <edge from-layer="2" from-port="0" to-layer="8" to-port="0" />
    <edge from-layer="3" from-port="1" to-layer="6" to-port="0" />
    <edge from-layer="3" from-port="3" to-layer="6" to-port="2" />
    <edge from-layer="3" from-port="2" to-layer="6" to-port="1" />
    <edge from-layer="4" from-port="0" to-layer="6" to-port="3" />
    <edge from-layer="5" from-port="0" to-layer="6" to-port="4" />
    <edge from-layer="6" from-port="5" to-layer="7" to-port="0" />
    <edge from-layer="6" from-port="6" to-layer="7" to-port="1" />
    <edge from-layer="6" from-port="7" to-layer="7" to-port="2" />
    <edge from-layer="7" from-port="3" to-layer="8" to-port="1" />
    <edge from-layer="8" from-port="4" to-layer="9" to-port="1" />
    <edge from-layer="8" from-port="3" to-layer="11" to-port="0" />
    <edge from-layer="8" from-port="3" to-layer="20" to-port="2" />
    <edge from-layer="8" from-port="2" to-layer="20" to-port="1" />
    <edge from-layer="8" from-port="2" to-layer="13" to-port="1" />
    <edge from-layer="8" from-port="4" to-layer="19" to-port="1" />
    <edge from-layer="9" from-port="2" to-layer="13" to-port="0" />
    <edge from-layer="10" from-port="0" to-layer="12" to-port="0" />
    <edge from-layer="11" from-port="1" to-layer="12" to-port="1" />
    <edge from-layer="12" from-port="2" to-layer="13" to-port="2" />
    <edge from-layer="13" from-port="3" to-layer="15" to-port="0" />
    <edge from-layer="14" from-port="0" to-layer="15" to-port="1" />
    <edge from-layer="15" from-port="2" to-layer="16" to-port="0" />
    <edge from-layer="16" from-port="1" to-layer="17" to-port="0" />
    <edge from-layer="18" from-port="0" to-layer="19" to-port="0" />
    <edge from-layer="19" from-port="2" to-layer="20" to-port="0" />
    <edge from-layer="20" from-port="3" to-layer="22" to-port="0" />
    <edge from-layer="21" from-port="0" to-layer="22" to-port="1" />
    <edge from-layer="22" from-port="2" to-layer="23" to-port="0" />
    <edge from-layer="23" from-port="1" to-layer="24" to-port="0" />
  </edges>
  <rt_info>
    <bos_token_id value="1" />
    <chat_template value="{%- if messages[0][&quot;role&quot;] == &quot;system&quot; %}&#10; {%- set system_message = messages[0][&quot;content&quot;] %}&#10; {%- set loop_messages = messages[1:] %}&#10;{%- else %}&#10; {%- set loop_messages = messages %}&#10;{%- endif %}&#10;{%- if not tools is defined %}&#10; {%- set tools = none %}&#10;{%- endif %}&#10;{%- set user_messages = loop_messages | selectattr(&quot;role&quot;, &quot;equalto&quot;, &quot;user&quot;) | list %}&#10;&#10;{%- for message in loop_messages | rejectattr(&quot;role&quot;, &quot;equalto&quot;, &quot;tool&quot;) | rejectattr(&quot;role&quot;, &quot;equalto&quot;, &quot;tool_results&quot;) | selectattr(&quot;tool_calls&quot;, &quot;undefined&quot;) %}&#10; {%- if (message[&quot;role&quot;] == &quot;user&quot;) != (loop.index0 % 2 == 0) %}&#10; {{- raise_exception(&quot;After the optional system message, conversation roles must alternate user/assistant/user/assistant/...&quot;) }}&#10; {%- endif %}&#10;{%- endfor %}&#10;&#10;{{- bos_token }}&#10;{%- for message in loop_messages %}&#10; {%- if message[&quot;role&quot;] == &quot;user&quot; %}&#10; {%- if tools is not none and (message == user_messages[-1]) %}&#10; {{- &quot;[AVAILABLE_TOOLS] [&quot; }}&#10; {%- for tool in tools %}&#10; {%- set tool = tool.function %}&#10; {{- '{&quot;type&quot;: &quot;function&quot;, &quot;function&quot;: {' }}&#10; {%- for key, val in tool.items() if key != &quot;return&quot; %}&#10; {%- if val is string %}&#10; {{- '&quot;' + key + '&quot;: &quot;' + val + '&quot;' }}&#10; {%- else %}&#10; {{- '&quot;' + key + '&quot;: ' + val|tojson }}&#10; {%- endif %}&#10; {%- if not loop.last %}&#10; {{- &quot;, &quot; }}&#10; {%- endif %}&#10; {%- endfor %}&#10; {{- &quot;}}&quot; }}&#10; {%- if not loop.last %}&#10; {{- &quot;, &quot; }}&#10; {%- else %}&#10; {{- &quot;]&quot; }}&#10; {%- endif %}&#10; {%- endfor %}&#10; {{- &quot;[/AVAILABLE_TOOLS]&quot; }}&#10; {%- endif %}&#10; {%- if loop.last and system_message is defined %}&#10; {{- &quot;[INST] &quot; + system_message + &quot;\n\n&quot; + message[&quot;content&quot;] + &quot;[/INST]&quot; }}&#10; {%- else %}&#10; {{- &quot;[INST] &quot; + message[&quot;content&quot;] + &quot;[/INST]&quot; }}&#10; {%- endif %}&#10; {%- elif message[&quot;role&quot;] == &quot;tool_calls&quot; or message.tool_calls is defined %}&#10; {%- if message.tool_calls is defined %}&#10; {%- set tool_calls = message.tool_calls %}&#10; {%- else %}&#10; {%- set tool_calls = message.content %}&#10; {%- endif %}&#10; {{- &quot;[TOOL_CALLS] [&quot; }}&#10; {%- for tool_call in tool_calls %}&#10; {%- set out = tool_call.function|tojson %}&#10; {{- out[:-1] }}&#10; {%- if not tool_call.id is defined or tool_call.id|length != 9 %}&#10; {{- raise_exception(&quot;Tool call IDs should be alphanumeric strings with length 9!&quot;) }}&#10; {%- endif %}&#10; {{- ', &quot;id&quot;: &quot;' + tool_call.id + '&quot;}' }}&#10; {%- if not loop.last %}&#10; {{- &quot;, &quot; }}&#10; {%- else %}&#10; {{- &quot;]&quot; + eos_token }}&#10; {%- endif %}&#10; {%- endfor %}&#10; {%- elif message[&quot;role&quot;] == &quot;assistant&quot; %}&#10; {{- &quot; &quot; + message[&quot;content&quot;] + eos_token}}&#10; {%- elif message[&quot;role&quot;] == &quot;tool_results&quot; or message[&quot;role&quot;] == &quot;tool&quot; %}&#10; {%- if message.content is defined and message.content.content is defined %}&#10; {%- set content = message.content.content %}&#10; {%- else %}&#10; {%- set content = message.content %}&#10; {%- endif %}&#10; {{- '[TOOL_RESULTS] {&quot;content&quot;: ' + content|string + &quot;, &quot; }}&#10; {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}&#10; {{- raise_exception(&quot;Tool call IDs should be alphanumeric strings with length 9!&quot;) }}&#10; {%- endif %}&#10; {{- '&quot;call_id&quot;: &quot;' + message.tool_call_id + '&quot;}[/TOOL_RESULTS]' }}&#10; {%- else %}&#10; {{- raise_exception(&quot;Only user and assistant roles are supported, with the exception of an initial optional system message!&quot;) }}&#10; {%- endif %}&#10;{%- endfor %}&#10;" />
    <eos_token_id value="2" />
    <original_tokenizer_class value="&lt;class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>" />
  </rt_info>
</net>
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff