openvino-ci commited on
Commit
8740802
1 Parent(s): 681ff4e

Upload folder using huggingface_hub

Browse files
LICENSE ADDED
@@ -0,0 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Intel Research Use License Agreement
2
+
3
+ **By using or distributing any portion or element of the Material, You agree to be bound by this Agreement.**
4
+
5
+ 1 - **Definitions.**
6
+ **“Agreement”** means the terms and conditions for use, reproduction and distribution of the **Material** set forth herein.
7
+
8
+ **“Material”** means datasets, models, model weights, or other software Intel makes available to you under this **Agreement**.
9
+
10
+ **“You” or “Your”** means **you**, or **your** employer or any other person or entity (if you are entering into this **Agreement** on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consentand that has legal authority to bind **your** employer or such other person or entity if **you** are entering in this **Agreement** on their behalf.
11
+
12
+ 2 - **License.** Intel grants to **You**, a limited, non-transferable, non-sublicensable, non-exclusive, worldwide, royalty-free, license under Intel’s copyrights in the **Material**, to use, reproduce, display, and distribute the **Material** solely for research purposes.
13
+
14
+ 2.1 - **Restrictions** Except as authorized above, **You** will not: (a) use **Material** in any other way; (b)reverse engineer, decompile, or disassemble the **Material**, or (c) use **Material** to violate or aid in the violation of any international human right.
15
+
16
+ 2.2 - **No Implied License**. Except for the express license in Section
17
+ 2, Intel does not grant **You** any express or implied license under any legal theory. Any other licenses from Intel require additional consideration. Nothing in this **Agreement** requires Intel to grant any additional license.
18
+
19
+ 3 - **Third party programs.** **Your** use of certain third-party software with or within the **Material** is subject to **your** compliance with licensing you obtain directly from that third-party. A listing of any such third-party software may accompany the **Material**.
20
+
21
+ 4 - **No Warranty.** The **Material** is provided “as is,” without any express or implied warranty of any kind including warranties of merchantability, non-infringement, title, or fitness for a particular purpose. The **Material** may be pre-release and may not be fully functional. Intel is not required to maintain, update, or support any **Material**. **You** are solely responsible for determining the appropriateness of using or distributing the **Material** and assume any risks associated with **Your** use of the **Material**.
22
+
23
+ 5 - **Limitation on Liability.** IN NO EVENT WILL INTEL BE LIABLE TO YOU UNDER ANY THEORY OF LIABILITY FOR ANY LOST PROFITS OR DAMAGES (INCLUDING BUT NOT LIMITED TO DIRECT, INDIRECT, SPECULATIVE, SPECIAL OR CONSEQUENTIAL DAMAGES) WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THIS **AGREEMENT** (EVEN IF INTEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES OR DAMAGES).
24
+
25
+ 6 - **Term and Termination.** The term of this **Agreement** will commence upon **Your** acceptance of this **Agreement** or access to the **Material** and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Intel may terminate this **Agreement** if **You** are in breach of any term or condition of this **Agreement**. Upon termination of this **Agreement**, **You** shall delete and cease use of the **Material**. Sections 4, 5, and 7 will survive the termination of this **Agreement**.
26
+
27
+ 7 - **Governing Law and Jurisdiction.** All disputes will be governed by the laws of the United States of America and the State of Delaware without reference to conflict of law principles and subject to the exclusive jurisdiction of the state or federal courts sitting in the State of Delaware, and each party agrees that it submits to the personal jurisdiction and venue of those courts and waives any objections.
28
+
README.md ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: other
3
+ ---
4
+
5
+ # Phi-3-mini-FastDraft-50M-int8-ov
6
+ ## Description
7
+
8
+ FastDraft is a novel and efficient approach for pre-training and aligning a draft model to any LLM to be used with speculative decoding, by incorporating efficient pre-training followed by fine-tuning over synthetic datasets generated by the target model.
9
+ FastDraft was presented in https://arxiv.org/abs/2411.11055 at ENLSP@NeurIPS24 by Intel Labs.
10
+
11
+ This is a draft model that was trained with FastDraft to accompany [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).
12
+
13
+ This is Phi-3-mini-FastDraft-50M model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to int8 by [NNCF](https://github.com/openvinotoolkit/nncf).
14
+
15
+ ## Quantization Parameters
16
+
17
+ Weight compression was performed using `nncf.compress_weights` with the following parameters:
18
+
19
+ <nncf>
20
+ <weight_compression>
21
+ <all_layers value="False"/>
22
+ <awq value="False"/>
23
+ <group_size value="128"/>
24
+ <ignored_scope>
25
+ <names value="[]"/>
26
+ <patterns value="[]"/>
27
+ <subgraphs value="[]"/>
28
+ <types value="[]"/>
29
+ <validate value="True"/>
30
+ </ignored_scope>
31
+ <mode value="int8"/>
32
+ <ratio value="1"/>
33
+ <sensitivity_metric value="weight_quantization_error"/>
34
+ </weight_compression>
35
+ </nncf>
36
+
37
+ For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
38
+
39
+ ## Compatibility
40
+
41
+ The provided OpenVINO™ IR model is compatible with:
42
+
43
+ * OpenVINO version <2024.4 > and higher
44
+ * Optimum Intel <1.20.0> and higher
45
+
46
+ ## Running Model Inference with OpenVINO GenAI
47
+
48
+ 1. Install packages required for using [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai) with Speculative decoding:
49
+
50
+ ```
51
+ pip install openvino-genai huggingface_hub
52
+ ```
53
+
54
+ 2. Download models from HuggingFace Hub
55
+ ```
56
+ import huggingface_hub as hf_hub
57
+
58
+ main_model_id = “OpenVINO/Phi-3-mini-4k-instruct-int4-ov”
59
+ draft_model_id = "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov"
60
+
61
+ main_model_path = "main"
62
+ draft_model_path = “draft”
63
+
64
+ hf_hub.snapshot_download(main_model_id, local_dir=main_model_path)
65
+ hf_hub.snapshot_download(draft_model_id, local_dir=draft_model_path)
66
+ ```
67
+ 3. Run model inference using the speculative decoding and specify the pipeline parameters:
68
+ ```
69
+ import openvino_genai
70
+
71
+ prompt = “What is OpenVINO?”
72
+
73
+ config = openvino_genai.GenerationConfig()
74
+ config.num_assistant_tokens = 3
75
+ config.max_new_tokens = 128
76
+
77
+ def streamer(subword):
78
+ print(subword, end='', flush=True)
79
+ return False
80
+
81
+ main_device = "CPU"
82
+ draft_device = "CPU"
83
+
84
+ draft_model = openvino_genai.draft_model(draft_model_path, draft_device)
85
+
86
+ scheduler_config = openvino_genai.SchedulerConfig()
87
+ scheduler_config.cache_size = 2
88
+
89
+ pipe = openvino_genai.LLMPipeline(main_model_path, main_device, scheduler_config=scheduler_config, draft_model=draft_model)
90
+
91
+ pipe.generate(prompt, config, streamer)
92
+ ```
93
+
94
+ More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md) and [samples](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples)
95
+
96
+
97
+ ## Disclaimer
98
+
99
+ Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
config.json ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "/mnt/beegfs/mixed-tier/share/projects/fastdraft/project_finetuned_models/phi3-base-50M_fineweb5BT_CP_5BTcode_2.5BTtext/combi_alpaca_oig_magpie_code-evol__alpaca",
3
+ "architectures": [
4
+ "Phi3ForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 1,
9
+ "embd_pdrop": 0.0,
10
+ "eos_token_id": 32000,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 512,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 1408,
15
+ "max_position_embeddings": 4096,
16
+ "model_type": "phi3",
17
+ "num_attention_heads": 8,
18
+ "num_hidden_layers": 6,
19
+ "num_key_value_heads": 8,
20
+ "original_max_position_embeddings": 4096,
21
+ "pad_token_id": 32000,
22
+ "resid_pdrop": 0.0,
23
+ "rms_norm_eps": 1e-05,
24
+ "rope_scaling": null,
25
+ "rope_theta": 10000.0,
26
+ "sliding_window": 2047,
27
+ "tie_word_embeddings": false,
28
+ "transformers_version": "4.44.0",
29
+ "use_cache": false,
30
+ "vocab_size": 32064
31
+ }
generation_config.json ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 32000,
5
+ "pad_token_id": 32000,
6
+ "transformers_version": "4.44.0",
7
+ "use_cache": false
8
+ }
openvino_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cecf0224201415144c00cf3a6cf3350306f9c78888d631eb590939a63722fefa
3
+ size 52417240
openvino_model.xml ADDED
The diff for this file is too large to render. See raw diff
 
openvino_tokenizer.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f3cbe4012e3eee81f0fe6958d0e6d743667a5d0d2bf8979ca7963a42d336357
3
+ size 500292
openvino_tokenizer.xml ADDED
@@ -0,0 +1,469 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <?xml version="1.0"?>
2
+ <net name="tokenizer" version="11">
3
+ <layers>
4
+ <layer id="0" name="string_input" type="Parameter" version="opset1">
5
+ <data shape="?" element_type="string" />
6
+ <output>
7
+ <port id="0" precision="STRING" names="string_input">
8
+ <dim>-1</dim>
9
+ </port>
10
+ </output>
11
+ </layer>
12
+ <layer id="1" name="Constant_273721" type="Const" version="opset1">
13
+ <data element_type="i32" shape="" offset="0" size="4" />
14
+ <output>
15
+ <port id="0" precision="I32" />
16
+ </output>
17
+ </layer>
18
+ <layer id="2" name="Constant_273713" type="Const" version="opset1">
19
+ <data element_type="u8" shape="499969" offset="4" size="499969" />
20
+ <output>
21
+ <port id="0" precision="U8">
22
+ <dim>499969</dim>
23
+ </port>
24
+ </output>
25
+ </layer>
26
+ <layer id="3" name="Constant_273717" type="Const" version="opset1">
27
+ <data element_type="u8" shape="223" offset="499973" size="223" />
28
+ <output>
29
+ <port id="0" precision="U8">
30
+ <dim>223</dim>
31
+ </port>
32
+ </output>
33
+ </layer>
34
+ <layer id="4" name="StringTensorUnpack_273718" type="StringTensorUnpack" version="extension">
35
+ <data mode="begins_ends" />
36
+ <input>
37
+ <port id="0" precision="U8">
38
+ <dim>223</dim>
39
+ </port>
40
+ </input>
41
+ <output>
42
+ <port id="1" precision="I32">
43
+ <dim>-1</dim>
44
+ </port>
45
+ <port id="2" precision="I32">
46
+ <dim>-1</dim>
47
+ </port>
48
+ <port id="3" precision="U8">
49
+ <dim>-1</dim>
50
+ </port>
51
+ </output>
52
+ </layer>
53
+ <layer id="5" name="Constant_273719" type="Const" version="opset1">
54
+ <data element_type="i32" shape="14" offset="500196" size="56" />
55
+ <output>
56
+ <port id="0" precision="I32">
57
+ <dim>14</dim>
58
+ </port>
59
+ </output>
60
+ </layer>
61
+ <layer id="6" name="SentencepieceTokenizer_273720" type="SentencepieceTokenizer" version="extension">
62
+ <data nbest_size="0" alpha="0" add_bos="false" add_eos="false" reverse="true" />
63
+ <input>
64
+ <port id="0" precision="U8">
65
+ <dim>499969</dim>
66
+ </port>
67
+ <port id="1" precision="STRING">
68
+ <dim>-1</dim>
69
+ </port>
70
+ <port id="2" precision="I32">
71
+ <dim>-1</dim>
72
+ </port>
73
+ <port id="3" precision="I32">
74
+ <dim>-1</dim>
75
+ </port>
76
+ <port id="4" precision="U8">
77
+ <dim>-1</dim>
78
+ </port>
79
+ <port id="5" precision="I32">
80
+ <dim>14</dim>
81
+ </port>
82
+ </input>
83
+ <output>
84
+ <port id="6" precision="I64">
85
+ <dim>-1</dim>
86
+ <dim>2</dim>
87
+ </port>
88
+ <port id="7" precision="I32">
89
+ <dim>-1</dim>
90
+ </port>
91
+ <port id="8" precision="I64">
92
+ <dim>2</dim>
93
+ </port>
94
+ </output>
95
+ </layer>
96
+ <layer id="7" name="Broadcast_273722" type="Broadcast" version="opset3">
97
+ <data mode="numpy" />
98
+ <input>
99
+ <port id="0" precision="I32" />
100
+ <port id="1" precision="I64">
101
+ <dim>2</dim>
102
+ </port>
103
+ </input>
104
+ <output>
105
+ <port id="2" precision="I32">
106
+ <dim>-1</dim>
107
+ <dim>-1</dim>
108
+ </port>
109
+ </output>
110
+ </layer>
111
+ <layer id="8" name="Constant_273723" type="Const" version="opset1">
112
+ <data element_type="i32" shape="" offset="500252" size="4" />
113
+ <output>
114
+ <port id="0" precision="I32" />
115
+ </output>
116
+ </layer>
117
+ <layer id="9" name="ShapeOf_273724" type="ShapeOf" version="opset3">
118
+ <data output_type="i64" />
119
+ <input>
120
+ <port id="0" precision="I32">
121
+ <dim>-1</dim>
122
+ </port>
123
+ </input>
124
+ <output>
125
+ <port id="1" precision="I64">
126
+ <dim>1</dim>
127
+ </port>
128
+ </output>
129
+ </layer>
130
+ <layer id="10" name="Broadcast_273725" type="Broadcast" version="opset3">
131
+ <data mode="numpy" />
132
+ <input>
133
+ <port id="0" precision="I32" />
134
+ <port id="1" precision="I64">
135
+ <dim>1</dim>
136
+ </port>
137
+ </input>
138
+ <output>
139
+ <port id="2" precision="I32">
140
+ <dim>-1</dim>
141
+ </port>
142
+ </output>
143
+ </layer>
144
+ <layer id="11" name="ScatterNDUpdate_273729" type="ScatterNDUpdate" version="opset4">
145
+ <input>
146
+ <port id="0" precision="I32">
147
+ <dim>-1</dim>
148
+ <dim>-1</dim>
149
+ </port>
150
+ <port id="1" precision="I64">
151
+ <dim>-1</dim>
152
+ <dim>2</dim>
153
+ </port>
154
+ <port id="2" precision="I32">
155
+ <dim>-1</dim>
156
+ </port>
157
+ </input>
158
+ <output>
159
+ <port id="3" precision="I32">
160
+ <dim>-1</dim>
161
+ <dim>-1</dim>
162
+ </port>
163
+ </output>
164
+ </layer>
165
+ <layer id="12" name="Constant_273733" type="Const" version="opset1">
166
+ <data element_type="i64" shape="1" offset="500256" size="8" />
167
+ <output>
168
+ <port id="0" precision="I64">
169
+ <dim>1</dim>
170
+ </port>
171
+ </output>
172
+ </layer>
173
+ <layer id="13" name="Reverse_273734" type="Reverse" version="opset1">
174
+ <data mode="index" />
175
+ <input>
176
+ <port id="0" precision="I32">
177
+ <dim>-1</dim>
178
+ <dim>-1</dim>
179
+ </port>
180
+ <port id="1" precision="I64">
181
+ <dim>1</dim>
182
+ </port>
183
+ </input>
184
+ <output>
185
+ <port id="2" precision="I32">
186
+ <dim>-1</dim>
187
+ <dim>-1</dim>
188
+ </port>
189
+ </output>
190
+ </layer>
191
+ <layer id="14" name="Constant_273742" type="Const" version="opset1">
192
+ <data element_type="i64" shape="1" offset="500264" size="8" />
193
+ <output>
194
+ <port id="0" precision="I64">
195
+ <dim>1</dim>
196
+ </port>
197
+ </output>
198
+ </layer>
199
+ <layer id="15" name="Constant_273743" type="Const" version="opset1">
200
+ <data element_type="i64" shape="1" offset="500272" size="8" />
201
+ <output>
202
+ <port id="0" precision="I64">
203
+ <dim>1</dim>
204
+ </port>
205
+ </output>
206
+ </layer>
207
+ <layer id="16" name="Constant_273744" type="Const" version="opset1">
208
+ <data element_type="i64" shape="1" offset="500280" size="8" />
209
+ <output>
210
+ <port id="0" precision="I64">
211
+ <dim>1</dim>
212
+ </port>
213
+ </output>
214
+ </layer>
215
+ <layer id="17" name="Constant_273745" type="Const" version="opset1">
216
+ <data element_type="i64" shape="1" offset="500256" size="8" />
217
+ <output>
218
+ <port id="0" precision="I64">
219
+ <dim>1</dim>
220
+ </port>
221
+ </output>
222
+ </layer>
223
+ <layer id="18" name="Slice_273746" type="Slice" version="opset8">
224
+ <input>
225
+ <port id="0" precision="I32">
226
+ <dim>-1</dim>
227
+ <dim>-1</dim>
228
+ </port>
229
+ <port id="1" precision="I64">
230
+ <dim>1</dim>
231
+ </port>
232
+ <port id="2" precision="I64">
233
+ <dim>1</dim>
234
+ </port>
235
+ <port id="3" precision="I64">
236
+ <dim>1</dim>
237
+ </port>
238
+ <port id="4" precision="I64">
239
+ <dim>1</dim>
240
+ </port>
241
+ </input>
242
+ <output>
243
+ <port id="5" precision="I32">
244
+ <dim>-1</dim>
245
+ <dim>-1</dim>
246
+ </port>
247
+ </output>
248
+ </layer>
249
+ <layer id="19" name="Slice_273746" type="Convert" version="opset1">
250
+ <data destination_type="i64" />
251
+ <input>
252
+ <port id="0" precision="I32">
253
+ <dim>-1</dim>
254
+ <dim>-1</dim>
255
+ </port>
256
+ </input>
257
+ <output>
258
+ <port id="1" precision="I64" names="attention_mask">
259
+ <dim>-1</dim>
260
+ <dim>-1</dim>
261
+ </port>
262
+ </output>
263
+ </layer>
264
+ <layer id="21" name="Constant_273730" type="Const" version="opset1">
265
+ <data element_type="i32" shape="" offset="500288" size="4" />
266
+ <output>
267
+ <port id="0" precision="I32" />
268
+ </output>
269
+ </layer>
270
+ <layer id="22" name="Broadcast_273731" type="Broadcast" version="opset3">
271
+ <data mode="bidirectional" />
272
+ <input>
273
+ <port id="0" precision="I32" />
274
+ <port id="1" precision="I64">
275
+ <dim>2</dim>
276
+ </port>
277
+ </input>
278
+ <output>
279
+ <port id="2" precision="I32">
280
+ <dim>-1</dim>
281
+ <dim>-1</dim>
282
+ </port>
283
+ </output>
284
+ </layer>
285
+ <layer id="23" name="ScatterNDUpdate_273732" type="ScatterNDUpdate" version="opset4">
286
+ <input>
287
+ <port id="0" precision="I32">
288
+ <dim>-1</dim>
289
+ <dim>-1</dim>
290
+ </port>
291
+ <port id="1" precision="I64">
292
+ <dim>-1</dim>
293
+ <dim>2</dim>
294
+ </port>
295
+ <port id="2" precision="I32">
296
+ <dim>-1</dim>
297
+ </port>
298
+ </input>
299
+ <output>
300
+ <port id="3" precision="I32">
301
+ <dim>-1</dim>
302
+ <dim>-1</dim>
303
+ </port>
304
+ </output>
305
+ </layer>
306
+ <layer id="24" name="Constant_273735" type="Const" version="opset1">
307
+ <data element_type="i64" shape="1" offset="500256" size="8" />
308
+ <output>
309
+ <port id="0" precision="I64">
310
+ <dim>1</dim>
311
+ </port>
312
+ </output>
313
+ </layer>
314
+ <layer id="25" name="Reverse_273736" type="Reverse" version="opset1">
315
+ <data mode="index" />
316
+ <input>
317
+ <port id="0" precision="I32">
318
+ <dim>-1</dim>
319
+ <dim>-1</dim>
320
+ </port>
321
+ <port id="1" precision="I64">
322
+ <dim>1</dim>
323
+ </port>
324
+ </input>
325
+ <output>
326
+ <port id="2" precision="I32">
327
+ <dim>-1</dim>
328
+ <dim>-1</dim>
329
+ </port>
330
+ </output>
331
+ </layer>
332
+ <layer id="26" name="Constant_273737" type="Const" version="opset1">
333
+ <data element_type="i64" shape="1" offset="500264" size="8" />
334
+ <output>
335
+ <port id="0" precision="I64">
336
+ <dim>1</dim>
337
+ </port>
338
+ </output>
339
+ </layer>
340
+ <layer id="27" name="Constant_273738" type="Const" version="opset1">
341
+ <data element_type="i64" shape="1" offset="500272" size="8" />
342
+ <output>
343
+ <port id="0" precision="I64">
344
+ <dim>1</dim>
345
+ </port>
346
+ </output>
347
+ </layer>
348
+ <layer id="28" name="Constant_273739" type="Const" version="opset1">
349
+ <data element_type="i64" shape="1" offset="500280" size="8" />
350
+ <output>
351
+ <port id="0" precision="I64">
352
+ <dim>1</dim>
353
+ </port>
354
+ </output>
355
+ </layer>
356
+ <layer id="29" name="Constant_273740" type="Const" version="opset1">
357
+ <data element_type="i64" shape="1" offset="500256" size="8" />
358
+ <output>
359
+ <port id="0" precision="I64">
360
+ <dim>1</dim>
361
+ </port>
362
+ </output>
363
+ </layer>
364
+ <layer id="30" name="Slice_273741" type="Slice" version="opset8">
365
+ <input>
366
+ <port id="0" precision="I32">
367
+ <dim>-1</dim>
368
+ <dim>-1</dim>
369
+ </port>
370
+ <port id="1" precision="I64">
371
+ <dim>1</dim>
372
+ </port>
373
+ <port id="2" precision="I64">
374
+ <dim>1</dim>
375
+ </port>
376
+ <port id="3" precision="I64">
377
+ <dim>1</dim>
378
+ </port>
379
+ <port id="4" precision="I64">
380
+ <dim>1</dim>
381
+ </port>
382
+ </input>
383
+ <output>
384
+ <port id="5" precision="I32">
385
+ <dim>-1</dim>
386
+ <dim>-1</dim>
387
+ </port>
388
+ </output>
389
+ </layer>
390
+ <layer id="31" name="Slice_273741" type="Convert" version="opset1">
391
+ <data destination_type="i64" />
392
+ <input>
393
+ <port id="0" precision="I32">
394
+ <dim>-1</dim>
395
+ <dim>-1</dim>
396
+ </port>
397
+ </input>
398
+ <output>
399
+ <port id="1" precision="I64" names="input_ids">
400
+ <dim>-1</dim>
401
+ <dim>-1</dim>
402
+ </port>
403
+ </output>
404
+ </layer>
405
+ <layer id="32" name="Result_273747" type="Result" version="opset1">
406
+ <input>
407
+ <port id="0" precision="I64">
408
+ <dim>-1</dim>
409
+ <dim>-1</dim>
410
+ </port>
411
+ </input>
412
+ </layer>
413
+ <layer id="20" name="Result_273748" type="Result" version="opset1">
414
+ <input>
415
+ <port id="0" precision="I64">
416
+ <dim>-1</dim>
417
+ <dim>-1</dim>
418
+ </port>
419
+ </input>
420
+ </layer>
421
+ </layers>
422
+ <edges>
423
+ <edge from-layer="0" from-port="0" to-layer="6" to-port="1" />
424
+ <edge from-layer="1" from-port="0" to-layer="7" to-port="0" />
425
+ <edge from-layer="2" from-port="0" to-layer="6" to-port="0" />
426
+ <edge from-layer="3" from-port="0" to-layer="4" to-port="0" />
427
+ <edge from-layer="4" from-port="1" to-layer="6" to-port="2" />
428
+ <edge from-layer="4" from-port="2" to-layer="6" to-port="3" />
429
+ <edge from-layer="4" from-port="3" to-layer="6" to-port="4" />
430
+ <edge from-layer="5" from-port="0" to-layer="6" to-port="5" />
431
+ <edge from-layer="6" from-port="8" to-layer="22" to-port="1" />
432
+ <edge from-layer="6" from-port="6" to-layer="23" to-port="1" />
433
+ <edge from-layer="6" from-port="7" to-layer="23" to-port="2" />
434
+ <edge from-layer="6" from-port="6" to-layer="11" to-port="1" />
435
+ <edge from-layer="6" from-port="7" to-layer="9" to-port="0" />
436
+ <edge from-layer="6" from-port="8" to-layer="7" to-port="1" />
437
+ <edge from-layer="7" from-port="2" to-layer="11" to-port="0" />
438
+ <edge from-layer="8" from-port="0" to-layer="10" to-port="0" />
439
+ <edge from-layer="9" from-port="1" to-layer="10" to-port="1" />
440
+ <edge from-layer="10" from-port="2" to-layer="11" to-port="2" />
441
+ <edge from-layer="11" from-port="3" to-layer="13" to-port="0" />
442
+ <edge from-layer="12" from-port="0" to-layer="13" to-port="1" />
443
+ <edge from-layer="13" from-port="2" to-layer="18" to-port="0" />
444
+ <edge from-layer="14" from-port="0" to-layer="18" to-port="1" />
445
+ <edge from-layer="15" from-port="0" to-layer="18" to-port="2" />
446
+ <edge from-layer="16" from-port="0" to-layer="18" to-port="3" />
447
+ <edge from-layer="17" from-port="0" to-layer="18" to-port="4" />
448
+ <edge from-layer="18" from-port="5" to-layer="19" to-port="0" />
449
+ <edge from-layer="19" from-port="1" to-layer="20" to-port="0" />
450
+ <edge from-layer="21" from-port="0" to-layer="22" to-port="0" />
451
+ <edge from-layer="22" from-port="2" to-layer="23" to-port="0" />
452
+ <edge from-layer="23" from-port="3" to-layer="25" to-port="0" />
453
+ <edge from-layer="24" from-port="0" to-layer="25" to-port="1" />
454
+ <edge from-layer="25" from-port="2" to-layer="30" to-port="0" />
455
+ <edge from-layer="26" from-port="0" to-layer="30" to-port="1" />
456
+ <edge from-layer="27" from-port="0" to-layer="30" to-port="2" />
457
+ <edge from-layer="28" from-port="0" to-layer="30" to-port="3" />
458
+ <edge from-layer="29" from-port="0" to-layer="30" to-port="4" />
459
+ <edge from-layer="30" from-port="5" to-layer="31" to-port="0" />
460
+ <edge from-layer="31" from-port="1" to-layer="32" to-port="0" />
461
+ </edges>
462
+ <rt_info>
463
+ <bos_token_id value="1" />
464
+ <chat_template value="{% for message in messages %}{% if message['role'] == 'system' %}{{'&lt;|system|>&#10;' + message['content'] + '&lt;|end|>&#10;'}}{% elif message['role'] == 'user' %}{{'&lt;|user|>&#10;' + message['content'] + '&lt;|end|>&#10;'}}{% elif message['role'] == 'assistant' %}{{'&lt;|assistant|>&#10;' + message['content'] + '&lt;|end|>&#10;'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '&lt;|assistant|>&#10;' }}{% else %}{{ eos_token }}{% endif %}" />
465
+ <eos_token_id value="32000" />
466
+ <original_tokenizer_class value="&lt;class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>" />
467
+ <pad_token_id value="32000" />
468
+ </rt_info>
469
+ </net>