BoDong committed on
Commit
9f940d7
1 Parent(s): 419289e

add the int8 onnx model and int8 Neural Engine IR

int8-model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:215393cc42f7417de9fdd919cd9b32ba420f6258f3b9f78f6b2fb4f4948fcf80
+ size 35286277
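The three added lines above are a Git LFS pointer, not the ONNX model itself. As a minimal sketch (the helper name `parse_lfs_pointer` is hypothetical and not part of this repository), such a pointer can be split into its fields like this:

```python
# Hypothetical helper: parse the key/value lines of a Git LFS pointer file.
def parse_lfs_pointer(text):
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:215393cc42f7417de9fdd919cd9b32ba420f6258f3b9f78f6b2fb4f4948fcf80
size 35286277
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The `size` field suggests the actual `int8-model.onnx` is about 35 MB; the file only appears on disk after the LFS object has been fetched.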
sparse_int8_ir/conf.yaml ADDED
@@ -0,0 +1,1866 @@
+ model:
+   name: model
+   operator:
+     input_data:
+       type: Input
+       output:
+         input_ids:0:
+           dtype: int32
+           shape: [-1, -1]
+         token_type_ids:0:
+           dtype: int32
+           shape: [-1, -1]
+         attention_mask:0:
+           dtype: int32
+           shape: [-1, -1]
+         bert.embeddings.position_embeddings.weight:0:
+           dtype: fp32
+           shape: [512, 256]
+           location: [0, 524288]
+         bert.embeddings.token_type_embeddings.weight:0:
+           dtype: fp32
+           shape: [2, 256]
+           location: [524288, 2048]
+         bert.embeddings.word_embeddings.weight:0:
+           dtype: fp32
+           shape: [30522, 256]
+           location: [526336, 31254528]
+         bert.embeddings.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [31780864, 1024]
+         bert.embeddings.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [31781888, 1024]
+         /bert/embeddings/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [31988776, 4]
+         /bert/embeddings/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [31988780, 4]
+         /bert/encoder/layer.0/attention/self/key/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [31782920, 65536]
+         bert.encoder.layer.0.attention.self.key.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [31848456, 1024]
+         /bert/encoder/layer.0/attention/self/key/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [31849480, 1024]
+         /bert/encoder/layer.0/attention/self/key/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [31850504, 1024]
+         /bert/encoder/layer.0/attention/self/Reshape_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [31988800, 4]
+         /bert/encoder/layer.0/attention/self/Reshape_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [31988804, 4]
+         /bert/encoder/layer.0/attention/self/query/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [31851544, 65536]
+         bert.encoder.layer.0.attention.self.query.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [31917080, 1024]
+         /bert/encoder/layer.0/attention/self/query/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [31918104, 1024]
+         /bert/encoder/layer.0/attention/self/query/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [31919128, 1024]
+         /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [31988792, 4]
+         /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [31988796, 4]
+         /bert/encoder/layer.0/attention/self/value/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [31920168, 65536]
+         bert.encoder.layer.0.attention.self.value.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [31985704, 1024]
+         /bert/encoder/layer.0/attention/self/value/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [31986728, 1024]
+         /bert/encoder/layer.0/attention/self/value/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [31987752, 1024]
+         /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [31988832, 4]
+         /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [31988836, 4]
+         /bert/encoder/layer.0/attention/self/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [31988808, 4]
+         /bert/encoder/layer.0/attention/self/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [31988812, 4]
+         /bert/encoder/layer.0/attention/self/Softmax_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [31988824, 4]
+         /bert/encoder/layer.0/attention/self/Softmax_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [31988828, 4]
+         /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32057456, 4]
+         /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32057460, 4]
+         /bert/encoder/layer.0/attention/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [31988848, 65536]
+         bert.encoder.layer.0.attention.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [32054384, 1024]
+         /bert/encoder/layer.0/attention/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [32055408, 1024]
+         /bert/encoder/layer.0/attention/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [32056432, 1024]
+         /bert/encoder/layer.0/attention/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32057464, 4]
+         /bert/encoder/layer.0/attention/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32057468, 4]
+         bert.encoder.layer.0.attention.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [32057472, 1024]
+         bert.encoder.layer.0.attention.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [32058496, 1024]
+         /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32333960, 4]
+         /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32333964, 4]
+         /bert/encoder/layer.0/intermediate/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [1024, 256]
+           location: [32059528, 262144]
+         bert.encoder.layer.0.intermediate.dense.bias:0:
+           dtype: s32
+           shape: [1024]
+           location: [32321672, 4096]
+         /bert/encoder/layer.0/intermediate/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [1024]
+           location: [32325768, 4096]
+         /bert/encoder/layer.0/intermediate/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [1024]
+           location: [32329864, 4096]
+         /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32599192, 4]
+         /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32599196, 4]
+         /bert/encoder/layer.0/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 1024]
+           location: [32333976, 262144]
+         bert.encoder.layer.0.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [32596120, 1024]
+         /bert/encoder/layer.0/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [32597144, 1024]
+         /bert/encoder/layer.0/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [32598168, 1024]
+         /bert/encoder/layer.0/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32599200, 4]
+         /bert/encoder/layer.0/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32599204, 4]
+         bert.encoder.layer.0.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [32599208, 1024]
+         bert.encoder.layer.0.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [32600232, 1024]
+         /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32807120, 4]
+         /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32807124, 4]
+         /bert/encoder/layer.1/attention/self/key/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [32601264, 65536]
+         bert.encoder.layer.1.attention.self.key.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [32666800, 1024]
+         /bert/encoder/layer.1/attention/self/key/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [32667824, 1024]
+         /bert/encoder/layer.1/attention/self/key/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [32668848, 1024]
+         /bert/encoder/layer.1/attention/self/Reshape_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32807144, 4]
+         /bert/encoder/layer.1/attention/self/Reshape_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32807148, 4]
+         /bert/encoder/layer.1/attention/self/query/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [32669888, 65536]
+         bert.encoder.layer.1.attention.self.query.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [32735424, 1024]
+         /bert/encoder/layer.1/attention/self/query/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [32736448, 1024]
+         /bert/encoder/layer.1/attention/self/query/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [32737472, 1024]
+         /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32807136, 4]
+         /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32807140, 4]
+         /bert/encoder/layer.1/attention/self/value/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [32738512, 65536]
+         bert.encoder.layer.1.attention.self.value.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [32804048, 1024]
+         /bert/encoder/layer.1/attention/self/value/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [32805072, 1024]
+         /bert/encoder/layer.1/attention/self/value/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [32806096, 1024]
+         /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32807176, 4]
+         /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32807180, 4]
+         /bert/encoder/layer.1/attention/self/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32807152, 4]
+         /bert/encoder/layer.1/attention/self/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32807156, 4]
+         /bert/encoder/layer.1/attention/self/Softmax_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32807168, 4]
+         /bert/encoder/layer.1/attention/self/Softmax_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32807172, 4]
+         /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32875800, 4]
+         /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32875804, 4]
+         /bert/encoder/layer.1/attention/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [32807192, 65536]
+         bert.encoder.layer.1.attention.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [32872728, 1024]
+         /bert/encoder/layer.1/attention/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [32873752, 1024]
+         /bert/encoder/layer.1/attention/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [32874776, 1024]
+         /bert/encoder/layer.1/attention/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [32875808, 4]
+         /bert/encoder/layer.1/attention/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [32875812, 4]
+         bert.encoder.layer.1.attention.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [32875816, 1024]
+         bert.encoder.layer.1.attention.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [32876840, 1024]
+         /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33152304, 4]
+         /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33152308, 4]
+         /bert/encoder/layer.1/intermediate/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [1024, 256]
+           location: [32877872, 262144]
+         bert.encoder.layer.1.intermediate.dense.bias:0:
+           dtype: s32
+           shape: [1024]
+           location: [33140016, 4096]
+         /bert/encoder/layer.1/intermediate/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [1024]
+           location: [33144112, 4096]
+         /bert/encoder/layer.1/intermediate/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [1024]
+           location: [33148208, 4096]
+         /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33417536, 4]
+         /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33417540, 4]
+         /bert/encoder/layer.1/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 1024]
+           location: [33152320, 262144]
+         bert.encoder.layer.1.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [33414464, 1024]
+         /bert/encoder/layer.1/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [33415488, 1024]
+         /bert/encoder/layer.1/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [33416512, 1024]
+         /bert/encoder/layer.1/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33417544, 4]
+         /bert/encoder/layer.1/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33417548, 4]
+         bert.encoder.layer.1.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [33417552, 1024]
+         bert.encoder.layer.1.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [33418576, 1024]
+         /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33625464, 4]
+         /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33625468, 4]
+         /bert/encoder/layer.2/attention/self/key/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [33419608, 65536]
+         bert.encoder.layer.2.attention.self.key.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [33485144, 1024]
+         /bert/encoder/layer.2/attention/self/key/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [33486168, 1024]
+         /bert/encoder/layer.2/attention/self/key/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [33487192, 1024]
+         /bert/encoder/layer.2/attention/self/Reshape_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33625488, 4]
+         /bert/encoder/layer.2/attention/self/Reshape_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33625492, 4]
+         /bert/encoder/layer.2/attention/self/query/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [33488232, 65536]
+         bert.encoder.layer.2.attention.self.query.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [33553768, 1024]
+         /bert/encoder/layer.2/attention/self/query/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [33554792, 1024]
+         /bert/encoder/layer.2/attention/self/query/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [33555816, 1024]
+         /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33625480, 4]
+         /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33625484, 4]
+         /bert/encoder/layer.2/attention/self/value/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [33556856, 65536]
+         bert.encoder.layer.2.attention.self.value.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [33622392, 1024]
+         /bert/encoder/layer.2/attention/self/value/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [33623416, 1024]
+         /bert/encoder/layer.2/attention/self/value/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [33624440, 1024]
+         /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33625520, 4]
+         /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33625524, 4]
+         /bert/encoder/layer.2/attention/self/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33625496, 4]
+         /bert/encoder/layer.2/attention/self/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33625500, 4]
+         /bert/encoder/layer.2/attention/self/Softmax_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33625512, 4]
+         /bert/encoder/layer.2/attention/self/Softmax_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33625516, 4]
+         /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33694144, 4]
+         /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33694148, 4]
+         /bert/encoder/layer.2/attention/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [33625536, 65536]
+         bert.encoder.layer.2.attention.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [33691072, 1024]
+         /bert/encoder/layer.2/attention/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [33692096, 1024]
+         /bert/encoder/layer.2/attention/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [33693120, 1024]
+         /bert/encoder/layer.2/attention/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33694152, 4]
+         /bert/encoder/layer.2/attention/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33694156, 4]
+         bert.encoder.layer.2.attention.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [33694160, 1024]
+         bert.encoder.layer.2.attention.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [33695184, 1024]
+         /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [33970648, 4]
+         /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [33970652, 4]
+         /bert/encoder/layer.2/intermediate/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [1024, 256]
+           location: [33696216, 262144]
+         bert.encoder.layer.2.intermediate.dense.bias:0:
+           dtype: s32
+           shape: [1024]
+           location: [33958360, 4096]
+         /bert/encoder/layer.2/intermediate/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [1024]
+           location: [33962456, 4096]
+         /bert/encoder/layer.2/intermediate/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [1024]
+           location: [33966552, 4096]
+         /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34235880, 4]
+         /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34235884, 4]
+         /bert/encoder/layer.2/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 1024]
+           location: [33970664, 262144]
+         bert.encoder.layer.2.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [34232808, 1024]
+         /bert/encoder/layer.2/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [34233832, 1024]
+         /bert/encoder/layer.2/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [34234856, 1024]
+         /bert/encoder/layer.2/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34235888, 4]
+         /bert/encoder/layer.2/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34235892, 4]
+         bert.encoder.layer.2.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [34235896, 1024]
+         bert.encoder.layer.2.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [34236920, 1024]
+         /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34443808, 4]
+         /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34443812, 4]
+         /bert/encoder/layer.3/attention/self/key/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [34237952, 65536]
+         bert.encoder.layer.3.attention.self.key.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [34303488, 1024]
+         /bert/encoder/layer.3/attention/self/key/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [34304512, 1024]
+         /bert/encoder/layer.3/attention/self/key/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [34305536, 1024]
+         /bert/encoder/layer.3/attention/self/Reshape_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34443832, 4]
+         /bert/encoder/layer.3/attention/self/Reshape_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34443836, 4]
+         /bert/encoder/layer.3/attention/self/query/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [34306576, 65536]
+         bert.encoder.layer.3.attention.self.query.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [34372112, 1024]
+         /bert/encoder/layer.3/attention/self/query/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [34373136, 1024]
+         /bert/encoder/layer.3/attention/self/query/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [34374160, 1024]
+         /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34443824, 4]
+         /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34443828, 4]
+         /bert/encoder/layer.3/attention/self/value/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [34375200, 65536]
+         bert.encoder.layer.3.attention.self.value.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [34440736, 1024]
+         /bert/encoder/layer.3/attention/self/value/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [34441760, 1024]
+         /bert/encoder/layer.3/attention/self/value/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [34442784, 1024]
+         /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34443864, 4]
+         /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34443868, 4]
+         /bert/encoder/layer.3/attention/self/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34443840, 4]
+         /bert/encoder/layer.3/attention/self/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34443844, 4]
+         /bert/encoder/layer.3/attention/self/Softmax_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34443856, 4]
+         /bert/encoder/layer.3/attention/self/Softmax_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34443860, 4]
+         /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34512488, 4]
+         /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34512492, 4]
+         /bert/encoder/layer.3/attention/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 256]
+           location: [34443880, 65536]
+         bert.encoder.layer.3.attention.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [34509416, 1024]
+         /bert/encoder/layer.3/attention/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [34510440, 1024]
+         /bert/encoder/layer.3/attention/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [34511464, 1024]
+         /bert/encoder/layer.3/attention/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34512496, 4]
+         /bert/encoder/layer.3/attention/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34512500, 4]
+         bert.encoder.layer.3.attention.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [34512504, 1024]
+         bert.encoder.layer.3.attention.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [34513528, 1024]
+         /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [34788992, 4]
+         /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [34788996, 4]
+         /bert/encoder/layer.3/intermediate/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [1024, 256]
+           location: [34514560, 262144]
+         bert.encoder.layer.3.intermediate.dense.bias:0:
+           dtype: s32
+           shape: [1024]
+           location: [34776704, 4096]
+         /bert/encoder/layer.3/intermediate/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [1024]
+           location: [34780800, 4096]
+         /bert/encoder/layer.3/intermediate/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [1024]
+           location: [34784896, 4096]
+         /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [35054224, 4]
+         /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [35054228, 4]
+         /bert/encoder/layer.3/output/dense/Transpose_output_0_quantized:0:
+           dtype: s8
+           shape: [256, 1024]
+           location: [34789008, 262144]
+         bert.encoder.layer.3.output.dense.bias:0:
+           dtype: s32
+           shape: [256]
+           location: [35051152, 1024]
+         /bert/encoder/layer.3/output/dense/Transpose_output_0_quantized:0_min:
+           dtype: fp32
+           shape: [256]
+           location: [35052176, 1024]
+         /bert/encoder/layer.3/output/dense/Transpose_output_0_quantized:0_max:
+           dtype: fp32
+           shape: [256]
+           location: [35053200, 1024]
+         /bert/encoder/layer.3/output/Add_output_0:0_min:
+           dtype: fp32
+           shape: [1]
+           location: [35054232, 4]
+         /bert/encoder/layer.3/output/Add_output_0:0_max:
+           dtype: fp32
+           shape: [1]
+           location: [35054236, 4]
+         bert.encoder.layer.3.output.LayerNorm.weight:0:
+           dtype: fp32
+           shape: [256]
+           location: [35054240, 1024]
+         bert.encoder.layer.3.output.LayerNorm.bias:0:
+           dtype: fp32
+           shape: [256]
+           location: [35055264, 1024]
836
+ /bert/pooler/Gather_output_0:0_min:
837
+ dtype: fp32
838
+ shape: [1]
839
+ location: [35122856, 4]
840
+ /bert/pooler/Gather_output_0:0_max:
841
+ dtype: fp32
842
+ shape: [1]
843
+ location: [35122860, 4]
844
+ bert.pooler.dense.weight_quantized:0:
845
+ dtype: s8
846
+ shape: [256, 256]
847
+ location: [35056296, 65536]
848
+ bert.pooler.dense.bias:0:
849
+ dtype: s32
850
+ shape: [256]
851
+ location: [35121832, 1024]
852
+ bert.pooler.dense.weight_quantized:0_min:
853
+ dtype: fp32
854
+ shape: [256]
855
+ location: [35122864, 1024]
856
+ bert.pooler.dense.weight_quantized:0_max:
857
+ dtype: fp32
858
+ shape: [256]
859
+ location: [35123888, 1024]
860
+ /bert/pooler/activation/Tanh_output_0:0_min:
861
+ dtype: fp32
862
+ shape: [1]
863
+ location: [35125440, 4]
864
+ /bert/pooler/activation/Tanh_output_0:0_max:
865
+ dtype: fp32
866
+ shape: [1]
867
+ location: [35125444, 4]
868
+ classifier.weight_quantized:0:
869
+ dtype: s8
870
+ shape: [256, 2]
871
+ location: [35124920, 512]
872
+ classifier.bias:0:
873
+ dtype: s32
874
+ shape: [2]
875
+ location: [35125432, 8]
876
+ classifier.weight_quantized:0_min:
877
+ dtype: fp32
878
+ shape: [2]
879
+ location: [35125448, 8]
880
+ classifier.weight_quantized:0_max:
881
+ dtype: fp32
882
+ shape: [2]
883
+ location: [35125456, 8]
884
+ 609:0_min:
885
+ dtype: fp32
886
+ shape: [1]
887
+ location: [35125464, 4]
888
+ 609:0_max:
889
+ dtype: fp32
890
+ shape: [1]
891
+ location: [35125468, 4]
892
+ position_embeddings/after/reshape:
893
+ type: Reshape
894
+ input:
895
+ bert.embeddings.position_embeddings.weight:0: {}
896
+ input_ids:0: {}
897
+ output:
898
+ position_embeddings/after/reshape:0: {}
899
+ attr:
900
+ dst_shape: 1,-1,256
901
+ dims: 1
902
+ /bert/embeddings/position_embeddings/Gather:
903
+ type: Reshape
904
+ input:
905
+ position_embeddings/after/reshape:0: {}
906
+ output:
907
+ /bert/embeddings/position_embeddings/Gather_output_0:0: {}
908
+ attr:
909
+ dst_shape: 1,-1
910
+ /bert/Mul:
911
+ type: PaddingSequence
912
+ input:
913
+ attention_mask:0: {}
914
+ output:
915
+ /bert/Mul_output_0:0: {}
916
+ attr:
917
+ dst_shape: -1,4,0,-1
918
+ dims: 1
919
+ word_embeddings/reshape:
920
+ type: Reshape
921
+ input:
922
+ input_ids:0: {}
923
+ output:
924
+ word_embeddings/reshape:0: {}
925
+ attr:
926
+ dst_shape: -1
927
+ token_type_embeddings/reshape:
928
+ type: Reshape
929
+ input:
930
+ token_type_ids:0: {}
931
+ output:
932
+ token_type_embeddings/reshape:0: {}
933
+ attr:
934
+ dst_shape: -1
935
+ /bert/embeddings/token_type_embeddings/Gather:
936
+ type: Gather
937
+ input:
938
+ token_type_embeddings/reshape:0: {}
939
+ bert.embeddings.token_type_embeddings.weight:0: {}
940
+ /bert/embeddings/position_embeddings/Gather_output_0:0: {}
941
+ token_type_ids:0: {}
942
+ output:
943
+ /bert/embeddings/token_type_embeddings/Gather:0: {}
944
+ attr:
945
+ axis: 0
946
+ batch_dims: 0
947
+ append_op: binary_add
948
+ reshape: -1,-1,256
949
+ reshape_dims: 0,1
950
+ mul: 1,2
951
+ /bert/embeddings/word_embeddings/Gather:
952
+ type: Gather
953
+ input:
954
+ word_embeddings/reshape:0: {}
955
+ bert.embeddings.word_embeddings.weight:0: {}
956
+ /bert/embeddings/token_type_embeddings/Gather:0: {}
957
+ token_type_ids:0: {}
958
+ output:
959
+ embeddings_add/reshape_2d:0: {}
960
+ attr:
961
+ axis: 0
962
+ batch_dims: 0
963
+ append_op: binary_add
964
+ reshape: -1,-1,256
965
+ reshape_dims: 0,1
966
+ mul: 1,2
967
+ /bert/embeddings/LayerNorm/Add_1:
968
+ type: LayerNorm
969
+ input:
970
+ embeddings_add/reshape_2d:0: {}
971
+ bert.embeddings.LayerNorm.weight:0: {}
972
+ bert.embeddings.LayerNorm.bias:0: {}
973
+ output:
974
+ /bert/embeddings/LayerNorm/Add_1_output_0:0: {}
975
+ attr:
976
+ epsilon: 9.999999960041972e-13
977
+ /bert/encoder/layer.0/attention/self/key/Add_quant_0_Reorder_Post_0:
978
+ type: Reorder
979
+ input:
980
+ /bert/embeddings/LayerNorm/Add_1_output_0:0: {}
981
+ output:
982
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_reorder: {}
983
+ attr:
984
+ src_perm: 0,1
985
+ dst_perm: 1,0
986
+ /bert/encoder/layer.0/attention/self/key/Add_quant_0:
987
+ type: Quantize
988
+ input:
989
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_reorder: {}
990
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_min: {}
991
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_max: {}
992
+ output:
993
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_quant: {}
994
+ attr:
995
+ output_dtype: u8
996
+ /bert/encoder/layer.0/attention/self/key/Add:
997
+ type: InnerProduct
998
+ input:
999
+ /bert/encoder/layer.0/attention/self/key/Transpose_output_0_quantized:0: {}
1000
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_quant: {}
1001
+ bert.encoder.layer.0.attention.self.key.bias:0: {}
1002
+ /bert/encoder/layer.0/attention/self/key/Transpose_output_0_quantized:0_min: {}
1003
+ /bert/encoder/layer.0/attention/self/key/Transpose_output_0_quantized:0_max: {}
1004
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_min: {}
1005
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_max: {}
1006
+ /bert/encoder/layer.0/attention/self/Reshape_output_0:0_min: {}
1007
+ /bert/encoder/layer.0/attention/self/Reshape_output_0:0_max: {}
1008
+ input_ids:0: {}
1009
+ output:
1010
+ /bert/encoder/layer.0/attention/self/Reshape_output_0:0: {}
1011
+ attr:
1012
+ output_dtype: s8
1013
+ reshape: 4,64,-1, -1
1014
+ reshape_dims: '0'
1015
+ /bert/encoder/layer.0/attention/self/query/Add:
1016
+ type: InnerProduct
1017
+ input:
1018
+ /bert/encoder/layer.0/attention/self/query/Transpose_output_0_quantized:0: {}
1019
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_quant: {}
1020
+ bert.encoder.layer.0.attention.self.query.bias:0: {}
1021
+ /bert/encoder/layer.0/attention/self/query/Transpose_output_0_quantized:0_min: {}
1022
+ /bert/encoder/layer.0/attention/self/query/Transpose_output_0_quantized:0_max: {}
1023
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_min: {}
1024
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_max: {}
1025
+ /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0_min: {}
1026
+ /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0_max: {}
1027
+ input_ids:0: {}
1028
+ output:
1029
+ /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0: {}
1030
+ attr:
1031
+ output_dtype: s8
1032
+ reshape: 4,64,-1, -1
1033
+ reshape_dims: '0'
1034
+ /bert/encoder/layer.0/attention/self/value/Add:
1035
+ type: InnerProduct
1036
+ input:
1037
+ /bert/encoder/layer.0/attention/self/value/Transpose_output_0_quantized:0: {}
1038
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_quant: {}
1039
+ bert.encoder.layer.0.attention.self.value.bias:0: {}
1040
+ /bert/encoder/layer.0/attention/self/value/Transpose_output_0_quantized:0_min: {}
1041
+ /bert/encoder/layer.0/attention/self/value/Transpose_output_0_quantized:0_max: {}
1042
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_min: {}
1043
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_max: {}
1044
+ /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0_min: {}
1045
+ /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0_max: {}
1046
+ input_ids:0: {}
1047
+ output:
1048
+ /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0: {}
1049
+ attr:
1050
+ output_dtype: s8
1051
+ reshape: 4,64,-1, -1
1052
+ reshape_dims: '0'
1053
+ /bert/encoder/layer.0/attention/self/Add:
1054
+ type: Matmul
1055
+ input:
1056
+ /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0: {}
1057
+ /bert/encoder/layer.0/attention/self/Reshape_output_0:0: {}
1058
+ /bert/Mul_output_0:0: {}
1059
+ /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0_min: {}
1060
+ /bert/encoder/layer.0/attention/self/Reshape_2_output_0:0_max: {}
1061
+ /bert/encoder/layer.0/attention/self/Reshape_output_0:0_min: {}
1062
+ /bert/encoder/layer.0/attention/self/Reshape_output_0:0_max: {}
1063
+ /bert/encoder/layer.0/attention/self/Add_output_0:0_min: {}
1064
+ /bert/encoder/layer.0/attention/self/Add_output_0:0_max: {}
1065
+ output:
1066
+ /bert/encoder/layer.0/attention/self/Add_output_0:0: {}
1067
+ attr:
1068
+ src0_perm: 2,0,3,1
1069
+ src1_perm: 2,0,1,3
1070
+ output_scale: 0.125
1071
+ format_any: false
1072
+ append_op: binary_add
1073
+ /bert/encoder/layer.0/attention/self/Softmax:
1074
+ type: Softmax
1075
+ input:
1076
+ /bert/encoder/layer.0/attention/self/Add_output_0:0: {}
1077
+ /bert/encoder/layer.0/attention/self/Softmax_output_0:0_min: {}
1078
+ /bert/encoder/layer.0/attention/self/Softmax_output_0:0_max: {}
1079
+ output:
1080
+ /bert/encoder/layer.0/attention/self/Softmax_output_0:0: {}
1081
+ attr:
1082
+ output_dtype: u8
1083
+ /bert/encoder/layer.0/attention/self/Transpose_3:
1084
+ type: Matmul
1085
+ input:
1086
+ /bert/encoder/layer.0/attention/self/Softmax_output_0:0: {}
1087
+ /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0: {}
1088
+ /bert/encoder/layer.0/attention/self/Softmax_output_0:0_min: {}
1089
+ /bert/encoder/layer.0/attention/self/Softmax_output_0:0_max: {}
1090
+ /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0_min: {}
1091
+ /bert/encoder/layer.0/attention/self/Reshape_1_output_0:0_max: {}
1092
+ /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0_min: {}
1093
+ /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0_max: {}
1094
+ output:
1095
+ /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0: {}
1096
+ attr:
1097
+ src1_perm: 2,0,3,1
1098
+ dst_perm: 1,3,0,2
1099
+ output_dtype: u8
1100
+ reshape: 256,-1
1101
+ /bert/encoder/layer.0/attention/output/Add:
1102
+ type: InnerProduct
1103
+ input:
1104
+ /bert/encoder/layer.0/attention/output/dense/Transpose_output_0_quantized:0: {}
1105
+ /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0: {}
1106
+ bert.encoder.layer.0.attention.output.dense.bias:0: {}
1107
+ /bert/embeddings/LayerNorm/Add_1_output_0:0_reorder: {}
1108
+ /bert/encoder/layer.0/attention/output/dense/Transpose_output_0_quantized:0_min: {}
1109
+ /bert/encoder/layer.0/attention/output/dense/Transpose_output_0_quantized:0_max: {}
1110
+ /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0_min: {}
1111
+ /bert/encoder/layer.0/attention/self/Reshape_3_output_0:0_max: {}
1112
+ /bert/encoder/layer.0/attention/output/Add_output_0:0_min: {}
1113
+ /bert/encoder/layer.0/attention/output/Add_output_0:0_max: {}
1114
+ output:
1115
+ /bert/encoder/layer.0/attention/output/Add_output_0:0: {}
1116
+ attr:
1117
+ append_op: sum
1118
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1:
1119
+ type: LayerNorm
1120
+ input:
1121
+ /bert/encoder/layer.0/attention/output/Add_output_0:0: {}
1122
+ bert.encoder.layer.0.attention.output.LayerNorm.weight:0: {}
1123
+ bert.encoder.layer.0.attention.output.LayerNorm.bias:0: {}
1124
+ output:
1125
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0: {}
1126
+ attr:
1127
+ epsilon: 9.999999960041972e-13
1128
+ transpose_mode: 1, 0
1129
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_quant_0:
1130
+ type: Quantize
1131
+ input:
1132
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0: {}
1133
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_min: {}
1134
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_max: {}
1135
+ output:
1136
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
1137
+ attr:
1138
+ output_dtype: u8
1139
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1:
1140
+ type: InnerProduct
1141
+ input:
1142
+ /bert/encoder/layer.0/intermediate/dense/Transpose_output_0_quantized:0: {}
1143
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
1144
+ bert.encoder.layer.0.intermediate.dense.bias:0: {}
1145
+ /bert/encoder/layer.0/intermediate/dense/Transpose_output_0_quantized:0_min: {}
1146
+ /bert/encoder/layer.0/intermediate/dense/Transpose_output_0_quantized:0_max: {}
1147
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_min: {}
1148
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0_max: {}
1149
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
1150
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
1151
+ output:
1152
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
1153
+ attr:
1154
+ append_op: gelu_tanh
1155
+ output_dtype: u8
1156
+ /bert/encoder/layer.0/output/Add:
1157
+ type: InnerProduct
1158
+ input:
1159
+ /bert/encoder/layer.0/output/dense/Transpose_output_0_quantized:0: {}
1160
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
1161
+ bert.encoder.layer.0.output.dense.bias:0: {}
1162
+ /bert/encoder/layer.0/attention/output/LayerNorm/Add_1_output_0:0: {}
1163
+ /bert/encoder/layer.0/output/dense/Transpose_output_0_quantized:0_min: {}
1164
+ /bert/encoder/layer.0/output/dense/Transpose_output_0_quantized:0_max: {}
1165
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
1166
+ /bert/encoder/layer.0/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
1167
+ /bert/encoder/layer.0/output/Add_output_0:0_min: {}
1168
+ /bert/encoder/layer.0/output/Add_output_0:0_max: {}
1169
+ output:
1170
+ /bert/encoder/layer.0/output/Add_output_0:0: {}
1171
+ attr:
1172
+ append_op: sum
1173
+ /bert/encoder/layer.0/output/LayerNorm/Add_1:
1174
+ type: LayerNorm
1175
+ input:
1176
+ /bert/encoder/layer.0/output/Add_output_0:0: {}
1177
+ bert.encoder.layer.0.output.LayerNorm.weight:0: {}
1178
+ bert.encoder.layer.0.output.LayerNorm.bias:0: {}
1179
+ output:
1180
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0: {}
1181
+ attr:
1182
+ epsilon: 9.999999960041972e-13
1183
+ transpose_mode: 1, 0
1184
+ /bert/encoder/layer.1/attention/self/key/Add_quant_0:
1185
+ type: Quantize
1186
+ input:
1187
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0: {}
1188
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_min: {}
1189
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_max: {}
1190
+ output:
1191
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_quant: {}
1192
+ attr:
1193
+ output_dtype: u8
1194
+ /bert/encoder/layer.1/attention/self/key/Add:
1195
+ type: InnerProduct
1196
+ input:
1197
+ /bert/encoder/layer.1/attention/self/key/Transpose_output_0_quantized:0: {}
1198
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_quant: {}
1199
+ bert.encoder.layer.1.attention.self.key.bias:0: {}
1200
+ /bert/encoder/layer.1/attention/self/key/Transpose_output_0_quantized:0_min: {}
1201
+ /bert/encoder/layer.1/attention/self/key/Transpose_output_0_quantized:0_max: {}
1202
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_min: {}
1203
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_max: {}
1204
+ /bert/encoder/layer.1/attention/self/Reshape_output_0:0_min: {}
1205
+ /bert/encoder/layer.1/attention/self/Reshape_output_0:0_max: {}
1206
+ input_ids:0: {}
1207
+ output:
1208
+ /bert/encoder/layer.1/attention/self/Reshape_output_0:0: {}
1209
+ attr:
1210
+ output_dtype: s8
1211
+ reshape: 4,64,-1, -1
1212
+ reshape_dims: '0'
1213
+ /bert/encoder/layer.1/attention/self/query/Add:
1214
+ type: InnerProduct
1215
+ input:
1216
+ /bert/encoder/layer.1/attention/self/query/Transpose_output_0_quantized:0: {}
1217
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_quant: {}
1218
+ bert.encoder.layer.1.attention.self.query.bias:0: {}
1219
+ /bert/encoder/layer.1/attention/self/query/Transpose_output_0_quantized:0_min: {}
1220
+ /bert/encoder/layer.1/attention/self/query/Transpose_output_0_quantized:0_max: {}
1221
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_min: {}
1222
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_max: {}
1223
+ /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0_min: {}
1224
+ /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0_max: {}
1225
+ input_ids:0: {}
1226
+ output:
1227
+ /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0: {}
1228
+ attr:
1229
+ output_dtype: s8
1230
+ reshape: 4,64,-1, -1
1231
+ reshape_dims: '0'
1232
+ /bert/encoder/layer.1/attention/self/value/Add:
1233
+ type: InnerProduct
1234
+ input:
1235
+ /bert/encoder/layer.1/attention/self/value/Transpose_output_0_quantized:0: {}
1236
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_quant: {}
1237
+ bert.encoder.layer.1.attention.self.value.bias:0: {}
1238
+ /bert/encoder/layer.1/attention/self/value/Transpose_output_0_quantized:0_min: {}
1239
+ /bert/encoder/layer.1/attention/self/value/Transpose_output_0_quantized:0_max: {}
1240
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_min: {}
1241
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0_max: {}
1242
+ /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0_min: {}
1243
+ /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0_max: {}
1244
+ input_ids:0: {}
1245
+ output:
1246
+ /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0: {}
1247
+ attr:
1248
+ output_dtype: s8
1249
+ reshape: 4,64,-1, -1
1250
+ reshape_dims: '0'
1251
+ /bert/encoder/layer.1/attention/self/Add:
1252
+ type: Matmul
1253
+ input:
1254
+ /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0: {}
1255
+ /bert/encoder/layer.1/attention/self/Reshape_output_0:0: {}
1256
+ /bert/Mul_output_0:0: {}
1257
+ /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0_min: {}
1258
+ /bert/encoder/layer.1/attention/self/Reshape_2_output_0:0_max: {}
1259
+ /bert/encoder/layer.1/attention/self/Reshape_output_0:0_min: {}
1260
+ /bert/encoder/layer.1/attention/self/Reshape_output_0:0_max: {}
1261
+ /bert/encoder/layer.1/attention/self/Add_output_0:0_min: {}
1262
+ /bert/encoder/layer.1/attention/self/Add_output_0:0_max: {}
1263
+ output:
1264
+ /bert/encoder/layer.1/attention/self/Add_output_0:0: {}
1265
+ attr:
1266
+ src0_perm: 2,0,3,1
1267
+ src1_perm: 2,0,1,3
1268
+ output_scale: 0.125
1269
+ format_any: false
1270
+ append_op: binary_add
1271
+ /bert/encoder/layer.1/attention/self/Softmax:
1272
+ type: Softmax
1273
+ input:
1274
+ /bert/encoder/layer.1/attention/self/Add_output_0:0: {}
1275
+ /bert/encoder/layer.1/attention/self/Softmax_output_0:0_min: {}
1276
+ /bert/encoder/layer.1/attention/self/Softmax_output_0:0_max: {}
1277
+ output:
1278
+ /bert/encoder/layer.1/attention/self/Softmax_output_0:0: {}
1279
+ attr:
1280
+ output_dtype: u8
1281
+ /bert/encoder/layer.1/attention/self/Transpose_3:
1282
+ type: Matmul
1283
+ input:
1284
+ /bert/encoder/layer.1/attention/self/Softmax_output_0:0: {}
1285
+ /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0: {}
1286
+ /bert/encoder/layer.1/attention/self/Softmax_output_0:0_min: {}
1287
+ /bert/encoder/layer.1/attention/self/Softmax_output_0:0_max: {}
1288
+ /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0_min: {}
1289
+ /bert/encoder/layer.1/attention/self/Reshape_1_output_0:0_max: {}
1290
+ /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0_min: {}
1291
+ /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0_max: {}
1292
+ output:
1293
+ /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0: {}
1294
+ attr:
1295
+ src1_perm: 2,0,3,1
1296
+ dst_perm: 1,3,0,2
1297
+ output_dtype: u8
1298
+ reshape: 256,-1
1299
+ /bert/encoder/layer.1/attention/output/Add:
1300
+ type: InnerProduct
1301
+ input:
1302
+ /bert/encoder/layer.1/attention/output/dense/Transpose_output_0_quantized:0: {}
1303
+ /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0: {}
1304
+ bert.encoder.layer.1.attention.output.dense.bias:0: {}
1305
+ /bert/encoder/layer.0/output/LayerNorm/Add_1_output_0:0: {}
1306
+ /bert/encoder/layer.1/attention/output/dense/Transpose_output_0_quantized:0_min: {}
1307
+ /bert/encoder/layer.1/attention/output/dense/Transpose_output_0_quantized:0_max: {}
1308
+ /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0_min: {}
1309
+ /bert/encoder/layer.1/attention/self/Reshape_3_output_0:0_max: {}
1310
+ /bert/encoder/layer.1/attention/output/Add_output_0:0_min: {}
1311
+ /bert/encoder/layer.1/attention/output/Add_output_0:0_max: {}
1312
+ output:
1313
+ /bert/encoder/layer.1/attention/output/Add_output_0:0: {}
1314
+ attr:
1315
+ append_op: sum
1316
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1:
1317
+ type: LayerNorm
1318
+ input:
1319
+ /bert/encoder/layer.1/attention/output/Add_output_0:0: {}
1320
+ bert.encoder.layer.1.attention.output.LayerNorm.weight:0: {}
1321
+ bert.encoder.layer.1.attention.output.LayerNorm.bias:0: {}
1322
+ output:
1323
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0: {}
1324
+ attr:
1325
+ epsilon: 9.999999960041972e-13
1326
+ transpose_mode: 1, 0
1327
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_quant_0:
1328
+ type: Quantize
1329
+ input:
1330
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0: {}
1331
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_min: {}
1332
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_max: {}
1333
+ output:
1334
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
1335
+ attr:
1336
+ output_dtype: u8
1337
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1:
1338
+ type: InnerProduct
1339
+ input:
1340
+ /bert/encoder/layer.1/intermediate/dense/Transpose_output_0_quantized:0: {}
1341
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
1342
+ bert.encoder.layer.1.intermediate.dense.bias:0: {}
1343
+ /bert/encoder/layer.1/intermediate/dense/Transpose_output_0_quantized:0_min: {}
1344
+ /bert/encoder/layer.1/intermediate/dense/Transpose_output_0_quantized:0_max: {}
1345
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_min: {}
1346
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0_max: {}
1347
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
1348
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
1349
+ output:
1350
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
1351
+ attr:
1352
+ append_op: gelu_tanh
1353
+ output_dtype: u8
1354
+ /bert/encoder/layer.1/output/Add:
1355
+ type: InnerProduct
1356
+ input:
1357
+ /bert/encoder/layer.1/output/dense/Transpose_output_0_quantized:0: {}
1358
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
1359
+ bert.encoder.layer.1.output.dense.bias:0: {}
1360
+ /bert/encoder/layer.1/attention/output/LayerNorm/Add_1_output_0:0: {}
1361
+ /bert/encoder/layer.1/output/dense/Transpose_output_0_quantized:0_min: {}
1362
+ /bert/encoder/layer.1/output/dense/Transpose_output_0_quantized:0_max: {}
1363
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
1364
+ /bert/encoder/layer.1/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
1365
+ /bert/encoder/layer.1/output/Add_output_0:0_min: {}
1366
+ /bert/encoder/layer.1/output/Add_output_0:0_max: {}
1367
+ output:
1368
+ /bert/encoder/layer.1/output/Add_output_0:0: {}
1369
+ attr:
1370
+ append_op: sum
1371
+ /bert/encoder/layer.1/output/LayerNorm/Add_1:
1372
+ type: LayerNorm
1373
+ input:
1374
+ /bert/encoder/layer.1/output/Add_output_0:0: {}
1375
+ bert.encoder.layer.1.output.LayerNorm.weight:0: {}
1376
+ bert.encoder.layer.1.output.LayerNorm.bias:0: {}
1377
+ output:
1378
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0: {}
1379
+ attr:
1380
+ epsilon: 9.999999960041972e-13
1381
+ transpose_mode: 1, 0
1382
+ /bert/encoder/layer.2/attention/self/key/Add_quant_0:
1383
+ type: Quantize
1384
+ input:
1385
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0: {}
1386
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_min: {}
1387
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_max: {}
1388
+ output:
1389
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_quant: {}
1390
+ attr:
1391
+ output_dtype: u8
1392
+ /bert/encoder/layer.2/attention/self/key/Add:
1393
+ type: InnerProduct
1394
+ input:
1395
+ /bert/encoder/layer.2/attention/self/key/Transpose_output_0_quantized:0: {}
1396
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_quant: {}
1397
+ bert.encoder.layer.2.attention.self.key.bias:0: {}
1398
+ /bert/encoder/layer.2/attention/self/key/Transpose_output_0_quantized:0_min: {}
1399
+ /bert/encoder/layer.2/attention/self/key/Transpose_output_0_quantized:0_max: {}
1400
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_min: {}
1401
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_max: {}
1402
+ /bert/encoder/layer.2/attention/self/Reshape_output_0:0_min: {}
1403
+ /bert/encoder/layer.2/attention/self/Reshape_output_0:0_max: {}
1404
+ input_ids:0: {}
1405
+ output:
1406
+ /bert/encoder/layer.2/attention/self/Reshape_output_0:0: {}
1407
+ attr:
1408
+ output_dtype: s8
1409
+ reshape: 4,64,-1, -1
1410
+ reshape_dims: '0'
1411
+ /bert/encoder/layer.2/attention/self/query/Add:
1412
+ type: InnerProduct
1413
+ input:
1414
+ /bert/encoder/layer.2/attention/self/query/Transpose_output_0_quantized:0: {}
1415
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_quant: {}
1416
+ bert.encoder.layer.2.attention.self.query.bias:0: {}
1417
+ /bert/encoder/layer.2/attention/self/query/Transpose_output_0_quantized:0_min: {}
1418
+ /bert/encoder/layer.2/attention/self/query/Transpose_output_0_quantized:0_max: {}
1419
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_min: {}
1420
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_max: {}
1421
+ /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0_min: {}
1422
+ /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0_max: {}
1423
+ input_ids:0: {}
1424
+ output:
1425
+ /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0: {}
1426
+ attr:
1427
+ output_dtype: s8
1428
+ reshape: 4,64,-1, -1
1429
+ reshape_dims: '0'
1430
+ /bert/encoder/layer.2/attention/self/value/Add:
1431
+ type: InnerProduct
1432
+ input:
1433
+ /bert/encoder/layer.2/attention/self/value/Transpose_output_0_quantized:0: {}
1434
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_quant: {}
1435
+ bert.encoder.layer.2.attention.self.value.bias:0: {}
1436
+ /bert/encoder/layer.2/attention/self/value/Transpose_output_0_quantized:0_min: {}
1437
+ /bert/encoder/layer.2/attention/self/value/Transpose_output_0_quantized:0_max: {}
1438
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_min: {}
1439
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0_max: {}
1440
+ /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0_min: {}
1441
+ /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0_max: {}
1442
+ input_ids:0: {}
1443
+ output:
1444
+ /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0: {}
1445
+ attr:
1446
+ output_dtype: s8
1447
+ reshape: 4,64,-1, -1
1448
+ reshape_dims: '0'
1449
+ /bert/encoder/layer.2/attention/self/Add:
1450
+ type: Matmul
1451
+ input:
1452
+ /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0: {}
1453
+ /bert/encoder/layer.2/attention/self/Reshape_output_0:0: {}
1454
+ /bert/Mul_output_0:0: {}
1455
+ /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0_min: {}
1456
+ /bert/encoder/layer.2/attention/self/Reshape_2_output_0:0_max: {}
1457
+ /bert/encoder/layer.2/attention/self/Reshape_output_0:0_min: {}
1458
+ /bert/encoder/layer.2/attention/self/Reshape_output_0:0_max: {}
1459
+ /bert/encoder/layer.2/attention/self/Add_output_0:0_min: {}
1460
+ /bert/encoder/layer.2/attention/self/Add_output_0:0_max: {}
1461
+ output:
1462
+ /bert/encoder/layer.2/attention/self/Add_output_0:0: {}
1463
+ attr:
1464
+ src0_perm: 2,0,3,1
1465
+ src1_perm: 2,0,1,3
1466
+ output_scale: 0.125
1467
+ format_any: false
1468
+ append_op: binary_add
1469
+ /bert/encoder/layer.2/attention/self/Softmax:
1470
+ type: Softmax
1471
+ input:
1472
+ /bert/encoder/layer.2/attention/self/Add_output_0:0: {}
1473
+ /bert/encoder/layer.2/attention/self/Softmax_output_0:0_min: {}
1474
+ /bert/encoder/layer.2/attention/self/Softmax_output_0:0_max: {}
1475
+ output:
1476
+ /bert/encoder/layer.2/attention/self/Softmax_output_0:0: {}
1477
+ attr:
1478
+ output_dtype: u8
1479
+ /bert/encoder/layer.2/attention/self/Transpose_3:
1480
+ type: Matmul
1481
+ input:
1482
+ /bert/encoder/layer.2/attention/self/Softmax_output_0:0: {}
1483
+ /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0: {}
1484
+ /bert/encoder/layer.2/attention/self/Softmax_output_0:0_min: {}
1485
+ /bert/encoder/layer.2/attention/self/Softmax_output_0:0_max: {}
1486
+ /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0_min: {}
1487
+ /bert/encoder/layer.2/attention/self/Reshape_1_output_0:0_max: {}
1488
+ /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0_min: {}
1489
+ /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0_max: {}
1490
+ output:
1491
+ /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0: {}
1492
+ attr:
1493
+ src1_perm: 2,0,3,1
1494
+ dst_perm: 1,3,0,2
1495
+ output_dtype: u8
1496
+ reshape: 256,-1
1497
+ /bert/encoder/layer.2/attention/output/Add:
1498
+ type: InnerProduct
1499
+ input:
1500
+ /bert/encoder/layer.2/attention/output/dense/Transpose_output_0_quantized:0: {}
1501
+ /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0: {}
1502
+ bert.encoder.layer.2.attention.output.dense.bias:0: {}
1503
+ /bert/encoder/layer.1/output/LayerNorm/Add_1_output_0:0: {}
1504
+ /bert/encoder/layer.2/attention/output/dense/Transpose_output_0_quantized:0_min: {}
1505
+ /bert/encoder/layer.2/attention/output/dense/Transpose_output_0_quantized:0_max: {}
1506
+ /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0_min: {}
1507
+ /bert/encoder/layer.2/attention/self/Reshape_3_output_0:0_max: {}
1508
+ /bert/encoder/layer.2/attention/output/Add_output_0:0_min: {}
1509
+ /bert/encoder/layer.2/attention/output/Add_output_0:0_max: {}
1510
+ output:
1511
+ /bert/encoder/layer.2/attention/output/Add_output_0:0: {}
1512
+ attr:
1513
+ append_op: sum
1514
+ /bert/encoder/layer.2/attention/output/LayerNorm/Add_1:
1515
+ type: LayerNorm
1516
+ input:
1517
+ /bert/encoder/layer.2/attention/output/Add_output_0:0: {}
1518
+ bert.encoder.layer.2.attention.output.LayerNorm.weight:0: {}
1519
+ bert.encoder.layer.2.attention.output.LayerNorm.bias:0: {}
1520
+ output:
1521
+ /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0: {}
1522
+ attr:
1523
+ epsilon: 9.999999960041972e-13
1524
+ transpose_mode: 1, 0
1525
+ /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_quant_0:
1526
+ type: Quantize
1527
+ input:
1528
+ /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0: {}
1529
+ /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_min: {}
1530
+ /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_max: {}
1531
+ output:
1532
+ /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
1533
+ attr:
1534
+ output_dtype: u8
+ /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.2/intermediate/dense/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
+     bert.encoder.layer.2.intermediate.dense.bias:0: {}
+     /bert/encoder/layer.2/intermediate/dense/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.2/intermediate/dense/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0_max: {}
+     /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
+     /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
+   attr:
+     append_op: gelu_tanh
+     output_dtype: u8
+ /bert/encoder/layer.2/output/Add:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.2/output/dense/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
+     bert.encoder.layer.2.output.dense.bias:0: {}
+     /bert/encoder/layer.2/attention/output/LayerNorm/Add_1_output_0:0: {}
+     /bert/encoder/layer.2/output/dense/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.2/output/dense/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
+     /bert/encoder/layer.2/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
+     /bert/encoder/layer.2/output/Add_output_0:0_min: {}
+     /bert/encoder/layer.2/output/Add_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.2/output/Add_output_0:0: {}
+   attr:
+     append_op: sum
+ /bert/encoder/layer.2/output/LayerNorm/Add_1:
+   type: LayerNorm
+   input:
+     /bert/encoder/layer.2/output/Add_output_0:0: {}
+     bert.encoder.layer.2.output.LayerNorm.weight:0: {}
+     bert.encoder.layer.2.output.LayerNorm.bias:0: {}
+   output:
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0: {}
+   attr:
+     epsilon: 9.999999960041972e-13
+     transpose_mode: 1, 0
+ /bert/encoder/layer.3/attention/self/key/Add_quant_0:
+   type: Quantize
+   input:
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_quant: {}
+   attr:
+     output_dtype: u8
+ /bert/encoder/layer.3/attention/self/key/Add:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.3/attention/self/key/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_quant: {}
+     bert.encoder.layer.3.attention.self.key.bias:0: {}
+     /bert/encoder/layer.3/attention/self/key/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.3/attention/self/key/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_output_0:0_max: {}
+     input_ids:0: {}
+   output:
+     /bert/encoder/layer.3/attention/self/Reshape_output_0:0: {}
+   attr:
+     output_dtype: s8
+     reshape: 4,64,-1, -1
+     reshape_dims: '0'
+ /bert/encoder/layer.3/attention/self/query/Add:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.3/attention/self/query/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_quant: {}
+     bert.encoder.layer.3.attention.self.query.bias:0: {}
+     /bert/encoder/layer.3/attention/self/query/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.3/attention/self/query/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0_max: {}
+     input_ids:0: {}
+   output:
+     /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0: {}
+   attr:
+     output_dtype: s8
+     reshape: 4,64,-1, -1
+     reshape_dims: '0'
+ /bert/encoder/layer.3/attention/self/value/Add:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.3/attention/self/value/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_quant: {}
+     bert.encoder.layer.3.attention.self.value.bias:0: {}
+     /bert/encoder/layer.3/attention/self/value/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.3/attention/self/value/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0_max: {}
+     input_ids:0: {}
+   output:
+     /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0: {}
+   attr:
+     output_dtype: s8
+     reshape: 4,64,-1, -1
+     reshape_dims: '0'
+ /bert/encoder/layer.3/attention/self/Add:
+   type: Matmul
+   input:
+     /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0: {}
+     /bert/encoder/layer.3/attention/self/Reshape_output_0:0: {}
+     /bert/Mul_output_0:0: {}
+     /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_2_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Add_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Add_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/attention/self/Add_output_0:0: {}
+   attr:
+     src0_perm: 2,0,3,1
+     src1_perm: 2,0,1,3
+     output_scale: 0.125
+     format_any: false
+     append_op: binary_add
+ /bert/encoder/layer.3/attention/self/Softmax:
+   type: Softmax
+   input:
+     /bert/encoder/layer.3/attention/self/Add_output_0:0: {}
+     /bert/encoder/layer.3/attention/self/Softmax_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Softmax_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/attention/self/Softmax_output_0:0: {}
+   attr:
+     output_dtype: u8
+ /bert/encoder/layer.3/attention/self/Transpose_3:
+   type: Matmul
+   input:
+     /bert/encoder/layer.3/attention/self/Softmax_output_0:0: {}
+     /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0: {}
+     /bert/encoder/layer.3/attention/self/Softmax_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Softmax_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_1_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0: {}
+   attr:
+     src1_perm: 2,0,3,1
+     dst_perm: 1,3,0,2
+     output_dtype: u8
+     reshape: 256,-1
+ /bert/encoder/layer.3/attention/output/Add:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.3/attention/output/dense/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0: {}
+     bert.encoder.layer.3.attention.output.dense.bias:0: {}
+     /bert/encoder/layer.2/output/LayerNorm/Add_1_output_0:0: {}
+     /bert/encoder/layer.3/attention/output/dense/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.3/attention/output/dense/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/self/Reshape_3_output_0:0_max: {}
+     /bert/encoder/layer.3/attention/output/Add_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/output/Add_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/attention/output/Add_output_0:0: {}
+   attr:
+     append_op: sum
+ /bert/encoder/layer.3/attention/output/LayerNorm/Add_1:
+   type: LayerNorm
+   input:
+     /bert/encoder/layer.3/attention/output/Add_output_0:0: {}
+     bert.encoder.layer.3.attention.output.LayerNorm.weight:0: {}
+     bert.encoder.layer.3.attention.output.LayerNorm.bias:0: {}
+   output:
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0: {}
+   attr:
+     epsilon: 9.999999960041972e-13
+     transpose_mode: 1, 0
+ /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_quant_0:
+   type: Quantize
+   input:
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0: {}
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
+   attr:
+     output_dtype: u8
+ /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.3/intermediate/dense/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_quant: {}
+     bert.encoder.layer.3.intermediate.dense.bias:0: {}
+     /bert/encoder/layer.3/intermediate/dense/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.3/intermediate/dense/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_min: {}
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0_max: {}
+     /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
+     /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
+   attr:
+     append_op: gelu_tanh
+     output_dtype: u8
+ /bert/encoder/layer.3/output/Add:
+   type: InnerProduct
+   input:
+     /bert/encoder/layer.3/output/dense/Transpose_output_0_quantized:0: {}
+     /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0: {}
+     bert.encoder.layer.3.output.dense.bias:0: {}
+     /bert/encoder/layer.3/attention/output/LayerNorm/Add_1_output_0:0: {}
+     /bert/encoder/layer.3/output/dense/Transpose_output_0_quantized:0_min: {}
+     /bert/encoder/layer.3/output/dense/Transpose_output_0_quantized:0_max: {}
+     /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0_min: {}
+     /bert/encoder/layer.3/intermediate/intermediate_act_fn/Mul_1_output_0:0_max: {}
+     /bert/encoder/layer.3/output/Add_output_0:0_min: {}
+     /bert/encoder/layer.3/output/Add_output_0:0_max: {}
+   output:
+     /bert/encoder/layer.3/output/Add_output_0:0: {}
+   attr:
+     append_op: sum
+ /bert/encoder/layer.3/output/Add_Reorder_Recover:
+   type: Reorder
+   input:
+     /bert/encoder/layer.3/output/Add_output_0:0: {}
+   output:
+     /bert/encoder/layer.3/output/Add_output_0:0_recover: {}
+   attr:
+     src_perm: 0,1
+     dst_perm: 1,0
+ /bert/encoder/layer.3/output/LayerNorm/Add_1:
+   type: LayerNorm
+   input:
+     /bert/encoder/layer.3/output/Add_output_0:0_recover: {}
+     bert.encoder.layer.3.output.LayerNorm.weight:0: {}
+     bert.encoder.layer.3.output.LayerNorm.bias:0: {}
+   output:
+     /bert/encoder/layer.3/output/LayerNorm/Add_1:0: {}
+   attr:
+     epsilon: 9.999999960041972e-13
+ last_layer_reshape:
+   type: Reshape
+   input:
+     /bert/encoder/layer.3/output/LayerNorm/Add_1:0: {}
+     input_ids:0: {}
+   output:
+     last_layer_reshape:0: {}
+   attr:
+     dst_shape: -1,-1,256
+     dims: '0'
+ last_layer_strided_slice:
+   type: StridedSlice
+   input:
+     last_layer_reshape:0: {}
+   output:
+     last_layer_strided_slice:0: {}
+   attr:
+     begin_mask: 5
+     ellipsis_mask: 0
+     end_mask: 5
+     new_axis_mask: 0
+     shrink_axis_mask: 0
+     begin: 0,0,0
+     end: 0,1,0
+     strides: 1,1,1
+ /bert/pooler/Gather:
+   type: Reshape
+   input:
+     last_layer_strided_slice:0: {}
+   output:
+     /bert/pooler/Gather_output_0:0: {}
+   attr:
+     dst_shape: -1,256
+ /bert/pooler/activation/Tanh_quant_0:
+   type: Quantize
+   input:
+     /bert/pooler/Gather_output_0:0: {}
+     /bert/pooler/Gather_output_0:0_min: {}
+     /bert/pooler/Gather_output_0:0_max: {}
+   output:
+     /bert/pooler/Gather_output_0:0_quant: {}
+   attr:
+     output_dtype: u8
+ /bert/pooler/activation/Tanh:
+   type: InnerProduct
+   input:
+     /bert/pooler/Gather_output_0:0_quant: {}
+     bert.pooler.dense.weight_quantized:0: {}
+     bert.pooler.dense.bias:0: {}
+     /bert/pooler/Gather_output_0:0_min: {}
+     /bert/pooler/Gather_output_0:0_max: {}
+     bert.pooler.dense.weight_quantized:0_min: {}
+     bert.pooler.dense.weight_quantized:0_max: {}
+     /bert/pooler/activation/Tanh_output_0:0_min: {}
+     /bert/pooler/activation/Tanh_output_0:0_max: {}
+   output:
+     /bert/pooler/activation/Tanh_output_0:0: {}
+   attr:
+     src1_perm: 1,0
+     append_op: tanh
+     output_dtype: u8
+ /classifier/Gemm_Add:
+   type: InnerProduct
+   input:
+     /bert/pooler/activation/Tanh_output_0:0: {}
+     classifier.weight_quantized:0: {}
+     classifier.bias:0: {}
+     /bert/pooler/activation/Tanh_output_0:0_min: {}
+     /bert/pooler/activation/Tanh_output_0:0_max: {}
+     classifier.weight_quantized:0_min: {}
+     classifier.weight_quantized:0_max: {}
+     609:0_min: {}
+     609:0_max: {}
+   output:
+     '609:0': {}
+   attr:
+     src1_perm: 1,0
+ output_data:
+   type: Output
+   input:
+     '609:0': {}
sparse_int8_ir/model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1a7ee7fd4cc774c0662a7842654226c9fc916419cb11c8a8bf46673267f9b13
+ size 35125472