yuwenz committed
Commit 04b5738
1 Parent(s): e064e32

upload int8 onnx model


Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)
  1. README.md +27 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
 - int8
 - Intel® Neural Compressor
 - PostTrainingDynamic
+- onnx
 datasets:
 - glue
 metrics:
@@ -28,20 +29,22 @@ model-index:
 ---
 # INT8 bart-large-mrpc
 
-### Post-training dynamic quantization
+## Post-training dynamic quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
 The original fp32 model comes from the fine-tuned model [bart-large-mrpc](https://huggingface.co/Intel/bart-large-mrpc).
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-f1)** |0.9051|0.9120|
 | **Model size (MB)** |547|1556.48|
 
-### Load with optimum:
+#### Load with optimum:
 
 ```python
 from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
@@ -49,3 +52,24 @@ int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
 'Intel/bart-large-mrpc-int8-dynamic',
 )
 ```
+
+### ONNX
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [bart-large-mrpc](https://huggingface.co/Intel/bart-large-mrpc).
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.9134|0.9120|
+| **Model size (MB)** |395|1555|
+
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/bart-large-mrpc-int8-dynamic')
+```
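The `PostTrainingDynamic` tag above refers to post-training dynamic quantization: tensors are mapped to INT8 using scales derived from the data itself, with no calibration dataset or retraining. As a rough illustration only (this is not Intel® Neural Compressor's actual implementation), a symmetric per-tensor INT8 scheme can be sketched in a few lines of NumPy; the 4x storage reduction per quantized tensor is consistent with the FP32-to-INT8 size drops in the tables above (1556.48 to 547 MB and 1555 to 395 MB — less than a full 4x because not every tensor in the model is quantized):

```python
import numpy as np

def quantize_dynamic_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: the scale comes from the
    tensor's own max absolute value, so no calibration data is needed
    (that is what makes the scheme dynamic/post-training)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map INT8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_dynamic_int8(w)
w_hat = dequantize(q, scale)

# INT8 storage is 4x smaller than FP32 for the same element count.
print(w.nbytes // q.nbytes)  # 4
# Round-trip error stays within about half a quantization step.
print(float(np.abs(w - w_hat).max()) <= scale / 2 + 1e-5)  # True
```

The accuracy numbers in the tables show why this works in practice: the round-trip error per weight is bounded by half a quantization step, which is small enough here that eval-f1 moves only from 0.9120 to 0.9051 (PyTorch) or 0.9134 (ONNX).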
model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c6f32f2e86da30d0634b16a761c6b435a2a3bf93d060e55f0bdac19df085f11c
+size 413817970
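The model.onnx blob is stored as a Git LFS pointer: a small text file holding the spec version, the blob's SHA-256 object ID, and its size in bytes, while the actual weights live in LFS storage. A minimal sketch of reading such a pointer (the `parse_lfs_pointer` helper is illustrative, not part of git-lfs tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its space-separated key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer committed above for model.onnx.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c6f32f2e86da30d0634b16a761c6b435a2a3bf93d060e55f0bdac19df085f11c
size 413817970
"""

info = parse_lfs_pointer(pointer)
algo, digest = info["oid"].split(":", 1)
print(algo)                               # sha256
print(round(int(info["size"]) / 2**20))   # 395
```

Note that 413,817,970 bytes is about 395 MiB, which matches the INT8 model size reported in the ONNX test-result table.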