yuwenz committed on
Commit
c2f8181
1 Parent(s): 9b2b7ef

upload int8 onnx model

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)
  1. README.md +29 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -6,6 +6,7 @@ tags:
  - int8
  - Intel® Neural Compressor
  - PostTrainingDynamic
+ - onnx
  datasets:
  - mrpc
  metrics:
@@ -14,20 +15,22 @@ metrics:
 
  # INT8 BERT base uncased finetuned MRPC
 
- ### Post-training dynamic quantization
+ ## Post-training dynamic quantization
+
+ ### PyTorch
 
  This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
  The original fp32 model comes from the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).
 
- ### Test result
+ #### Test result
 
  | |INT8|FP32|
  |---|:---:|:---:|
  | **Accuracy (eval-f1)** |0.8997|0.9042|
  | **Model size (MB)** |174|418|
 
- ### Load with optimum:
+ #### Load with optimum:
 
  ```python
  from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
@@ -35,3 +38,26 @@ int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
  'Intel/bert-base-uncased-mrpc-int8-dynamic',
  )
  ```
+
+ ### ONNX
+
+
+ This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+ The original fp32 model comes from the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).
+
+ #### Test result
+
+ | |INT8|FP32|
+ |---|:---:|:---:|
+ | **Accuracy (eval-f1)** |0.8958|0.9042|
+ | **Model size (MB)** |107|418|
+
+
+ #### Load ONNX model:
+
+ ```python
+ from optimum.onnxruntime import ORTModelForSequenceClassification
+ model = ORTModelForSequenceClassification.from_pretrained('Intel/bert-base-uncased-mrpc-int8-dynamic')
+ ```
+
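Reviewer note: the two "Test result" tables in the README diff give F1 and model size for the PyTorch and ONNX INT8 variants against the same FP32 baseline. The sketch below only does arithmetic on those reported figures, to make the size/accuracy trade-off easier to compare. The names `FP32`, `INT8`, and `summarize` are hypothetical helpers, not part of the commit.

```python
# Figures copied from the two "Test result" tables in the README diff.
FP32 = {"f1": 0.9042, "size_mb": 418}
INT8 = {
    "pytorch": {"f1": 0.8997, "size_mb": 174},
    "onnx": {"f1": 0.8958, "size_mb": 107},
}


def summarize(int8, fp32=FP32):
    """Return (size reduction factor, relative F1 drop in percent)."""
    ratio = fp32["size_mb"] / int8["size_mb"]
    rel_drop = 100.0 * (fp32["f1"] - int8["f1"]) / fp32["f1"]
    return ratio, rel_drop


for name, stats in INT8.items():
    ratio, rel_drop = summarize(stats)
    print(f"{name}: {ratio:.2f}x smaller, {rel_drop:.2f}% relative F1 drop")
```

On the reported numbers, the ONNX variant is roughly 3.9x smaller than FP32 at a relative F1 drop of about 0.9%, while the PyTorch variant is roughly 2.4x smaller at a drop of about 0.5%.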
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d016098aad1faaaf288dbe5070f1ce6f150fc0465a52b5b69751debb0d652ba9
+ size 111876840
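
Reviewer note: `model.onnx` is stored as a Git LFS pointer, so the diff shows only its version, object ID, and byte size. As a sanity check, the sketch below parses the three pointer lines and converts the size, assuming the README's "Model size (MB)" row counts mebibytes (2^20 bytes); that unit is an assumption, and `parse_lfs_pointer` is a hypothetical helper.

```python
# Pointer contents copied from the model.onnx diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d016098aad1faaaf288dbe5070f1ce6f150fc0465a52b5b69751debb0d652ba9
size 111876840
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20   # mebibytes (assumed unit of the README table)
size_mb = pointer["size"] / 10**6    # decimal megabytes, for comparison
print(f"{size_mib:.1f} MiB, {size_mb:.1f} MB")
```

Under that assumption the pointer size rounds to the 107 reported for the ONNX model; in decimal megabytes it would be closer to 112.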