yuwenz committed
Commit 67778f7
1 Parent(s): 9a17416

upload int8 onnx model


Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)
  1. README.md +27 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -15,7 +15,9 @@ metrics:
 
 # INT8 BERT base uncased finetuned MRPC
 
-### Post-training static quantization
+## Post-training static quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
@@ -25,14 +27,14 @@ The calibration dataloader is the train dataloader. The calibration sampling siz
 
 The linear module **bert.encoder.layer.9.output.dense** falls back to fp32 to meet the 1% relative accuracy loss.
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-f1)** |0.8959|0.9042|
 | **Model size (MB)** |119|418|
 
-### Load with Intel® Neural Compressor:
+#### Load with Intel® Neural Compressor:
 
 ```python
 from optimum.intel.neural_compressor import IncQuantizedModelForSequenceClassification
@@ -40,3 +42,25 @@ int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
     'Intel/bert-base-uncased-mrpc-int8-static',
 )
 ```
+
+### ONNX
+
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.8963|0.9042|
+| **Model size (MB)** |231|418|
+
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/bert-base-uncased-mrpc-int8-static')
+```
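The card text above records what was quantized (post-training static, train-split calibration, one linear module kept in fp32) but not the script that produced it. A minimal sketch of how that recipe could look with optimum-intel's INCQuantizer follows; the class choice, `num_samples`, and `save_directory` are illustrative assumptions, not the exact tooling behind this commit.

```python
from functools import partial

from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from the fp32 fine-tune named in the card.
model_id = "Intel/bert-base-uncased-mrpc"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess(examples, tokenizer):
    # MRPC pairs two sentences per example.
    return tokenizer(
        examples["sentence1"], examples["sentence2"],
        padding="max_length", max_length=128, truncation=True,
    )

quantizer = INCQuantizer.from_pretrained(model)

# The card says calibration ran on the train dataloader; the sampling
# size here is an assumption, since the diff does not show the real value.
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="mrpc",
    dataset_split="train",
    preprocess_function=partial(preprocess, tokenizer=tokenizer),
    num_samples=100,
)

# Static post-training quantization; Intel Neural Compressor's
# accuracy-aware tuning is what can leave a module such as
# bert.encoder.layer.9.output.dense in fp32 to hold the 1% relative
# accuracy-loss target mentioned in the card.
quantizer.quantize(
    quantization_config=PostTrainingQuantConfig(approach="static"),
    calibration_dataset=calibration_dataset,
    save_directory="bert-base-uncased-mrpc-int8-static",
)
```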
model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9de45b1909b8e194fff06788df9c33faa9eac4ab5a0ce25667964be0a65e75d8
+size 241845626
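The README's ONNX snippet stops at `from_pretrained`. A short end-to-end sketch of scoring an MRPC-style sentence pair with the INT8 graph added in this commit follows; it assumes the repository hosts tokenizer files next to model.onnx, otherwise they can be pulled from the fp32 repo Intel/bert-base-uncased-mrpc.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "Intel/bert-base-uncased-mrpc-int8-static"

# Runs the INT8 model.onnx through ONNX Runtime.
model = ORTModelForSequenceClassification.from_pretrained(model_id)
# Assumption: tokenizer files are available in the same repo.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# MRPC is paraphrase detection; label 1 means "equivalent" in GLUE.
inputs = tokenizer(
    "The company said quarterly revenue rose 5 percent.",
    "Quarterly revenue increased by 5 percent, the company said.",
    return_tensors="pt",
)
logits = model(**inputs).logits
print("paraphrase" if int(logits.argmax(dim=-1)) == 1 else "not paraphrase")
```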