yuwenz committed on
Commit
4843e13
1 Parent(s): 7529597

upload int8 onnx model


Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)
  1. README.md +26 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -29,7 +29,9 @@ model-index:
 ---
 # INT8 roberta-base-mrpc
 
-### Post-training static quantization
+## Post-training static quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
@@ -37,14 +39,14 @@ The original fp32 model comes from the fine-tuned model [roberta-base-mrpc](http
 
 The calibration dataloader is the train dataloader. The default calibration sampling size 100 isn't divisible exactly by batch size 8, so the real sampling size is 104.
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-f1)** |0.9177|0.9138|
 | **Model size (MB)** |127|499|
 
-### Load with Intel® Neural Compressor:
+#### Load with Intel® Neural Compressor:
 
 ```python
 from optimum.intel.neural_compressor import IncQuantizedModelForSequenceClassification
@@ -52,3 +54,24 @@ from optimum.intel.neural_compressor import IncQuantizedModelForSequenceClassifi
 model_id = "Intel/roberta-base-mrpc-int8-static"
 int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(model_id)
 ```
+
+### ONNX
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [roberta-base-mrpc](https://huggingface.co/Intel/roberta-base-mrpc).
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.9073|0.9138|
+| **Model size (MB)** |243|476|
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/roberta-base-mrpc-int8-static')
+```
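The README's note that a calibration sampling size of 100 becomes a real sampling size of 104 follows from calibration consuming whole batches. A quick sketch of the arithmetic (illustrative only, not Neural Compressor's actual code):

```python
import math

# Values from the README: Neural Compressor's default calibration
# sampling size and the train dataloader's batch size.
sampling_size = 100
batch_size = 8

# Calibration runs whole batches, so the requested size is rounded up
# to the next multiple of the batch size: 13 batches * 8 = 104 samples.
num_batches = math.ceil(sampling_size / batch_size)
real_sampling_size = num_batches * batch_size
print(real_sampling_size)  # 104
```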
model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51de3a4577a50af94e72dd8a75d77b9145726d3f83de336a2573fd12180c6075
+size 254669454
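model.onnx is stored via Git LFS, so the repository records only this pointer file; the `oid` is the SHA-256 digest of the actual 254669454-byte ONNX blob. A minimal sketch for checking a downloaded file against the pointer (the helper name `sha256_of` is mine, not part of any tool):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks and return its hex SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# A correctly downloaded model.onnx should satisfy:
# sha256_of("model.onnx") == "51de3a4577a50af94e72dd8a75d77b9145726d3f83de336a2573fd12180c6075"
```

Streaming the file keeps memory flat rather than loading the whole ~243 MB model at once.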