yuwenz committed
Commit c731696
1 Parent(s): 4c073da

upload int8 onnx model


Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2):
  1. README.md +30 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -8,6 +8,7 @@ tags:
 - neural-compressor
 - Intel® Neural Compressor
 - PostTrainingStatic
+- onnx
 datasets:
 - glue
 metrics:
@@ -29,7 +30,9 @@ model-index:
 ---
 # INT8 xlnet-base-cased-mrpc
 
-### Post-training static quantization
+## Post-training static quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
@@ -37,14 +40,14 @@ The original fp32 model comes from the fine-tuned model [xlnet-base-cased-mrpc](
 
 The calibration dataloader is the train dataloader. The default calibration sampling size 300 isn't divisible exactly by batch size 8, so the real sampling size is 304.
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-f1)** |0.8893|0.8897|
 | **Model size (MB)** |215|448|
 
-### Load with Intel® Neural Compressor:
+#### Load with Intel® Neural Compressor:
 
 ```python
 from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
@@ -53,3 +56,27 @@ int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
     "Intel/xlnet-base-cased-mrpc-int8-static",
 )
 ```
+
+### ONNX
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [xlnet-base-cased-mrpc](https://huggingface.co/Intel/xlnet-base-cased-mrpc).
+
+The calibration dataloader is the eval dataloader. The default calibration sampling size 100 isn't divisible exactly by batch size 8, so the real sampling size is 104.
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.8935|0.8986|
+| **Model size (MB)** |286|448|
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/xlnet-base-cased-mrpc-int8-static')
+```
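The "real sampling size" figures in the README (300 → 304 for the PyTorch model, 100 → 104 for the ONNX model) come from rounding the requested calibration size up to a whole number of batches. A minimal sketch of that arithmetic (the helper name is my own, not a Neural Compressor API):

```python
import math

def effective_calibration_size(requested: int, batch_size: int) -> int:
    # Calibration consumes whole batches, so the requested sampling size is
    # rounded up to the next multiple of the batch size.
    return math.ceil(requested / batch_size) * batch_size

print(effective_calibration_size(300, 8))  # 304 (PyTorch model: 38 batches of 8)
print(effective_calibration_size(100, 8))  # 104 (ONNX model: 13 batches of 8)
```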
model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5c71602889b26bf3079ca71f4bb721481d24d6f082d9edada98d3dfe41e9454d
+size 299662965
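The three added lines above are a Git LFS pointer, not the ONNX weights themselves; each line is a space-separated key/value pair. A minimal sketch of reading those fields (the parser is illustrative, not part of git-lfs):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5c71602889b26bf3079ca71f4bb721481d24d6f082d9edada98d3dfe41e9454d
size 299662965
"""

info = parse_lfs_pointer(pointer)
# 299662965 bytes is roughly 286 MB, consistent with the INT8 model-size
# row in the README table above.
print(round(int(info["size"]) / 2**20))  # 286
```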