yuwenz committed
Commit ac67851
1 Parent(s): b71abb6

upload int8 onnx model

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)
  1. README.md +26 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -14,7 +14,9 @@ metrics:
 
 # INT8 DistilBERT base uncased finetuned SST-2
 
-### Post-training static quantization
+## Post-training static quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
@@ -23,14 +25,14 @@ The original fp32 model comes from the fine-tuned model [distilbert-base-uncased
 The calibration dataloader is the train dataloader. The default calibration sampling size 100 isn't exactly divisible by batch size 8, so
 the real sampling size is 104.
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-accuracy)** |0.9037|0.9106|
 | **Model size (MB)** |65|255|
 
-### Load with optimum:
+#### Load with optimum:
 
 ```python
 from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
@@ -38,3 +40,24 @@ int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
     'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
 )
 ```
+
+### ONNX
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
+
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.9060|0.9106|
+| **Model size (MB)** |80|256|
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static')
+```
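The README above names post-training static quantization: a calibration pass fixes the value range once, and a scale and zero-point derived from that range map fp32 values to INT8 at inference time. As a minimal illustrative sketch of that affine-quantization arithmetic in pure Python (not Intel Neural Compressor's actual implementation, whose names and details differ):

```python
# Affine (asymmetric) INT8 quantization: illustrative only, NOT the
# Intel Neural Compressor implementation referenced in the README.

def compute_qparams(rmin, rmax, qmin=-128, qmax=127):
    """Derive scale and zero-point from a calibration range [rmin, rmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0.0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """fp32 -> INT8 with clamping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]

def dequantize(qvalues, scale, zero_point):
    """INT8 -> approximate fp32."""
    return [(q - zero_point) * scale for q in qvalues]

# "Static" means the range comes from calibration data ahead of time,
# so no min/max is computed per batch at inference.
calibration = [-0.5, 0.1, 0.9, 2.0]
scale, zp = compute_qparams(min(calibration), max(calibration))
q = quantize(calibration, scale, zp)
recovered = dequantize(q, scale, zp)  # close to the inputs, within ~1 scale step
```

The round-trip error per element is bounded by the scale, which is why a well-chosen calibration range (here, 104 samples in the commit above) keeps the accuracy drop small.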
model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24cd860d20b786211162fb5c6c41e46ad19dba261c6ae1b64e78af0ebffaff9b
+size 83179400