---
language: en
license: apache-2.0
tags:
- text-classification
- int8
- Intel® Neural Compressor
- PostTrainingStatic
datasets: 
- sst2
metrics:
- accuracy
---

# INT8 DistilBERT base uncased finetuned SST-2

## Post-training static quantization

### PyTorch

This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) using [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

The calibration dataloader is the train dataloader. Because the default calibration sampling size of 100 is not exactly divisible by the batch size of 8, the actual sampling size is 104.
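
The figure of 104 follows from rounding the requested sample count up to a whole number of batches. A minimal sketch of that arithmetic, where the helper name `effective_calibration_samples` is hypothetical and not part of Intel® Neural Compressor:

```python
import math


def effective_calibration_samples(sampling_size: int, batch_size: int) -> int:
    """Round the requested sample count up to a whole number of batches."""
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


# 100 requested samples with batch size 8 need 13 batches, i.e. 104 samples.
print(effective_calibration_samples(100, 8))
```

Choosing a sampling size that is already a multiple of the batch size (for example 96 or 104) would make the requested and actual sizes match.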

#### Test result

|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-accuracy)** |0.9037|0.9106|
| **Model size (MB)**  |65|255|
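
To put the table in perspective, the trade-off between accuracy and size can be computed directly from the reported figures. This is only a sketch of the arithmetic; the variable names are illustrative:

```python
# Figures copied from the PyTorch test-result table above.
int8_acc, fp32_acc = 0.9037, 0.9106
int8_mb, fp32_mb = 65, 255

acc_drop = fp32_acc - int8_acc   # absolute accuracy loss
rel_drop = acc_drop / fp32_acc   # loss relative to the FP32 baseline
size_ratio = fp32_mb / int8_mb   # compression factor

print(f"accuracy drop: {acc_drop:.4f} ({rel_drop:.2%} relative)")
print(f"size reduction: {size_ratio:.2f}x")
```

With these numbers the INT8 model is roughly 3.9 times smaller at a cost of about 0.7 percentage points of accuracy.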

#### Load with optimum:

```python
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

# Download the INT8 PyTorch model from the Hugging Face Hub and load it.
int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
    'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
)
```

### ONNX

This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).


#### Test result

|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-accuracy)** |0.9060|0.9106|
| **Model size (MB)**  |80|256|

#### Load ONNX model:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Download the INT8 ONNX model from the Hugging Face Hub and load it with ONNX Runtime.
model = ORTModelForSequenceClassification.from_pretrained(
    'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
)
```
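
Once a model is loaded, its output for a sentence is typically a pair of logits, one per class. The sketch below shows one way to turn such logits into a label and a probability using only the standard library. The `ID2LABEL` mapping assumes the base model's convention (0 = NEGATIVE, 1 = POSITIVE) and should be checked against `model.config.id2label`; the logits passed in are made-up values, not real model output.

```python
import math

# Assumed label mapping; verify against model.config.id2label.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, score = decode([-2.0, 3.0])  # made-up logits for illustration
print(label, round(score, 4))
```

In practice the logits would come from running the tokenized input through the loaded model, and a higher-level helper may already perform this step.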