# INT8 DistilBart finetuned on CNN DailyMail
### Post-training dynamic quantization
This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) using [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
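
Post-training dynamic quantization maps each weight tensor to 8-bit integers with a per-tensor scale and dequantizes on the fly at inference time. A minimal sketch of the idea in plain Python (symmetric per-tensor quantization — an illustration, not the Neural Compressor implementation):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = (max(abs(w) for w in weights) / 127) or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate fp32 values from the INT8 codes."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.05, 0.8]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per weight is bounded by half the scale
assert all(abs(w - r) <= scale / 2 for w, r in zip(weights, restored))
```

In "dynamic" quantization the weights are converted ahead of time like this, while activation scales are computed on the fly for each batch at runtime.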
The original fp32 model comes from the fine-tuned model [sysresearch101/t5-large-finetuned-xsum-cnn](https://huggingface.co/sysresearch101/t5-large-finetuned-xsum-cnn).
The following linear modules fall back to fp32 to keep the relative accuracy loss under 1%:
**'model.decoder.layers.2.fc2'**, **'model.encoder.layers.11.fc2'**, **'model.decoder.layers.1.fc2'**, **'model.decoder.layers.0.fc2'**, **'model.decoder.layers.4.fc1'**, **'model.decoder.layers.3.fc2'**, **'model.encoder.layers.8.fc2'**, **'model.decoder.layers.3.fc1'**, **'model.encoder.layers.11.fc1'**, **'model.encoder.layers.0.fc2'**, **'model.encoder.layers.3.fc1'**, **'model.encoder.layers.10.fc2'**, **'model.decoder.layers.5.fc1'**, **'model.encoder.layers.1.fc2'**, **'model.encoder.layers.3.fc2'**, **'lm_head'**, **'model.encoder.layers.7.fc2'**, **'model.decoder.layers.0.fc1'**, **'model.encoder.layers.4.fc1'**, **'model.encoder.layers.10.fc1'**, **'model.encoder.layers.6.fc1'**
### Evaluation result
|                               |  INT8   |  FP32   |
|-------------------------------|:-------:|:-------:|
| **Accuracy (eval-rougeLsum)** | 41.4707 | 41.8117 |
| **Model size**                |  722M   |  1249M  |
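
The sub-1% relative accuracy claim can be checked directly from the table above (a quick arithmetic sanity check):

```python
fp32, int8 = 41.8117, 41.4707  # eval-rougeLsum from the table
relative_loss = (fp32 - int8) / fp32
print(f"Relative accuracy loss: {relative_loss:.2%}")  # ~0.82%, within the 1% budget
assert relative_loss < 0.01
```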
### Load with Optimum
```python
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSeq2SeqLM

# Load the INT8 model; replace "model-id" with this model's Hub ID
int8_model = IncQuantizedModelForSeq2SeqLM.from_pretrained("model-id")
```