xinhe committed
Commit
fcf0731
1 Parent(s): 885f6cd

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  language: en
  license: apache-2.0
- tags: text-classification
+ tags: text-classification, int8, PostTrainingStatic
  datasets:
  - sst2
  ---
@@ -10,21 +10,21 @@ INT8 DistilBERT base uncased finetuned SST-2 (Post-training static quantization)
  ===
  This is an INT8 PyTorch model quantized by [intel/nlp-toolkit](https://github.com/intel/nlp-toolkit) using the provider [Intel® Neural Compressor](https://github.com/intel/neural-compressor). The original FP32 model comes from the fine-tuned model [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

- The test results below come from an [AWS](https://aws.amazon.com/) c6i.xlarge instance (Intel Ice Lake: 4 vCPUs, 8 GB memory).
+ The test results below come from an [Amazon Web Services](https://aws.amazon.com/) c6i.xlarge instance (Intel Ice Lake: 4 vCPUs, 8 GB memory).

- | |fp32|int8|
+ | |int8|fp32|
  |---|:---:|:---:|
- | **Accuracy** |0.9106|0.9037|
- | **Throughput (samples/sec)** |?|?|
- | **Model size (MB)** |255|66|
+ | **Throughput (samples/sec)** |47.554|23.046|
+ | **Accuracy (F1 score)** |0.9037|0.9106|
+ | **Model size (MB)** |66|255|


- Load with optimum:
+ Load with nlp-toolkit:
  ```python
  from nlp_toolkit import OptimizedModel
  int8_model = OptimizedModel.from_pretrained(
- 'intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
+ 'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
  )
  ```
  Notes:
- - The INT8 model outperforms the FP32 model only when the CPU is fully loaded; otherwise, INT8 can appear inferior to FP32.
+ - The INT8 model outperforms the FP32 model only when the CPU is fully occupied; otherwise, INT8 can appear inferior to FP32.
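
For anyone picking the model up from this commit, here is a minimal end-to-end sketch expanding the loading snippet above. It is an illustration, not part of the commit: it assumes the object returned by `OptimizedModel.from_pretrained` accepts `transformers`-style keyword inputs and returns an output with `.logits`, and that the tokenizer of the original FP32 checkpoint is the right one to pair with the INT8 model.

```python
import torch
from transformers import AutoTokenizer
from nlp_toolkit import OptimizedModel

# Tokenizer is taken from the original FP32 checkpoint (assumed compatible
# with the quantized model, which ships no tokenizer of its own).
tokenizer = AutoTokenizer.from_pretrained(
    'distilbert-base-uncased-finetuned-sst-2-english'
)
int8_model = OptimizedModel.from_pretrained(
    'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
)

inputs = tokenizer('A charming and thought-provoking film.', return_tensors='pt')
with torch.no_grad():
    outputs = int8_model(**inputs)  # assumed transformers-style call signature

# SST-2 labels: 0 = negative, 1 = positive.
prediction = outputs.logits.argmax(dim=-1).item()
print('positive' if prediction == 1 else 'negative')
```

Per the note in the diff, throughput comparisons against the FP32 model are only meaningful when the CPU is saturated, e.g. under a sustained batch workload rather than a single one-off request.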