xinhe committed
Commit
a96c305
1 Parent(s): d0e482a

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -17,7 +17,7 @@ metrics:
 
 This is an INT8 PyTorch model quantized with [intel/nlp-toolkit](https://github.com/intel/nlp-toolkit) using provider: [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
-The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
+The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
 
 The calibration dataloader is the train dataloader. The default calibration sampling size 100 isn't exactly divisible by batch size 8, so
 the real sampling size is 104.
@@ -33,13 +33,14 @@ The calibration dataloader is the train dataloader. The default calibration samp
 | **Accuracy (eval-accuracy)** | 0.9037 | 0.9106 |
 | **Model size (MB)** | 65 | 255 |
 
-
 ### Load with nlp-toolkit:
+
 ```python
 from nlp_toolkit import OptimizedModel
 int8_model = OptimizedModel.from_pretrained(
     'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
 )
 ```
+
 Notes:
 - The INT8 model has better performance than the FP32 model when the CPU is fully occupied; otherwise, benchmarks can give the false impression that INT8 is inferior to FP32.
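
For background, below is a minimal sketch of how a static INT8 model like this one can be produced with [Intel® Neural Compressor](https://github.com/intel/neural-compressor). It is an illustration, not the exact recipe used for this model: the `PostTrainingQuantConfig` API shown is from neural-compressor 2.x, and `fp32_model` / `train_dataloader` stand in for the fine-tuned FP32 model and its train dataloader.

```python
# Hedged sketch: static post-training quantization with Intel Neural
# Compressor 2.x. Illustrative only, not the exact recipe used here.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    approach="static",                  # static INT8 quantization
    calibration_sampling_size=[100],    # default sampling size from the card
)
q_model = quantization.fit(
    model=fp32_model,                   # assumed: fine-tuned FP32 DistilBERT
    conf=conf,
    calib_dataloader=train_dataloader,  # card: calibration uses the train dataloader
)
q_model.save("./int8-model")
```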
 
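The effective sampling size of 104 quoted in the card is just the default of 100 rounded up to a whole number of batches of 8:

```python
import math

sampling_size = 100  # default calibration sampling size
batch_size = 8       # calibration dataloader batch size

# 100 / 8 = 12.5 batches, rounded up to 13 full batches: 13 * 8 = 104 samples
real_sampling_size = math.ceil(sampling_size / batch_size) * batch_size
print(real_sampling_size)  # 104
```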
 
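As a follow-up to the loading snippet, here is a hedged usage sketch. It assumes the INT8 model reuses the tokenizer of the original FP32 model and that the loaded model accepts the standard transformers call signature and returns outputs with `.logits`; the SST-2 label mapping (0 = negative, 1 = positive) is that of the original model.

```python
import torch
from transformers import AutoTokenizer
from nlp_toolkit import OptimizedModel

# Assumption: the INT8 model shares the tokenizer of the original FP32 model.
tokenizer = AutoTokenizer.from_pretrained(
    'distilbert-base-uncased-finetuned-sst-2-english'
)
int8_model = OptimizedModel.from_pretrained(
    'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
)

inputs = tokenizer('This movie was surprisingly good!', return_tensors='pt')
with torch.no_grad():
    # Assumption: the model returns transformers-style outputs with .logits.
    outputs = int8_model(**inputs)

labels = {0: 'NEGATIVE', 1: 'POSITIVE'}  # SST-2 mapping of the FP32 model
print(labels[outputs.logits.argmax(dim=-1).item()])
```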