xinhe committed
Commit
72920eb
1 Parent(s): b162079

Update README.md

Files changed (1): README.md +6 -2
````diff
@@ -15,7 +15,9 @@ metrics:
 
 ### QuantizationAwareTraining
 
-This is an INT8 PyTorch model quantized by [intel/nlp-toolkit](https://github.com/intel/nlp-toolkit) using provider: [Intel® Neural Compressor](https://github.com/intel/neural-compressor). The original fp32 model comes from the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc)
+This is an INT8 PyTorch model quantized by [intel/nlp-toolkit](https://github.com/intel/nlp-toolkit) using provider: [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).
 
 #### Training hyperparameters
 
@@ -44,11 +46,13 @@ The following hyperparameters were used during training:
 | **Model size (MB)** |174|418|
 
 ### Load with nlp-toolkit:
+
 ```python
 from nlp_toolkit import OptimizedModel
 int8_model = OptimizedModel.from_pretrained(
-    'Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static',
+    'Intel/bert-base-uncased-mrpc-int8-qat',
 )
 ```
+
 Notes:
 - The INT8 model has better performance than the FP32 model when the CPU is fully occupied. Otherwise, there will be the illusion that INT8 is inferior to FP32.
````
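For context on what the INT8 quantization referenced in this diff does to model weights, here is a minimal, generic sketch of affine INT8 quantize/dequantize. This is an illustration only, not Intel® Neural Compressor's actual implementation; the function names `quantize_int8` and `dequantize_int8` are made up for this example.

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    # Affine quantization: q = round(x / scale) + zero_point, clipped to INT8 range.
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, zero_point):
    # Map INT8 values back to approximate float32 values.
    return (q.astype(np.float32) - zero_point) * scale

# Toy "weights"; a symmetric scale maps the largest magnitude to 127.
weights = np.array([-0.42, 0.0, 0.17, 0.91], dtype=np.float32)
scale = np.abs(weights).max() / 127
q = quantize_int8(weights, scale, zero_point=0)
restored = dequantize_int8(q, scale, zero_point=0)
```

Each weight is stored in one byte instead of four (hence the ~174 MB vs. 418 MB model sizes in the table above), at the cost of a rounding error bounded by roughly half the scale; quantization-aware training lets the model adapt to that error during fine-tuning.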