Upload INT8 ONNX model

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)

README.md CHANGED

@@ -9,6 +9,7 @@ metrics:
 tags:
 - text-classfication
 - int8
 ---
 # Dynamically quantized DistilBERT base uncased finetuned SST-2
@@ -26,6 +27,8 @@ tags:
 ## How to Get Started With the Model
 To load the quantized model, you can do as follows:
 ```python
@@ -33,3 +36,23 @@ from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSeq
 model = IncQuantizedModelForSequenceClassification.from_pretrained("Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic")
 ```

 tags:
 - text-classfication
 - int8
+- onnx
 ---
 # Dynamically quantized DistilBERT base uncased finetuned SST-2
 ## How to Get Started With the Model
+### PyTorch
 To load the quantized model, you can do as follows:
 ```python
 model = IncQuantizedModelForSequenceClassification.from_pretrained("Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic")
 ```
+### ONNX
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+It was quantized from the FP32 fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model.
+#### Test results
+|   |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.9037|0.9106|
+| **Model size (MB)**  |73|256|
+#### Load the ONNX model
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic')
+```
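
The ONNX snippet added to the README stops at loading the model. As a minimal sketch of what a full prediction might look like, the following wraps tokenization, the forward pass, and label lookup in one function. It assumes the repository also ships tokenizer files loadable with `AutoTokenizer`, and that `optimum[onnxruntime]`, `transformers`, and `torch` are installed; `classify` and `softmax` are hypothetical helper names, not part of the model card.

```python
import math

MODEL_ID = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, model_id=MODEL_ID):
    """Return (label, probability) for one input string.

    Heavy imports are deferred so that the module can be imported
    without optimum/transformers installed.
    """
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer

    # Assumption: tokenizer files live in the same repository as model.onnx.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ORTModelForSequenceClassification.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return model.config.id2label[best], probs[best]
```

Calling `classify("A charming and often affecting journey.")` downloads the model on first use and returns the predicted label with its probability; the label names come from the model's own `id2label` mapping.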

model.onnx ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:ffaa5bd531a044237ee88f08e97dcae85bb121806fd1a5e7c556a13927343ad4
+size 76104966
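
The three lines above are a Git LFS pointer rather than the model itself: they record the SHA-256 digest and byte size of the real file. One way to check a downloaded copy against the pointer is sketched below, using only the standard library. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ffaa5bd531a044237ee88f08e97dcae85bb121806fd1a5e7c556a13927343ad4
size 76104966
"""


def parse_lfs_pointer(text):
    """Parse 'key value' pointer lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Tolerate a leading '+' in case the text was pasted from a diff.
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks to avoid loading the whole model into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20  # bytes to MiB
```

With a local download, `matches_pointer("model.onnx", pointer)` reports whether the file is intact. The recorded size, 76104966 bytes, is about 72.6 MiB, which is consistent with the 73 MB listed for the INT8 model in the README table.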