---
library_name: optimum
tags:
- onnx
- quantized
- int8
- intent-classification
base_model: rbojja/intent-classification-small
---
# Intent Classification ONNX Quantized
INT8-quantized ONNX export of [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) for fast inference.
## Usage
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

# Tokenize the input text and run inference
text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```
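Since the model is loaded via the feature-extraction class, `outputs.last_hidden_state` holds per-token embeddings; a common next step is mask-aware mean pooling to obtain a single sentence vector. The sketch below uses NumPy with dummy arrays of illustrative shapes (the actual hidden size and sequence length depend on the model), standing in for `outputs.last_hidden_state` and `inputs["attention_mask"]`:

```python
import numpy as np

# Illustrative shapes only: batch of 1, 6 tokens, hidden size 4.
# In practice: hidden_states = outputs.last_hidden_state.numpy()
#              attention_mask = inputs["attention_mask"].numpy()
hidden_states = np.arange(24, dtype=np.float32).reshape(1, 6, 4)
attention_mask = np.array([[1, 1, 1, 1, 0, 0]], dtype=np.float32)

# Mean-pool token embeddings, ignoring padded positions.
mask = attention_mask[:, :, None]              # (1, 6, 1)
summed = (hidden_states * mask).sum(axis=1)    # (1, 4)
counts = mask.sum(axis=1)                      # (1, 1)
sentence_embedding = summed / counts           # (1, 4)
print(sentence_embedding)
```

The resulting vector can be fed to whatever downstream classifier or similarity search the intent pipeline uses.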
## Performance
- ~4x smaller model size
- 2-4x faster inference
- Minimal accuracy loss