---
library_name: optimum
tags:
- onnx
- quantized
- int8
- intent-classification
base_model: rbojja/intent-classification-small
---

# Intent Classification ONNX Quantized

An INT8-quantized ONNX export of [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) for fast CPU inference with ONNX Runtime.

## Usage

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer; inference runs via ONNX Runtime
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state holds the token embeddings
```
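Since the model is loaded with a feature-extraction head, the output is token embeddings rather than class logits. One common way to turn those into an intent prediction is to mean-pool the tokens into a sentence embedding and match it against embeddings of known intent examples by cosine similarity. A minimal NumPy sketch (the arrays below are illustrative placeholders, not real model outputs):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    return (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)

def cosine_similarity(a, b):
    # Row-wise cosine similarity between two sets of vectors
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Placeholder "model output": 1 sentence, 2 tokens, 4 dims
hidden = np.array([[[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]]])
mask = np.array([[1, 1]])
query = mean_pool(hidden, mask)  # shape (1, 4)

# Placeholder embeddings of known intents
intent_bank = np.array([[0.5, 0.5, 0.0, 0.0],    # e.g. "book_flight"
                        [0.0, 0.0, 1.0, 0.0]])   # e.g. "cancel_booking"
scores = cosine_similarity(query, intent_bank)
best = int(scores.argmax(axis=1)[0])  # index of the closest intent
```

With real outputs, `hidden` and `mask` come from `outputs.last_hidden_state` and `inputs["attention_mask"]` (converted to NumPy), and `intent_bank` is built by embedding one or more example utterances per intent.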

## Performance
- ~4x smaller file size (int8 vs. float32 weights)
- 2-4x faster inference on CPU
- Minimal accuracy loss
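The ~4x size figure follows directly from storing weights as int8 instead of float32. A minimal NumPy sketch of symmetric linear quantization on a toy tensor (illustrative only, not optimum's implementation):

```python
import numpy as np

# Toy float32 "weight" tensor
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

# Symmetric linear quantization to int8: map the float range onto [-127, 127]
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale

ratio = w.nbytes / w_int8.nbytes        # 4.0: int8 takes a quarter of the bytes
max_err = np.abs(w - w_dequant).max()   # rounding error is bounded by scale / 2
```

Real quantizers (e.g. the dynamic int8 path in `optimum.onnxruntime`) add per-tensor or per-channel scales and runtime activation quantization, but the storage math is the same.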