lewtun HF staff committed on
Commit
17a612f
1 Parent(s): cf17f41

Create README.md (#1)

- Create README.md (4c400e643d6c6c2e551ebd7ada51bfd68de105f4)

Files changed (1)
  1. README.md +60 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ tags:
+ - optimum
+ datasets:
+ - banking77
+ metrics:
+ - accuracy
+ model-index:
+ - name: quantized-distilbert-banking77
+   results:
+   - task:
+       name: Text Classification
+       type: text-classification
+     dataset:
+       name: banking77
+       type: banking77
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.9244
+ ---
+
+
+ # Quantized-distilbert-banking77
+
+ This model is a dynamically quantized version of [optimum/distilbert-base-uncased-finetuned-banking77](https://huggingface.co/optimum/distilbert-base-uncased-finetuned-banking77), a DistilBERT model fine-tuned on the `banking77` dataset.
+
+ The model was created using the [dynamic-quantization](https://github.com/huggingface/workshops/tree/main/mlops-world) notebook from a workshop presented at MLOps World 2022.
+
+ It achieves the following results on the evaluation set:
+
+ **Accuracy**
+
+ - Vanilla model: 92.5%
+ - Quantized model: 92.44%
+
+ > The quantized model retains 99.94% of the fp32 model's accuracy (92.44 / 92.5).
+
+ **Latency**
+
+ - Payload sequence length: 128
+ - Instance type: AWS c6i.xlarge
+
+ | latency | vanilla transformers | quantized optimum model | improvement |
+ |---------|----------------------|-------------------------|-------------|
+ | p95     | 63.24ms              | 37.06ms                 | 1.71x       |
+ | avg     | 62.87ms              | 37.93ms                 | 1.66x       |
+
+ ## How to use
+
+ ```python
+ from optimum.onnxruntime import ORTModelForSequenceClassification
+ from transformers import pipeline, AutoTokenizer
+
+ model = ORTModelForSequenceClassification.from_pretrained("lewtun/quantized-distilbert-banking77")
+ tokenizer = AutoTokenizer.from_pretrained("lewtun/quantized-distilbert-banking77")
+
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+ classifier("What is the exchange rate like on this app?")
+ ```
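
The improvement column in the latency table above can be re-derived from the reported timings. The sketch below is not part of the commit and is not the workshop's benchmarking code: `measure_latency` and `improvement` are hypothetical helpers, and the warm-up and run counts are assumptions. It shows one plausible way to collect avg and p95 latency for any callable, then checks the speedups using the figures from the README.

```python
import statistics
import time


def measure_latency(fn, runs=100, warmup=10):
    """Time `fn()` repeatedly and return (avg_ms, p95_ms).

    Hypothetical helper: the run and warm-up counts are assumptions,
    not values taken from the README.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(samples, n=100)[94]
    return statistics.mean(samples), p95


def improvement(baseline_ms, optimized_ms):
    """Speedup factor of the optimized model over the baseline."""
    return round(baseline_ms / optimized_ms, 2)


# Figures copied from the latency table in the README
p95_gain = improvement(63.24, 37.06)
avg_gain = improvement(62.87, 37.93)
print(f"p95: {p95_gain}x, avg: {avg_gain}x")  # p95: 1.71x, avg: 1.66x
```

To time the model itself, one could pass something like `lambda: classifier(payload)` as `fn`, where `payload` is a hypothetical input padded to the 128-token sequence length mentioned in the README.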