eduardo-alvarez committed
Commit 9e51d7a · verified · 1 Parent(s): d51bda2

Enriching model card for improved discoverability and consumption

Files changed (1)
  1. README.md +40 -26
README.md CHANGED
@@ -13,55 +13,69 @@ metrics:
  - f1
  ---
 
- # INT8 DistilBERT base uncased finetuned on Squad
 
- ## Post-training static quantization
 
- ### PyTorch
 
- This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
-
- The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad).
 
- The calibration dataloader is the train dataloader. The default calibration sampling size 300 isn't divisible exactly by batch size 8, so the real sampling size is 304.
 
- The linear module **distilbert.transformer.layer.1.ffn.lin2** falls back to fp32 to meet the 1% relative accuracy loss.
 
- #### Test result
 
  | |INT8|FP32|
  |---|:---:|:---:|
  | **Accuracy (eval-f1)** |86.1069|86.8374|
  | **Model size (MB)** |74.7|265|
 
- #### Load with optimum:
-
- ```python
- from optimum.intel import INCModelForQuestionAnswering
-
- model_id = "Intel/distilbert-base-uncased-distilled-squad-int8-static"
- int8_model = INCModelForQuestionAnswering.from_pretrained(model_id)
- ```
-
- ### ONNX
 
  This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
- The original fp32 model comes from the fine-tuned model [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad).
-
- The calibration dataloader is the eval dataloader. The default calibration sampling size is 100.
-
- #### Test result
-
  | |INT8|FP32|
  |---|:---:|:---:|
  | **Accuracy (eval-f1)** |0.8633|0.8687|
  | **Model size (MB)** |154|254|
 
 
- #### Load ONNX model:
 
  ```python
  from optimum.onnxruntime import ORTModelForQuestionAnswering
  model = ORTModelForQuestionAnswering.from_pretrained('Intel/distilbert-base-uncased-distilled-squad-int8-static')
  ```
  - f1
  ---
 
+ # Model Card for INT8 DistilBERT Base Uncased Fine-Tuned on SQuAD
+ This model is an INT8 quantized version of DistilBERT base uncased, which has been fine-tuned on the Stanford Question Answering Dataset (SQuAD). The quantization was performed using Hugging Face's Optimum-Intel, leveraging the Intel® Neural Compressor.
 
 
+ | Model Detail | Description |
+ | ----------- | ----------- |
+ | Model Authors | Xin He Zixuan Cheng Yu Wenz |
+ | Date | Aug 4, 2022 |
+ | Version | The base model for this quantization process was distilbert-base-uncased-distilled-squad, a distilled version of BERT designed for the question-answering task. |
+ | Type | Language Model |
+ | Paper or Other Resources | Base Model: [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert/distilbert-base-uncased-distilled-squad) |
+ | License | apache-2.0 |
+ | Questions or Comments | [Community Tab](https://huggingface.co/Intel/distilbert-base-uncased-distilled-squad-int8-static-inc/discussions) and [Intel DevHub Discord](https://discord.gg/rv2Gp55UJQ) |
+ | Quantization Details | The model underwent post-training static quantization to convert it from its original FP32 precision to INT8, optimizing for size and inference speed while aiming to retain as much of the original model's accuracy as possible. |
+ | Calibration Details | For PyTorch, the calibration dataloader was the train dataloader. Because the default calibration sampling size of 300 is not exactly divisible by the batch size of 8, the real sampling size was 304. For the ONNX version, calibration used the eval dataloader with the default calibration sampling size of 100. |
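
The calibration figures for PyTorch (a requested sampling size of 300, a batch size of 8, and a real sampling size of 304) are consistent with rounding up to a whole number of batches. The sketch below only reproduces that arithmetic; `effective_calibration_samples` is a hypothetical helper written for illustration, not a function from Intel® Neural Compressor or Optimum, and the round-up rule is an assumption inferred from the numbers in the card.

```python
import math


def effective_calibration_samples(sampling_size: int, batch_size: int) -> int:
    """Round a requested sampling size up to a whole number of batches.

    Hypothetical helper: assumes calibration always consumes full batches.
    """
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


# Values taken from the card: sampling size 300, batch size 8.
pytorch_samples = effective_calibration_samples(300, 8)
print(pytorch_samples)  # 38 batches of 8 samples -> 304
```

The ONNX calibration batch size is not stated in the card, so the same check is not applied to its sampling size of 100.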
 
+ | Intended Use | Description |
+ | ----------- | ----------- |
+ | Primary intended uses | This model is intended for question-answering tasks, where it can provide answers to questions given a context passage. It is optimized for scenarios requiring fast inference and reduced model size without significantly compromising accuracy. |
+ | Primary intended users | Researchers, developers, and enterprises that require efficient, low-latency question answering capabilities in their applications, particularly where computational resources are limited. |
+ | Out-of-scope uses | |
 
+ # Evaluation
 
+ ### PyTorch Version
 
+ This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
  | |INT8|FP32|
  |---|:---:|:---:|
  | **Accuracy (eval-f1)** |86.1069|86.8374|
  | **Model size (MB)** |74.7|265|
 
+ ### ONNX Version
 
  This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
  | |INT8|FP32|
  |---|:---:|:---:|
  | **Accuracy (eval-f1)** |0.8633|0.8687|
  | **Model size (MB)** |154|254|
 
+ # Usage
+
+ **Optimum Intel w/ Neural Compressor**
+
+ ```python
+ from optimum.intel import INCModelForQuestionAnswering
+
+ model_id = "Intel/distilbert-base-uncased-distilled-squad-int8-static"
+ int8_model = INCModelForQuestionAnswering.from_pretrained(model_id)
+ ```
 
+ **Optimum w/ ONNX Runtime**
 
  ```python
  from optimum.onnxruntime import ORTModelForQuestionAnswering
  model = ORTModelForQuestionAnswering.from_pretrained('Intel/distilbert-base-uncased-distilled-squad-int8-static')
  ```
+
+ # Ethical Considerations
+ Users should be aware of potential biases present in the training data (SQuAD and Wikipedia) and consider the implications of these biases on the model's outputs. Additionally, quantization may introduce or exacerbate biases in certain scenarios.
+
+ # Caveats and Recommendations
+ - Users should consider the trade-off between inference performance and accuracy when deploying quantized models in critical applications.
+ - Further fine-tuning or calibration may be necessary for specific use cases or to meet stricter accuracy requirements.