bconsolvo committed
Commit caa34c0
1 Parent(s): 9946b1d

Update README.md

Files changed (1)
  1. README.md +92 -169

README.md CHANGED
@@ -2,145 +2,106 @@
  tags:
  - question-answering
  - bert
  ---
 
- # Model Card for dynamic_tinybert
-
- # Model Details
-
- ## Model Description
-
- Dynamic-TinyBERT: Boost TinyBERT’s Inference Efficiency by Dynamic Sequence Length
-
-
- - **Developed by:** Intel
- - **Shared by [Optional]:** Intel
- - **Model type:** Question Answering
- - **Language(s) (NLP):** More information needed
- - **License:** More information needed
- - **Parent Model:** BERT
- - **Resources for more information:**
-   - [Associated Paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf)
-
-
-
-
- # Uses
-
-
- ## Direct Use
- This model can be used for the task of question answering.
-
- ## Downstream Use [Optional]
-
- More information needed.
-
- ## Out-of-Scope Use
-
- The model should not be used to intentionally create hostile or alienating environments for people.
-
- # Bias, Risks, and Limitations
-
-
- Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
-
-
-
- ## Recommendations
-
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- # Training Details
-
- ## Training Data
-
- The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):
- > All our experiments are evaluated on the challenging question-answering benchmark SQuAD1.1 [11].
-
-
- ## Training Procedure
-
-
- ### Preprocessing
-
- The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):
- > We start with a pre-trained general-TinyBERT student, which was trained to learn the general knowledge of BERT using the general-distillation method presented by TinyBERT. We perform transformer distillation from a fine- tuned BERT teacher to the student, following the same training steps used in the original TinyBERT: (1) **intermediate-layer distillation (ID)** — learning the knowledge residing in the hidden states and attentions matrices, and (2) **prediction-layer distillation (PD)** — fitting the predictions of the teacher.
-
-
-
-
- ### Speeds, Sizes, Times
-
- The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):
- > For our Dynamic-TinyBERT model we use the architecture of TinyBERT6L: a small BERT model with 6 layers, a hidden size of 768, a feed forward size of 3072 and 12 heads.
-
- # Evaluation
-
-
- ## Testing Data, Factors & Metrics
-
- ### Testing Data
-
- More information needed
-
- ### Factors
- More information needed
-
- ### Metrics
-
- More information needed
-
-
- ## Results
-
- The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):
-
  | Model | Max F1 (full model) | Best Speedup within BERT-1% |
  |------------------|---------------------|-----------------------------|
  | Dynamic-TinyBERT | 88.71 | 3.3x |
 
-
- # Model Examination
-
- More information needed
-
- # Environmental Impact
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** Titan GPU
- - **Hours used:** More information needed
- - **Cloud Provider:** More information needed
- - **Compute Region:** More information needed
- - **Carbon Emitted:** More information needed
-
- # Technical Specifications [optional]
-
- ## Model Architecture and Objective
-
- More information needed
-
- ## Compute Infrastructure
-
- More information needed
-
- ### Hardware
-
-
- More information needed
-
- ### Software
-
- More information needed.
-
- # Citation
-
-
- **BibTeX:**
 
  ```bibtex
  @misc{https://doi.org/10.48550/arxiv.2111.09645,
  doi = {10.48550/ARXIV.2111.09645},
@@ -156,42 +117,4 @@ More information needed.
  publisher = {arXiv},
 
  year = {2021},
- ```
-
-
-
- **APA:**
-
- More information needed
-
- # Glossary [optional]
-
- More information needed
-
- # More Information [optional]
- More information needed
-
- # Model Card Authors [optional]
-
- Intel in collaboration with Ezi Ozoani and the Hugging Face team
-
- # Model Card Contact
-
- More information needed
-
- # How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- from transformers import AutoTokenizer, AutoModelForQuestionAnswering
-
- tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
-
- model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")
- ```
- </details>
-
2
  tags:
  - question-answering
  - bert
+ license: apache-2.0
+ datasets:
+ - squad
+ language:
+ - en
+ model-index:
+ - name: dynamic-tinybert
+   results:
+   - task:
+       type: question-answering
+       name: question-answering
+     metrics:
+     - type: f1
+       value: 88.71
+
  ---
 
+ ## Model Details: Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length
+
+ Dynamic-TinyBERT has been fine-tuned for the NLP task of question answering, trained on the SQuAD 1.1 dataset. [Guskin et al. (2021)](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf) note:
+
+ > Dynamic-TinyBERT is a TinyBERT model that utilizes sequence-length reduction and Hyperparameter Optimization for enhanced inference efficiency per any computational budget. Dynamic-TinyBERT is trained only once, performing on-par with BERT and achieving an accuracy-speedup trade-off superior to any other efficient approaches (up to 3.3x with <1% loss-drop).
+
+
+ | Model Detail | Description |
+ | ----------- | ----------- |
+ | Model Authors - Company | Intel |
+ | Model Card Authors | Intel in collaboration with Hugging Face |
+ | Date | November 22, 2021 |
+ | Version | 1 |
+ | Type | NLP - Question Answering |
+ | Architecture | "For our Dynamic-TinyBERT model we use the architecture of TinyBERT6L: a small BERT model with 6 layers, a hidden size of 768, a feed forward size of 3072 and 12 heads." [Guskin et al. (2021)](https://gyuwankim.github.io/publication/dynamic-tinybert/poster.pdf) (see the illustrative config sketch below) |
+ | Paper or Other Resources | Paper: [Guskin et al. (2021)](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf) and Poster: [Guskin et al. (2021)](https://gyuwankim.github.io/publication/dynamic-tinybert/poster.pdf) |
+ | License | Apache 2.0 |
+ | Questions or Comments | [Community Tab](https://huggingface.co/Intel/dynamic_tinybert/discussions) and [Intel Developers Discord](https://discord.gg/rv2Gp55UJQ) |
+
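As a rough illustration of the TinyBERT6L dimensions quoted in the Architecture row, an equivalent `transformers` `BertConfig` is sketched below. This is illustrative only; the checkpoint on the Hub ships with its own configuration, so you would not normally construct this by hand.

```python
from transformers import BertConfig

# Illustrative config matching the TinyBERT6L dimensions quoted above:
# 6 layers, hidden size 768, feed-forward size 3072, 12 attention heads.
tinybert6l_config = BertConfig(
    num_hidden_layers=6,
    hidden_size=768,
    intermediate_size=3072,
    num_attention_heads=12,
)
print(tinybert6l_config)
```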
+ | Intended Use | Description |
+ | ----------- | ----------- |
+ | Primary intended uses | You can use the model for the NLP task of question answering: given a corpus of text, you can ask it a question about that text, and it will find the answer in the text. |
+ | Primary intended users | Anyone doing question answering |
+ | Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people. |
+
+ ### How to use
+
+ Here is how to load this model in Python:
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForQuestionAnswering
+
+ tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
+
+ model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")
+ ```
+ </details>
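The snippet above only loads the tokenizer and model. A minimal inference sketch follows; the context and question strings are made-up examples, and the start/end-logit decoding is the standard extractive-QA recipe rather than anything specific to this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")

# Made-up example context and question for illustration.
context = "Dynamic-TinyBERT was developed by Intel and fine-tuned on the SQuAD 1.1 dataset."
question = "Who developed Dynamic-TinyBERT?"

# Encode the question/context pair and run the model.
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end token positions and decode that span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])
print(answer)
```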
+
+
+ | Factors | Description |
+ | ----------- | ----------- |
+ | Groups | Many Wikipedia articles with question and answer labels are contained in the training data |
+ | Instrumentation | - |
+ | Environment | Training was completed on a Titan GPU. |
+ | Card Prompts | Model deployment on alternate hardware and software will change model performance |
+
+ | Metrics | Description |
+ | ----------- | ----------- |
+ | Model performance measures | F1 (see the span-level F1 sketch below) |
+ | Decision thresholds | - |
+ | Approaches to uncertainty and variability | - |
+
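F1 on SQuAD is conventionally the span-level token-overlap F1 between the predicted and the gold answer, combining precision and recall harmonically. A minimal sketch is shown below; it omits SQuAD's usual answer normalization, such as lower-casing and stripping articles and punctuation.

```python
from collections import Counter

def span_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(span_f1("the Intel team", "Intel team"))  # 0.8
```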
+ | Training and Evaluation Data | Description |
+ | ----------- | ----------- |
+ | Datasets | SQuAD1.1: "Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable." (https://huggingface.co/datasets/squad) |
+ | Motivation | To build an efficient and accurate model for the question answering task. |
+ | Preprocessing | "We start with a pre-trained general-TinyBERT student, which was trained to learn the general knowledge of BERT using the general-distillation method presented by TinyBERT. We perform transformer distillation from a fine-tuned BERT teacher to the student, following the same training steps used in the original TinyBERT: (1) intermediate-layer distillation (ID) — learning the knowledge residing in the hidden states and attentions matrices, and (2) prediction-layer distillation (PD) — fitting the predictions of the teacher." ([Guskin et al., 2021](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf)) A schematic sketch of these two distillation losses follows this table. |
+
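Schematically, the two distillation steps quoted in the Preprocessing row amount to a mean-squared-error fit to the teacher's hidden states and attention matrices (intermediate-layer distillation) plus a soft cross-entropy fit to the teacher's predictions (prediction-layer distillation). The PyTorch sketch below is illustrative only, assuming the student layers have already been mapped to teacher layers and share the same hidden size; it is not the authors' exact implementation.

```python
import torch.nn.functional as F

def intermediate_layer_distillation(student_hidden, teacher_hidden,
                                    student_attn, teacher_attn):
    # ID: MSE between matched student/teacher hidden states and attention matrices.
    loss = 0.0
    for s_h, t_h in zip(student_hidden, teacher_hidden):
        loss = loss + F.mse_loss(s_h, t_h)
    for s_a, t_a in zip(student_attn, teacher_attn):
        loss = loss + F.mse_loss(s_a, t_a)
    return loss

def prediction_layer_distillation(student_logits, teacher_logits, temperature=1.0):
    # PD: soft cross-entropy against the teacher's softened predictions.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```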
+ Model Performance Analysis:
+
  | Model | Max F1 (full model) | Best Speedup within BERT-1% |
  |------------------|---------------------|-----------------------------|
  | Dynamic-TinyBERT | 88.71 | 3.3x |
 
+ | Ethical Considerations | Description |
+ | ----------- | ----------- |
+ | Data | The training data come from Wikipedia articles |
+ | Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles. |
+ | Mitigations | No additional risk mitigation strategies were considered during model development. |
+ | Risks and harms | Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al., 2021](https://aclanthology.org/2021.acl-long.330.pdf), and [Bender et al., 2021](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown. |
+ | Use cases | - |
+
+
+ | Caveats and Recommendations |
+ | ----------- |
+ | Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model. |

+ ### BibTeX entry and citation info
  ```bibtex
  @misc{https://doi.org/10.48550/arxiv.2111.09645,
  doi = {10.48550/ARXIV.2111.09645},
 
  publisher = {arXiv},
 
  year = {2021},
+ ```