sguskin and nazneen committed
Commit
9946b1d
1 Parent(s): a091849

model documentation (#1)


- model documentation (bece709ea28e0b60e6a80b1c2f57cee6cbdebe6f)


Co-authored-by: Nazneen Rajani <nazneen@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +197 -0
README.md ADDED
@@ -0,0 +1,197 @@
---
tags:
- question-answering
- bert
---

# Model Card for dynamic_tinybert

# Model Details

## Model Description

Dynamic-TinyBERT is a TinyBERT model for question answering that boosts inference efficiency through dynamic sequence-length reduction, as described in the paper "Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length".

- **Developed by:** Intel
- **Shared by [Optional]:** Intel
- **Model type:** Question Answering
- **Language(s) (NLP):** More information needed
- **License:** More information needed
- **Parent Model:** BERT
- **Resources for more information:**
  - [Associated Paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf)

# Uses

## Direct Use

This model can be used for the task of question answering.
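
For instance, the checkpoint can be loaded through the `question-answering` pipeline. This is a minimal sketch; the question and context strings are illustrative only:

```python
from transformers import pipeline

# Load the model (and its tokenizer) into a question-answering pipeline.
qa = pipeline("question-answering", model="Intel/dynamic_tinybert")

# Illustrative inputs; any question/context pair works.
result = qa(question="Which company developed Dynamic-TinyBERT?",
            context="Dynamic-TinyBERT was developed by Intel.")
print(result["answer"], result["score"])
```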

## Downstream Use [Optional]

More information needed.

## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.

# Training Details

## Training Data

The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):

> All our experiments are evaluated on the challenging question-answering benchmark SQuAD1.1 [11].
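
For reference, SQuAD 1.1 is available on the Hugging Face Hub. A minimal sketch of loading it with the `datasets` library (this is not the authors' training code):

```python
from datasets import load_dataset

# "squad" on the Hub is SQuAD 1.1; SQuAD 2.0 lives under "squad_v2".
squad = load_dataset("squad")
print(squad["train"][0]["question"])
print(squad["train"][0]["answers"])
```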

## Training Procedure

### Preprocessing

The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):

> We start with a pre-trained general-TinyBERT student, which was trained to learn the general knowledge of BERT using the general-distillation method presented by TinyBERT. We perform transformer distillation from a fine-tuned BERT teacher to the student, following the same training steps used in the original TinyBERT: (1) **intermediate-layer distillation (ID)** — learning the knowledge residing in the hidden states and attention matrices, and (2) **prediction-layer distillation (PD)** — fitting the predictions of the teacher.
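
To make the two objectives concrete, here is a minimal sketch of the corresponding losses in PyTorch. This is not the authors' code: the layer mapping, loss weighting, and temperature are assumptions, and a linear projection would be needed if the student and teacher hidden sizes differed (here both are 768). TinyBERT's prediction-layer loss is a soft cross-entropy, written here as a KL divergence, which differs from it only by a constant.

```python
import torch.nn.functional as F

def intermediate_layer_distillation(student_hidden, teacher_hidden,
                                    student_attn, teacher_attn):
    # ID: match hidden states and attention matrices of aligned layers
    # (TinyBERT maps each of the 6 student layers to every 2nd teacher layer).
    return (F.mse_loss(student_hidden, teacher_hidden)
            + F.mse_loss(student_attn, teacher_attn))

def prediction_layer_distillation(student_logits, teacher_logits, temperature=1.0):
    # PD: fit the teacher's predictions (span start/end logits for QA).
    t = temperature
    return F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                    F.softmax(teacher_logits / t, dim=-1),
                    reduction="batchmean") * t * t
```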

### Speeds, Sizes, Times

The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):

> For our Dynamic-TinyBERT model we use the architecture of TinyBERT6L: a small BERT model with 6 layers, a hidden size of 768, a feed forward size of 3072 and 12 heads.
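
Those dimensions correspond to the following `transformers` configuration. This is a sketch for reference, not how the released checkpoint was produced; loading `Intel/dynamic_tinybert` from the Hub brings its configuration along automatically.

```python
from transformers import BertConfig

# TinyBERT6L dimensions quoted above; all other fields keep BERT defaults.
config = BertConfig(
    num_hidden_layers=6,      # 6 transformer layers
    hidden_size=768,          # hidden size
    intermediate_size=3072,   # feed-forward size
    num_attention_heads=12,   # attention heads
)
```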

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

More information needed

### Factors

More information needed

### Metrics

More information needed

## Results

The model authors note in the [associated paper](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf):

| Model            | Max F1 (full model) | Best speedup within 1% of BERT's F1 |
|------------------|---------------------|-------------------------------------|
| Dynamic-TinyBERT | 88.71               | 3.3x                                |
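
F1 here is the standard SQuAD token-overlap metric. A minimal sketch of computing it with the `evaluate` library; the prediction/reference pair is illustrative, and a real evaluation runs over the full dev set:

```python
import evaluate

# Load the SQuAD metric, which reports exact match and F1.
squad_metric = evaluate.load("squad")

predictions = [{"id": "1", "prediction_text": "Intel"}]
references = [{"id": "1", "answers": {"text": ["Intel"], "answer_start": [0]}}]
print(squad_metric.compute(predictions=predictions, references=references))
# -> {'exact_match': 100.0, 'f1': 100.0}
```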

# Model Examination

More information needed

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Titan GPU
- **Hours used:** More information needed
- **Cloud Provider:** More information needed
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed

# Technical Specifications [optional]

## Model Architecture and Objective

More information needed

## Compute Infrastructure

More information needed

### Hardware

More information needed

### Software

More information needed.

# Citation

**BibTeX:**

```bibtex
@misc{https://doi.org/10.48550/arxiv.2111.09645,
  doi = {10.48550/ARXIV.2111.09645},
  url = {https://arxiv.org/abs/2111.09645},
  author = {Guskin, Shira and Wasserblat, Moshe and Ding, Ke and Kim, Gyuwan},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length},
  publisher = {arXiv},
  year = {2021}
}
```

**APA:**

More information needed

# Glossary [optional]

More information needed

# More Information [optional]

More information needed

# Model Card Authors [optional]

Intel in collaboration with Ezi Ozoani and the Hugging Face team

# Model Card Contact

More information needed

# How to Get Started with the Model

Use the code below to get started with the model.

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the tokenizer and the extractive-QA model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")
```
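
Once loaded, answer extraction picks the highest-scoring start and end positions from the model's logits. A minimal sketch; the question and context strings are illustrative only:

```python
import torch

question = "Which company developed Dynamic-TinyBERT?"  # illustrative
context = "Dynamic-TinyBERT was developed by Intel."    # illustrative

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The answer span is the argmax of the start and end logits.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0, start:end + 1]))
```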
</details>