Model Details: QuaLA-MiniLM

This model is the result of a new approach called QuaLA-MiniLM, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization. It extends the Dynamic-TinyBERT approach by training a single model that can adapt to any inference scenario with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. Compared with other efficient methods, the approach achieves up to an 8.8x speedup with less than 1% accuracy loss. The code is publicly available on GitHub, and the accompanying paper also discusses related work, including dynamic transformers and other knowledge distillation approaches.

QuaLA-MiniLM training process

To run the model with the best accuracy-efficiency tradeoff for a given computational budget, we set the length configuration to the best setting found by an evolutionary search under that computational constraint.
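As a rough illustration of what such a search involves, the sketch below runs a simplified, mutation-only evolutionary search over per-layer token counts. It is not the search used for this model: the `cost` and `proxy_accuracy` functions are hypothetical stand-ins (a real search would measure FLOPs or latency and evaluate F1 on held-out data), and the budget value is arbitrary. The only properties taken from this card are the six layers, the maximum length of 384, and the fact that token counts never increase from one layer to the next.

```python
import random

MAX_LEN = 384    # full sequence length, as in the results table
NUM_LAYERS = 6   # one token count per transformer layer


def cost(config):
    """Hypothetical cost proxy: total number of tokens processed over all layers."""
    return sum(config)


def proxy_accuracy(config):
    """Hypothetical stand-in for measuring F1 with this configuration.

    A real search would evaluate the model; here earlier layers simply count more.
    """
    weights = range(NUM_LAYERS, 0, -1)  # 6, 5, ..., 1
    return sum(w * (n / MAX_LEN) ** 0.5 for w, n in zip(weights, config))


def mutate(config, rng):
    """Perturb one layer's token count, then restore the non-increasing shape."""
    child = list(config)
    i = rng.randrange(NUM_LAYERS)
    child[i] = max(1, min(MAX_LEN, child[i] + rng.randint(-32, 32)))
    for j in range(1, NUM_LAYERS):
        # Dropped tokens never come back, so a layer cannot keep more than the previous one.
        child[j] = min(child[j], child[j - 1])
    return tuple(child)


def evolutionary_search(budget, generations=200, population_size=20, seed=0):
    """Mutation-only search for the best-scoring configuration within the budget."""
    rng = random.Random(seed)
    start = (min(MAX_LEN, budget // NUM_LAYERS),) * NUM_LAYERS  # feasible uniform start
    population = [start]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        feasible = [c for c in children if cost(c) <= budget]
        candidates = set(population) | set(feasible)
        # Keep the fittest survivors; the current best is never discarded.
        population = sorted(candidates, key=proxy_accuracy, reverse=True)[:population_size]
    return population[0]


best = evolutionary_search(budget=1200)
print(best, cost(best))
```

The returned tuple plays the role of a length configuration: one token count per layer, chosen to score as high as possible without exceeding the budget.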

Model Detail	Description
Language	en
Model Authors Company	Intel
Date	May 4, 2023
Version	1
Type	NLP - Tiny language model
Architecture	"In this work we expand Dynamic-TinyBERT to generate a much more highly efficient model. First, we use a much smaller MiniLM model which was distilled from a RoBERTa-Large teacher rather than BERT-base. Second, we apply the LAT method to make the model length-adaptive, and finally we further enhance the model’s efficiency by applying 8-bit quantization. The resultant QuaLAMiniLM (Quantized Length-Adaptive MiniLM) model outperforms BERT-base with only 30% of parameters, and demonstrates an accuracy-speedup tradeoff that is superior to any other efficiency approach (up to x8.8 speedup with <1% accuracy loss) on the challenging SQuAD1.1 benchmark. Following the concept presented by LAT, it provides a wide range of accuracy-efficiency tradeoff points while alleviating the need to retrain it for each point along the accuracy-efficiency curve."
Paper or Other Resources	https://arxiv.org/pdf/2210.17114.pdf
License	TBD
Questions or Comments	Intel DevHub Discord

Intended Use	Description
Primary intended uses	TBD
Primary intended users	Anyone who needs an efficient tiny language model for downstream tasks.
Out-of-scope uses	The model should not be used to intentionally create hostile or alienating environments for people.

How to use

Code examples coming soon!

import ...

Metrics (Model Performance):

Inference performance on the SQuAD1.1 evaluation dataset. For all the length-adaptive (LA) models we show the performance both of running the model without token-dropping, and of running the model in a token-dropping configuration according to the optimal length configuration found to meet our accuracy constraint.

Model	Model size (MB)	Tokens per layer	Accuracy (F1)	Latency (ms)	FLOPs	Speedup
BERT-base	415.4723	(384,384,384,384,384,384)	88.5831	56.5679	3.53E+10	1x
TinyBERT-ours	253.2077	(384,384,384,384,384,384)	88.3959	32.4038	1.77E+10	1.74x
QuaTinyBERT-ours	132.0665	(384,384,384,384,384,384)	87.6755	15.5850	1.77E+10	3.63x
MiniLMv2-ours	115.0473	(384,384,384,384,384,384)	88.7016	18.2312	4.76E+09	3.10x
QuaMiniLMv2-ours	84.8602	(384,384,384,384,384,384)	88.5463	9.1466	4.76E+09	6.18x
LA-MiniLM	115.0473	(384,384,384,384,384,384)	89.2811	16.9900	4.76E+09	3.33x
LA-MiniLM	115.0473	(269,253,252,202,104,34)	87.7637	11.4428	2.49E+09	4.94x
QuaLA-MiniLM	84.8596	(384,384,384,384,384,384)	88.8593	7.4443	4.76E+09	7.6x
QuaLA-MiniLM	84.8596	(315,251,242,159,142,33)	87.6828	6.4146	2.547E+09	8.8x
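The Speedup column appears to be the BERT-base latency divided by each model's latency; this is an inference from the numbers, not something the table states. The short check below, with values copied from the table above, recomputes that ratio and the F1 drop of the fastest configuration. Small differences from the reported figures are expected because the table values are rounded.

```python
# (model, configuration, F1, latency in ms, reported speedup), copied from the table above
ROWS = [
    ("BERT-base", "full", 88.5831, 56.5679, 1.0),
    ("TinyBERT-ours", "full", 88.3959, 32.4038, 1.74),
    ("QuaTinyBERT-ours", "full", 87.6755, 15.5850, 3.63),
    ("MiniLMv2-ours", "full", 88.7016, 18.2312, 3.10),
    ("QuaMiniLMv2-ours", "full", 88.5463, 9.1466, 6.18),
    ("LA-MiniLM", "full", 89.2811, 16.9900, 3.33),
    ("LA-MiniLM", "token-dropping", 87.7637, 11.4428, 4.94),
    ("QuaLA-MiniLM", "full", 88.8593, 7.4443, 7.6),
    ("QuaLA-MiniLM", "token-dropping", 87.6828, 6.4146, 8.8),
]

BASELINE_F1 = ROWS[0][2]
BASELINE_LATENCY_MS = ROWS[0][3]


def speedup(latency_ms, baseline_ms=BASELINE_LATENCY_MS):
    """Assumed definition: baseline latency divided by model latency."""
    return baseline_ms / latency_ms


for model, config, f1, latency_ms, reported in ROWS:
    computed = speedup(latency_ms)
    f1_drop = BASELINE_F1 - f1  # positive means lower F1 than BERT-base
    print(f"{model:18s} {config:15s} computed {computed:5.2f}x "
          f"(reported {reported}x), F1 drop {f1_drop:+.4f}")

# F1 drop of the fastest configuration (QuaLA-MiniLM with token dropping)
fastest_f1_drop = BASELINE_F1 - ROWS[-1][2]
```

Under this reading, the fastest QuaLA-MiniLM configuration gives up less than one F1 point relative to BERT-base, which matches the "less than 1% accuracy loss" claim.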

Training and Evaluation Data

Training and Evaluation Data	Description
Datasets	SQuAD1.1 dataset
Motivation	To build an efficient and accurate base model for several downstream language tasks.

Ethical Considerations

Ethical Considerations	Description
Data	SQuAD1.1 dataset
Human life	The model is not intended to inform decisions central to human life or flourishing. Its training data is an aggregated set of labelled Wikipedia articles.
Mitigations	No additional risk mitigation strategies were considered during model development.
Risks and harms	Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al., 2021, and Bender et al., 2021). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model.

BibTeX entry and citation info

Field	Description
Comments	In this version we added reference to the source code in the abstract. arXiv admin note: text overlap with arXiv:2111.09645
Subjects	Computation and Language (cs.CL)
Cite as	arXiv:2210.17114 [cs.CL] (or arXiv:2210.17114v2 [cs.CL] for this version), https://doi.org/10.48550/arXiv.2210.17114

Intel/dynamic-minilmv2-L6-H384-squad1.1-int8-static