## DynaBERT: Dynamic BERT with Adaptive Width and Depth

* DynaBERT can flexibly adjust model size and latency by selecting an adaptive width and depth, and its sub-networks achieve performance competitive with other compressed models of similar size. Training proceeds in two stages: a width-adaptive BERT is trained first, and then both width and depth are made adaptive, using knowledge distillation at each stage (see the sketch after this list).

* This code is adapted from the Hugging Face repository [Transformers v2.1.1](https://github.com/huggingface/transformers/tree/v2.1.1) and is released on [GitHub](https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT).
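
The two-stage training described above can be pictured with a short PyTorch sketch. Everything below is illustrative only: the `set_sub_network` switch, the `(logits, hidden_state)` return signature, and the multiplier values are assumptions made for exposition, not the API of this repository, which drives training through its own scripts built on Transformers v2.1.1.

```
import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits, student_hidden, teacher_hidden):
    # Soft-label distillation on logits plus an MSE term on hidden states
    # (width-reduced sub-networks keep the hidden dimension, so shapes match).
    soft = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    hidden = F.mse_loss(student_hidden, teacher_hidden)
    return soft + hidden


def train_stage(model, teacher, loader, optimizer, width_mults, depth_mults):
    # One pass of one stage: for each batch, accumulate the distillation loss
    # over every sampled (width, depth) sub-network configuration, then update.
    teacher.eval()
    for batch in loader:
        with torch.no_grad():
            t_logits, t_hidden = teacher(batch)
        optimizer.zero_grad()
        for depth_mult in depth_mults:
            for width_mult in width_mults:
                # Hypothetical switch: keep the most important attention heads /
                # FFN neurons (width) and a subset of layers (depth).
                model.set_sub_network(width_mult=width_mult, depth_mult=depth_mult)
                s_logits, s_hidden = model(batch)
                loss = distill_loss(s_logits, t_logits, s_hidden, t_hidden)
                loss.backward()
        optimizer.step()


# Stage 1: width-adaptive only; the teacher is a fine-tuned BERT.
# train_stage(dyna_w, bert_teacher, loader, opt, [0.25, 0.5, 0.75, 1.0], [1.0])
# Stage 2: width- and depth-adaptive; the teacher is the width-adaptive model above.
# train_stage(dyna_wd, dyna_w, loader, opt, [0.25, 0.5, 0.75, 1.0], [0.5, 0.75, 1.0])
```
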
### Reference
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu.
[DynaBERT: Dynamic BERT with Adaptive Width and Depth](https://arxiv.org/abs/2004.04037). In NeurIPS 2020.
```
@inproceedings{hou2020dynabert,
  title = {DynaBERT: Dynamic BERT with Adaptive Width and Depth},
  author = {Lu Hou and Zhiqi Huang and Lifeng Shang and Xin Jiang and Xiao Chen and Qun Liu},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2020}
}
```