## DynaBERT: Dynamic BERT with Adaptive Width and Depth

* DynaBERT can flexibly adjust model size and latency by selecting an adaptive width and depth, and its sub-networks achieve performance competitive with other compressed models of similar size. Training proceeds in two stages: a width-adaptive BERT is trained first, and then both width and depth are made adaptive, using knowledge distillation at each stage (see the sketch after this list).

* This code is adapted from the Hugging Face repository [Transformers v2.1.1](https://github.com/huggingface/transformers/tree/v2.1.1) and is released on [GitHub](https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT).
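
The two-stage training described above can be pictured with a short PyTorch sketch. Everything below is illustrative only: the `set_sub_network` switch, the `(logits, hidden_state)` return signature, and the multiplier values are assumptions made for exposition, not the API of this repository, which drives training through its own scripts built on Transformers v2.1.1.

```
import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits, student_hidden, teacher_hidden):
    # Soft-label distillation on logits plus an MSE term on hidden states
    # (width-reduced sub-networks keep the hidden dimension, so shapes match).
    soft = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    hidden = F.mse_loss(student_hidden, teacher_hidden)
    return soft + hidden


def train_stage(model, teacher, loader, optimizer, width_mults, depth_mults):
    # One pass of one stage: for each batch, accumulate the distillation loss
    # over every sampled (width, depth) sub-network configuration, then update.
    teacher.eval()
    for batch in loader:
        with torch.no_grad():
            t_logits, t_hidden = teacher(batch)
        optimizer.zero_grad()
        for depth_mult in depth_mults:
            for width_mult in width_mults:
                # Hypothetical switch: keep the most important attention heads /
                # FFN neurons (width) and a subset of layers (depth).
                model.set_sub_network(width_mult=width_mult, depth_mult=depth_mult)
                s_logits, s_hidden = model(batch)
                loss = distill_loss(s_logits, t_logits, s_hidden, t_hidden)
                loss.backward()
        optimizer.step()


# Stage 1: width-adaptive only; the teacher is a fine-tuned BERT.
# train_stage(dyna_w, bert_teacher, loader, opt, [0.25, 0.5, 0.75, 1.0], [1.0])
# Stage 2: width- and depth-adaptive; the teacher is the width-adaptive model above.
# train_stage(dyna_wd, dyna_w, loader, opt, [0.25, 0.5, 0.75, 1.0], [0.5, 0.75, 1.0])
```
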
### Reference
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu.
[DynaBERT: Dynamic BERT with Adaptive Width and Depth](https://arxiv.org/abs/2004.04037). In NeurIPS 2020.
```
@inproceedings{hou2020dynabert,
  title = {DynaBERT: Dynamic BERT with Adaptive Width and Depth},
  author = {Lu Hou and Zhiqi Huang and Lifeng Shang and Xin Jiang and Xiao Chen and Qun Liu},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2020}
}
```