license: other | |
Pre-trained language models (PLMs) have achieved great success in natural language processing. Most of PLMs follow the default setting of architecture hyper-parameters (e.g., the hidden dimension is a quarter of the intermediate dimension in feed-forward sub-networks) in BERT. In this paper, we adopt the one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters for efficient pre-trained language models (at least 6x faster than BERT-base). | |
AutoTinyBERT provides a model zoo that can meet different latency requirements. |