Pretrained models
=================

Here is the full list of the currently provided pretrained models, together with a short presentation of each model.

.. list-table::
   :header-rows: 1

   * - Architecture
     - Shortcut name
     - Details of the model
   * - BERT
     - bert-base-uncased
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on lower-cased English text
   * - BERT
     - bert-large-uncased
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on lower-cased English text
   * - BERT
     - bert-base-cased
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased English text
   * - BERT
     - bert-large-cased
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on cased English text
   * - BERT
     - bert-base-multilingual-uncased
     - | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on lower-cased text in the top 102 languages with the largest Wikipedias
       | (see details)
   * - BERT
     - bert-base-multilingual-cased
     - | (New, recommended) 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased text in the top 104 languages with the largest Wikipedias
       | (see details)
   * - BERT
     - bert-base-chinese
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased Chinese Simplified and Traditional text
   * - BERT
     - bert-base-german-cased
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased German text by Deepset.ai
       | (see details on the deepset.ai website)
   * - BERT
     - bert-large-uncased-whole-word-masking
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on lower-cased English text using Whole-Word-Masking
       | (see details)
   * - BERT
     - bert-large-cased-whole-word-masking
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on cased English text using Whole-Word-Masking
       | (see details)
   * - BERT
     - bert-large-uncased-whole-word-masking-finetuned-squad
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | The bert-large-uncased-whole-word-masking model fine-tuned on SQuAD
       | (see details of fine-tuning in the `example section`_)
   * - BERT
     - bert-large-cased-whole-word-masking-finetuned-squad
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | The bert-large-cased-whole-word-masking model fine-tuned on SQuAD
       | (see details of fine-tuning in the `example section`_)
   * - BERT
     - bert-base-cased-finetuned-mrpc
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | The bert-base-cased model fine-tuned on MRPC
       | (see details of fine-tuning in the `example section`_)
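
Any shortcut name in the table above can be passed to ``from_pretrained`` to download and cache the corresponding weights and vocabulary. The snippet below is a minimal sketch assuming the Hugging Face ``transformers`` package (the import path is ``pytorch_transformers`` in older releases); it loads ``bert-base-uncased``, and any other shortcut name from the table can be substituted.

.. code-block:: python

    import torch
    from transformers import BertModel, BertTokenizer

    # Download (and cache) the vocabulary and weights associated with a shortcut name.
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    model.eval()

    # Encode a sentence and run a forward pass without tracking gradients.
    input_ids = torch.tensor([tokenizer.encode("Hello, world!")])
    with torch.no_grad():
        outputs = model(input_ids)

    last_hidden_state = outputs[0]  # shape: (batch_size, sequence_length, 768) for a base model

The fine-tuned checkpoints at the bottom of the table load the same way, typically through a task-specific class such as ``BertForQuestionAnswering`` (for the SQuAD models) or ``BertForSequenceClassification`` (for the MRPC model).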