Pretrained models
=================
Here is the full list of the currently provided pretrained models together with a short presentation of each model.
+---------------+------------------------------------------------------------+-------------------------------------------------------------------------------------+
| Architecture  | Shortcut name                                              | Details of the model                                                                |
+===============+============================================================+=====================================================================================+
| BERT          | ``bert-base-uncased``                                      | | 12-layer, 768-hidden, 12-heads, 110M parameters                                   |
|               |                                                            | | Trained on lower-cased English text                                               |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-large-uncased``                                     | | 24-layer, 1024-hidden, 16-heads, 340M parameters                                  |
|               |                                                            | | Trained on lower-cased English text                                               |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-base-cased``                                        | | 12-layer, 768-hidden, 12-heads, 110M parameters                                   |
|               |                                                            | | Trained on cased English text                                                     |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-large-cased``                                       | | 24-layer, 1024-hidden, 16-heads, 340M parameters                                  |
|               |                                                            | | Trained on cased English text                                                     |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-base-multilingual-uncased``                         | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters       |
|               |                                                            | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias  |
|               |                                                            | | (see details)                                                                     |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-base-multilingual-cased``                           | | (New, recommended) 12-layer, 768-hidden, 12-heads, 110M parameters                |
|               |                                                            | | Trained on cased text in the top 104 languages with the largest Wikipedias        |
|               |                                                            | | (see details)                                                                     |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-base-chinese``                                      | | 12-layer, 768-hidden, 12-heads, 110M parameters                                   |
|               |                                                            | | Trained on cased Chinese Simplified and Traditional text                          |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-base-german-cased``                                 | | 12-layer, 768-hidden, 12-heads, 110M parameters                                   |
|               |                                                            | | Trained on cased German text by Deepset.ai                                        |
|               |                                                            | | (see details on deepset.ai website)                                               |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-large-uncased-whole-word-masking``                  | | 24-layer, 1024-hidden, 16-heads, 340M parameters                                  |
|               |                                                            | | Trained on lower-cased English text using Whole-Word-Masking                      |
|               |                                                            | | (see details)                                                                     |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-large-cased-whole-word-masking``                    | | 24-layer, 1024-hidden, 16-heads, 340M parameters                                  |
|               |                                                            | | Trained on cased English text using Whole-Word-Masking                            |
|               |                                                            | | (see details)                                                                     |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-large-uncased-whole-word-masking-finetuned-squad``  | | 24-layer, 1024-hidden, 16-heads, 340M parameters                                  |
|               |                                                            | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD           |
|               |                                                            | | (see details of fine-tuning in the `example section`_)                            |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-large-cased-whole-word-masking-finetuned-squad``    | | 24-layer, 1024-hidden, 16-heads, 340M parameters                                  |
|               |                                                            | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD             |
|               |                                                            | | (see details of fine-tuning in the `example section`_)                            |
|               +------------------------------------------------------------+-------------------------------------------------------------------------------------+
|               | ``bert-base-cased-finetuned-mrpc``                         | | 12-layer, 768-hidden, 12-heads, 110M parameters                                   |
|               |                                                            | | The ``bert-base-cased`` model fine-tuned on MRPC                                  |
|               |                                                            | | (see details of fine-tuning in the `example section`_)                            |
+---------------+------------------------------------------------------------+-------------------------------------------------------------------------------------+
| GPT           | ``openai-gpt``                                             | | 12-layer, 768-hidden, 12-heads, 110M parameters                                   |
|               |                                                            | | OpenAI GPT English model                                                          |
+---------------+------------------------------------------------------------+-------------------------------------------------------------------------------------+
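
Any shortcut name in the table can be passed to the ``from_pretrained()`` method of the matching model and tokenizer classes, which downloads and caches the corresponding weights and vocabulary. A minimal sketch, assuming the ``transformers`` package (older releases expose the same classes as ``pytorch_transformers``):

.. code-block:: python

    import torch
    from transformers import BertModel, BertTokenizer

    # Any shortcut name from the table works here, e.g. 'bert-large-cased'.
    shortcut = 'bert-base-uncased'

    # Weights and vocabulary are downloaded and cached on first use.
    tokenizer = BertTokenizer.from_pretrained(shortcut)
    model = BertModel.from_pretrained(shortcut)
    model.eval()  # disable dropout for deterministic evaluation

    # Encode a sentence and extract its final hidden states.
    input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])
    with torch.no_grad():
        last_hidden_state = model(input_ids)[0]
    print(last_hidden_state.shape)  # (1, sequence_length, 768) for a -base model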
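
The SQuAD-fine-tuned checkpoints can likewise be loaded into a question-answering head. A sketch of extractive question answering with ``bert-large-uncased-whole-word-masking-finetuned-squad``, assuming a release that provides ``BertForQuestionAnswering`` and ``encode_plus()``; the question and context strings are illustrative only:

.. code-block:: python

    import torch
    from transformers import BertForQuestionAnswering, BertTokenizer

    name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForQuestionAnswering.from_pretrained(name)
    model.eval()

    question = "How many parameters does BERT-large have?"
    context = "BERT-large is a 24-layer model with 340M parameters."

    # Question and context are packed into one sequence:
    # [CLS] question [SEP] context [SEP]
    inputs = tokenizer.encode_plus(question, context, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    start_logits, end_logits = outputs[0], outputs[1]

    # Take the highest-scoring start and end tokens as the answer span
    # (a real implementation would also check that start <= end).
    start = int(torch.argmax(start_logits))
    end = int(torch.argmax(end_logits)) + 1
    print(tokenizer.decode(inputs['input_ids'][0][start:end]))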