add model variations table and ToC

#8
by buio - opened
Files changed (1)
  1. README.md +33 -2
README.md CHANGED
@@ -10,6 +10,17 @@ datasets:

  # BERT base model (uncased)

  Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
  [this paper](https://arxiv.org/abs/1810.04805) and first released in
  [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
@@ -18,7 +29,7 @@ between english and English.
  Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
  the Hugging Face team.

- ## Model description

  BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
  was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
@@ -38,7 +49,27 @@ This way, the model learns an inner representation of the English language that
  useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
  classifier using the features produced by the BERT model as inputs.

- ## Intended uses & limitations

  You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
  be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for
 

  # BERT base model (uncased)

+ ## Table of Contents
+ - [Model description](#model-description)
+ - [Model variations](#model-variations)
+ - [Intended uses and limitations](#intended-uses-and-limitations)
+ - [How to use](#how-to-use)
+ - [Limitations and bias](#limitations-and-bias)
+ - [Training data](#training-data)
+ - [Evaluation results](#evaluation-results)
+ - [BibTeX entry and citation info](#bibtex-entry-and-citation-info)
+
+
  Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
  [this paper](https://arxiv.org/abs/1810.04805) and first released in
  [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
 
  Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
  the Hugging Face team.

+ ## Model Description

  BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
  was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
 
  useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
  classifier using the features produced by the BERT model as inputs.

+ ## Model variations
+
+ BERT was originally released in base and large variations, for cased and uncased input text. The uncased models also strip out accent markers.
+ Chinese and multilingual uncased and cased versions followed shortly after.
+ Modified preprocessing with whole word masking replaced subpiece masking in a follow-up work, with the release of two models.
+ Another 24 smaller models were released afterwards.
+
+ The detailed release history can be found on the [google-research/bert readme](https://github.com/google-research/bert/blob/master/README.md) on GitHub.
+
+ | Model | #params | Language |
+ |------------------------|--------------------------------|-------|
+ | [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) | 110M | English |
+ | [`bert-large-uncased`](https://huggingface.co/bert-large-uncased) | 340M | English |
+ | [`bert-base-cased`](https://huggingface.co/bert-base-cased) | 110M | English |
+ | [`bert-large-cased`](https://huggingface.co/bert-large-cased) | 340M | English |
+ | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese) | 110M | Chinese |
+ | [`bert-base-multilingual-cased`](https://huggingface.co/bert-base-multilingual-cased) | 110M | Multiple |
+ | [`bert-large-uncased-whole-word-masking`](https://huggingface.co/bert-large-uncased-whole-word-masking) | 340M | English |
+ | [`bert-large-cased-whole-word-masking`](https://huggingface.co/bert-large-cased-whole-word-masking) | 340M | English |
+
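As a quick usage sketch (assuming the `transformers` library is installed), any checkpoint in the table can be loaded by its Hub model ID through the auto classes; `bert-base-uncased` below is just one choice:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Swap in any model ID from the table above, e.g. "bert-base-cased" or "bert-base-chinese".
model_id = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
```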
+ ## Intended uses and limitations

  You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
  be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for