lysandre (HF staff) committed on
Commit 9c2b7df
1 Parent(s): e0c83df

Add whole word masking information

Files changed (1)
README.md +7 -5
README.md CHANGED
@@ -13,6 +13,10 @@ Pretrained model on English language using a masked language modeling (MLM) objective
 [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
 between english and English.
 
+Differently to other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall masking rate remains the same.
+
+The training is identical -- each masked WordPiece token is predicted independently.
+
 Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
 the Hugging Face team.
 
@@ -194,11 +198,9 @@ learning rate warmup for 10,000 steps and linear decay of the learning rate after
 
 When fine-tuned on downstream tasks, this model achieves the following results:
 
-Glue test results:
-
-| Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
-|:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
-| | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 |
+Model | SQUAD 1.1 F1/EM | Multi NLI Accuracy
+---------------------------------------- | :-------------: | :----------------:
+BERT-Large, Uncased (Whole Word Masking) | 92.8/86.7 | 87.07
 
 
 ### BibTeX entry and citation info
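
For context on the technique this commit documents, below is a minimal sketch of whole word masking, assuming BERT's WordPiece convention that continuation pieces start with `##`. The function name `whole_word_mask` and its token-budget logic are illustrative assumptions, not the actual data pipeline from google-research/bert; the `transformers` library ships a production implementation as `DataCollatorForWholeWordMask`.

```python
# Illustrative sketch of whole word masking (not the official pipeline).
import random

def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Mask whole words: every WordPiece of a chosen word is masked together."""
    # Group WordPiece indices into words: a "##" token continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    # Keep the overall masking rate the same as token-level masking:
    # select whole words until roughly mask_rate of all tokens are covered.
    budget = max(1, round(len(tokens) * mask_rate))
    random.shuffle(words)
    masked = list(tokens)
    covered = 0
    for word in words:
        if covered >= budget:
            break
        for i in word:
            masked[i] = mask_token
        covered += len(word)
    return masked

# "philammon" -> ["phil", "##am", "##mon"]: all three pieces are masked at once,
# but the MLM loss still predicts each masked piece independently.
print(whole_word_mask(["why", "is", "phil", "##am", "##mon", "here", "?"]))
```

Masking at the word level removes the shortcut of recovering a masked piece from its unmasked neighbors within the same word, while the loss and masking rate stay unchanged, which is why the README can say the training is otherwise identical.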