tsellam committed on
Commit
35dd952
1 Parent(s): 1619189

Created multiberts-seed_1-step_180k

README.md ADDED
@@ -0,0 +1,88 @@
+ ---
+ language: en
+ tags:
+ - multiberts
+ - multiberts-seed_1
+ - multiberts-seed_1-step_180k
+ license: apache-2.0
+ ---
+
+ # MultiBERTs, Intermediate Checkpoint - Seed 1, Step 180k
+
+ MultiBERTs is a collection of checkpoints and a statistical library to support
+ robust research on BERT. We provide 25 BERT-base models trained with
+ similar hyper-parameters to
+ [the original BERT model](https://github.com/google-research/bert) but
+ with different random seeds, which causes variations in the initial weights and order of
+ training instances. The aim is to distinguish findings that apply to a specific
+ artifact (i.e., a particular instance of the model) from those that apply to the
+ more general procedure.
+
+ We also provide 140 intermediate checkpoints captured
+ during the course of pre-training (we saved 28 checkpoints for each of the first 5 runs).
+
+ The models were originally released through
+ [http://goo.gle/multiberts](http://goo.gle/multiberts). We describe them in our
+ paper
+ [The MultiBERTs: BERT Reproductions for Robustness Analysis](https://arxiv.org/abs/2106.16163).
+
+ This is model #1, captured at step 180k (max: 2000k, i.e., 2M steps).
+
+ ## Model Description
+
+ This model was captured during a reproduction of
+ [BERT-base uncased](https://github.com/google-research/bert), for English: it
+ is a Transformers model pretrained on a large corpus of English data, using the
+ Masked Language Modelling (MLM) and the Next Sentence Prediction (NSP)
+ objectives.
+
+ The intended uses, limitations, training data and training procedure for the fully trained model are similar
+ to [BERT-base uncased](https://github.com/google-research/bert). Two major
+ differences from the original model:
+
+ * We pre-trained the MultiBERTs models for 2 million steps using sequence
+ length 512 (instead of 1 million steps using sequence length 128 then 512).
+ * We used an alternative version of Wikipedia and Books Corpus, initially
+ collected for [Turc et al., 2019](https://arxiv.org/abs/1908.08962).
+
+ This is a best-effort reproduction, so it is likely that some differences from
+ the original model have gone unnoticed. After full training, the performance of MultiBERTs on GLUE is often comparable to that of the original
+ BERT, but we found significant differences on the SQuAD dev set (MultiBERTs outperforms the original BERT).
+ See our [technical report](https://arxiv.org/abs/2106.16163) for more details.
+
+ ### How to use
+
+ Using code from
+ [BERT-base uncased](https://huggingface.co/bert-base-uncased), here is an example based on
+ TensorFlow:
+
+ ```python
+ from transformers import BertTokenizer, TFBertModel
+ tokenizer = BertTokenizer.from_pretrained("google/multiberts-seed_1-step_180k")
+ model = TFBertModel.from_pretrained("google/multiberts-seed_1-step_180k")
+ text = "Replace me by any text you'd like."
+ encoded_input = tokenizer(text, return_tensors="tf")
+ output = model(encoded_input)
+ ```
+
+ PyTorch version:
+
+ ```python
+ from transformers import BertTokenizer, BertModel
+ tokenizer = BertTokenizer.from_pretrained("google/multiberts-seed_1-step_180k")
+ model = BertModel.from_pretrained("google/multiberts-seed_1-step_180k")
+ text = "Replace me by any text you'd like."
+ encoded_input = tokenizer(text, return_tensors="pt")
+ output = model(**encoded_input)
+ ```
+
+ ## Citation info
+
+ ```bibtex
+ @article{sellam2021multiberts,
+   title={The MultiBERTs: BERT Reproductions for Robustness Analysis},
+   author={Thibault Sellam and Steve Yadlowsky and Jason Wei and Naomi Saphra and Alexander D'Amour and Tal Linzen and Jasmijn Bastings and Iulia Turc and Jacob Eisenstein and Dipanjan Das and Ian Tenney and Ellie Pavlick},
+   journal={arXiv preprint arXiv:2106.16163},
+   year={2021}
+ }
+ ```
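The model card says MultiBERTs was pre-trained with the Masked Language Modelling (MLM) objective. The sketch below illustrates the input-corruption step of that objective using the recipe from the original BERT paper: select about 15% of the non-special tokens, replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged. It is an illustration, not the MultiBERTs training code. It selects positions with an independent coin flip per token (as Hugging Face's MLM data collator does), whereas the original BERT data scripts pick a fixed number of positions per sequence. The special-token ids (`0`, `101`, `102`, `103`) are assumed from the standard BERT uncased vocabulary.

```python
import random
from typing import List, Sequence, Tuple

# Assumed ids from the standard BERT uncased vocab.txt; verify before reuse.
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 101, 102, 103
VOCAB_SIZE = 30522  # "vocab_size" in config.json
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default


def mask_tokens(
    input_ids: Sequence[int], rng: random.Random, mask_prob: float = 0.15
) -> Tuple[List[int], List[int]]:
    """Return (corrupted_ids, labels) for one sequence, BERT-style."""
    special = {PAD_ID, CLS_ID, SEP_ID}
    corrupted = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if tok in special or rng.random() >= mask_prob:
            continue  # position not selected: no MLM loss here
        labels[i] = tok  # the model must predict the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_ID  # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token
    return corrupted, labels


rng = random.Random(0)
ids = [CLS_ID] + list(range(2000, 2040)) + [SEP_ID]
corrupted, labels = mask_tokens(ids, rng)
print(sum(lab != IGNORE_INDEX for lab in labels), "of", len(ids), "positions selected")
```

Keeping 10% of the selected tokens unchanged means the model cannot assume that every non-`[MASK]` token is correct, which reduces the mismatch between pre-training inputs and fine-tuning inputs, where `[MASK]` never appears.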
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "tmp-model",
+   "architectures": [
+     "BertForPreTraining"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.12.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
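The `config.json` above describes the standard BERT-base shape. As a sanity check (a sketch, not part of the commit), the parameter count can be derived by hand from these values. The code assumes the Hugging Face BERT layout, in which the MLM output matrix is tied to the word embeddings and so contributes only its bias. At 4 bytes per `float32` parameter, the result is close to the 440,509,027-byte size recorded in the `pytorch_model.bin` LFS pointer. The small remainder is presumably serialization overhead and non-parameter buffers.

```python
# Hyper-parameters copied from config.json.
cfg = {
    "vocab_size": 30522,
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer with bias."""
    return n_in * n_out + n_out


def bert_param_counts(c: dict) -> tuple:
    """Return (BertModel params, BertForPreTraining params)."""
    h, i, v = c["hidden_size"], c["intermediate_size"], c["vocab_size"]
    layer_norm = 2 * h  # scale + bias
    # Word, position and token-type embeddings, plus the embedding LayerNorm.
    embeddings = (v + c["max_position_embeddings"] + c["type_vocab_size"]) * h + layer_norm
    per_layer = (
        3 * linear(h, h)  # query, key, value projections
        + linear(h, h)    # attention output projection
        + layer_norm      # post-attention LayerNorm
        + linear(h, i)    # feed-forward up-projection
        + linear(i, h)    # feed-forward down-projection
        + layer_norm      # post-feed-forward LayerNorm
    )
    pooler = linear(h, h)
    encoder_only = embeddings + c["num_hidden_layers"] * per_layer + pooler
    # Pre-training heads: MLM transform + LayerNorm + output bias (output weights
    # tied to the word embeddings), plus the 2-way NSP classifier.
    heads = linear(h, h) + layer_norm + v + linear(h, 2)
    return encoder_only, encoder_only + heads


base, pretraining = bert_param_counts(cfg)
print(f"{base:,} {pretraining:,}")  # prints: 109,482,240 110,106,428
print(f"~{pretraining * 4 / 1e6:.1f} MB as float32")  # prints: ~440.4 MB as float32
```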
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d37031b778fdcf8bc750288351d079b6eaafe9b8b34ab78dfdfdae103b32accc
+ size 440509027
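The `pytorch_model.bin` entry above does not contain the weights themselves. It is a Git LFS pointer: a small text file that records the spec version, the SHA-256 `oid`, and the byte `size` of the real object. Below is a hedged, standard-library-only sketch of checking a downloaded copy against such a pointer. The same check applies to the `tf_model.h5` pointer.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against an LFS pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(pointer["size"]):
        return False  # cheap check first: sizes differ
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d37031b778fdcf8bc750288351d079b6eaafe9b8b34ab78dfdfdae103b32accc
size 440509027
"""
```

For example, `matches_pointer(Path("pytorch_model.bin"), parse_lfs_pointer(POINTER))` should return `True` for an intact download. The local path here is hypothetical.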
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
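BERT's input format relies on the special tokens listed above. For the Next Sentence Prediction objective mentioned in the model card, a training example is a pair of segments packed as `[CLS] A [SEP] B [SEP]`. Segment (token type) ids distinguish the two halves, which is consistent with `"type_vocab_size": 2` in `config.json`. The sketch below builds that layout from already-tokenized word lists. It should mirror what `BertTokenizer` produces when called with a text pair, but it is an illustration, not the tokenizer's own code.

```python
from typing import List, Tuple

# Special tokens as listed in special_tokens_map.json.
CLS, SEP = "[CLS]", "[SEP]"


def build_pair_input(tokens_a: List[str], tokens_b: List[str]) -> Tuple[List[str], List[int]]:
    """Lay out a sentence pair as [CLS] A [SEP] B [SEP] with segment ids.

    Segment 0 covers [CLS], segment A and its [SEP]; segment 1 covers
    segment B and the final [SEP].
    """
    tokens = [CLS] + tokens_a + [SEP] + tokens_b + [SEP]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segments = build_pair_input(["the", "cat", "sat"], ["it", "purred"])
print(tokens)    # ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'purred', '[SEP]']
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

In BERT, the NSP classifier reads the final hidden state at the `[CLS]` position, so that token always comes first.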
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2de9bb403c874bd9aff56b4806c82460a87b6b91fbc6aa2466d4e49829f0c737
+ size 536063536
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": true, "do_basic_tokenize": true, "never_split": null, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "tokenizer_class": "BertTokenizer"}
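`tokenizer_config.json` sets `do_lower_case: true` and leaves `strip_accents` as `null`. In Hugging Face's `BertTokenizer`, a null `strip_accents` follows the lowercasing setting, so accents are stripped as well. Below is a simplified, standard-library sketch of that uncased pre-tokenization step: lowercase, drop combining accents after NFD decomposition, and split off punctuation. It deliberately omits control-character cleanup, the CJK spacing enabled by `tokenize_chinese_chars`, and `never_split` handling, so treat it as an approximation.

```python
import unicodedata
from typing import List


def _is_punctuation(ch: str) -> bool:
    """Non-alphanumeric printable ASCII counts as punctuation, as do Unicode P* categories."""
    cp = ord(ch)
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")


def basic_uncased_tokenize(text: str) -> List[str]:
    """Simplified sketch of BERT's uncased pre-tokenization."""
    tokens: List[str] = []
    for word in text.split():
        # do_lower_case=true; strip_accents=null then also removes accents.
        word = word.lower()
        word = "".join(
            c for c in unicodedata.normalize("NFD", word) if unicodedata.category(c) != "Mn"
        )
        current = ""
        for ch in word:
            if _is_punctuation(ch):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)  # each punctuation mark becomes its own token
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


print(basic_uncased_tokenize("Héllo, Wörld! It's naïve."))
```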
vocab.txt ADDED
The diff for this file is too large to render.
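`vocab.txt` holds the WordPiece vocabulary that `BertTokenizer` applies after pre-tokenization. Each word is split greedily: at each position the tokenizer takes the longest vocabulary entry that matches, and it prefixes every non-initial piece with `##`. A word that cannot be covered becomes `[UNK]`. Below is a minimal sketch of that procedure with a made-up toy vocabulary. The real file is far larger, and the Hugging Face loader assigns each token the id of its 0-based line number. The 100-character cutoff matches the Hugging Face default as far as we know, but treat it as an assumption.

```python
from typing import Dict, List


def wordpiece(word: str, vocab: Dict[str, int], unk: str = "[UNK]", max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first WordPiece segmentation of a single word."""
    if len(word) > max_chars:
        return [unk]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no vocabulary entry covers the remainder
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary for illustration only (not taken from vocab.txt).
toy_vocab = {tok: i for i, tok in enumerate(["[UNK]", "un", "##aff", "##able", "aff", "##ord"])}
print(wordpiece("unaffable", toy_vocab))
```

The greedy longest-match rule keeps segmentation deterministic and fast. It does not search for the segmentation with the fewest pieces.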