tsellam committed
Commit 6ca9633 (1 parent: 53d17ee)

Created multiberts-seed_2

README.md ADDED
---
language: en
tags:
- multiberts
- multiberts-seed_2
license: apache-2.0
---

# MultiBERTs - Seed 2

MultiBERTs is a collection of checkpoints and a statistical library to support
robust research on BERT. We provide 25 BERT-base models trained with
hyper-parameters similar to those of
[the original BERT model](https://github.com/google-research/bert) but
with different random seeds, which causes variation in the initial weights and the order of
training instances. The aim is to distinguish findings that apply to a specific
artifact (i.e., a particular instance of the model) from those that apply to the
more general procedure.

We also provide 140 intermediate checkpoints captured
during the course of pre-training (28 checkpoints for each of the first 5 runs).

The models were originally released through
[http://goo.gle/multiberts](http://goo.gle/multiberts). We describe them in our
paper
[The MultiBERTs: BERT Reproductions for Robustness Analysis](https://arxiv.org/abs/2106.16163).

This is model #2.

## Model Description

This model is a reproduction of
[BERT-base uncased](https://github.com/google-research/bert), for English: it
is a Transformers model pretrained on a large corpus of English data, using the
Masked Language Modelling (MLM) and the Next Sentence Prediction (NSP)
objectives.

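For a quick look at the MLM objective in action, the checkpoint can also be loaded through the `fill-mask` pipeline (a minimal sketch, assuming the standard `transformers` pipeline API; the NSP head stored in the checkpoint is simply not used in this mode):

```python
from transformers import pipeline

# Loads the masked-language-modelling head from this checkpoint.
unmasker = pipeline("fill-mask", model="google/multiberts-seed_2")

# Returns the top predictions for the [MASK] token.
unmasker("The capital of France is [MASK].")
```
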
The intended uses, limitations, training data and training procedure are similar
to [BERT-base uncased](https://github.com/google-research/bert). Two major
differences from the original model:

* We pre-trained the MultiBERTs models for 2 million steps using sequence
  length 512 (instead of 1 million steps using sequence length 128 then 512).
* We used an alternative version of Wikipedia and Books Corpus, initially
  collected for [Turc et al., 2019](https://arxiv.org/abs/1908.08962).

This is a best-effort reproduction, so it is probable that differences from
the original model have gone unnoticed. The performance of MultiBERTs on GLUE
is often comparable to that of the original BERT, but we found significant
differences on the SQuAD dev set (MultiBERTs outperforms the original BERT).
See our [technical report](https://arxiv.org/abs/2106.16163) for more details.

### How to use

Using code from
[BERT-base uncased](https://huggingface.co/bert-base-uncased), here is an
example based on TensorFlow:

```python
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("google/multiberts-seed_2")
model = TFBertModel.from_pretrained("google/multiberts-seed_2")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)
```

PyTorch version:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("google/multiberts-seed_2")
model = BertModel.from_pretrained("google/multiberts-seed_2")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```
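With the standard `transformers` outputs, `output.last_hidden_state` holds the per-token embeddings with shape `(batch_size, sequence_length, 768)`, and `output.pooler_output` holds a `(batch_size, 768)` sentence-level vector derived from the `[CLS]` token.
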
## Citation info

```bibtex
@article{sellam2021multiberts,
  title={The MultiBERTs: BERT Reproductions for Robustness Analysis},
  author={Thibault Sellam and Steve Yadlowsky and Jason Wei and Naomi Saphra and Alexander D'Amour and Tal Linzen and Jasmijn Bastings and Iulia Turc and Jacob Eisenstein and Dipanjan Das and Ian Tenney and Ellie Pavlick},
  journal={arXiv preprint arXiv:2106.16163},
  year={2021}
}
```
config.json ADDED
{
  "_name_or_path": "tmp-model",
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.12.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
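The `architectures` field indicates that the checkpoint stores the full pre-training heads (MLM and NSP) on top of a standard BERT-base encoder. A minimal sketch, assuming the standard `transformers` API, of reading this configuration and loading those heads explicitly:

```python
from transformers import BertConfig, BertForPreTraining

# Reads the config.json shown above from the Hub.
config = BertConfig.from_pretrained("google/multiberts-seed_2")
print(config.num_hidden_layers, config.hidden_size)  # 12 768, i.e. BERT-base

# Loads the encoder together with the MLM and NSP heads saved in the checkpoint.
model = BertForPreTraining.from_pretrained("google/multiberts-seed_2")
```
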
pytorch_model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:538597638730e8db2963e645cbe60b8d11147a12e258a16d22eed4743adc2419
size 440509027
special_tokens_map.json ADDED
{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tf_model.h5 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:5b3c6a39886855dfc1ced08e78cf45e9ae892d7cf81f8bec6949e3b2b8357449
size 536063536
tokenizer_config.json ADDED
{"do_lower_case": true, "do_basic_tokenize": true, "never_split": null, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "tokenizer_class": "BertTokenizer"}
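With `do_lower_case` set to `true` and the special tokens above, the tokenizer behaves like the `bert-base-uncased` tokenizer (lower-cased WordPiece). A minimal sketch, assuming the standard `transformers` API:

```python
from transformers import BertTokenizer

# Loads tokenizer_config.json, special_tokens_map.json and vocab.txt from this repo.
tokenizer = BertTokenizer.from_pretrained("google/multiberts-seed_2")
print(tokenizer.tokenize("Hello, MultiBERTs!"))  # lower-cased WordPiece tokens
```
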
vocab.txt ADDED
The diff for this file is too large to render.