yanaiela committed on
Commit
dd858e6
1 Parent(s): 91b8351

readme file

Files changed (1)
  1. README.md +64 -0
README.md ADDED
@@ -0,0 +1,64 @@
+ ---
+ language: en
+ tags:
+ - roberta-base
+ - roberta-base-epoch_83
+ license: mit
+ datasets:
+ - wikipedia
+ - bookcorpus
+ ---
+
+ # RoBERTa, Intermediate Checkpoint - Epoch 83
+
+ This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692),
+ trained on Wikipedia and the Book Corpus only.
+ We trained this model for almost 100K steps, corresponding to 83 epochs.
+ We release all 84 checkpoints (including the randomly initialized weights before training)
+ so that others can study the training dynamics of such models, among other possible use cases.
+
+ These models were trained as part of a study of how simple statistics of the training data,
+ such as co-occurrences, affect model predictions. The study is described in the paper
+ [Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).
+
+ This is RoBERTa-base epoch_83.
+
+ ## Model Description
+
+ This model was captured during a reproduction of
+ [RoBERTa-base](https://huggingface.co/roberta-base) for English: it
+ is a Transformers model pretrained on a large corpus of English data using the
+ masked language modeling (MLM) objective.
+
+ The intended uses, limitations, training data and training procedure for the fully trained model are similar
+ to those of [RoBERTa-base](https://huggingface.co/roberta-base). There are two major
+ differences from the original model:
+
+ * We trained our model for 100K steps instead of 500K.
+ * We used only Wikipedia and the Book Corpus, both of which are publicly available.
+
+
+ ### How to use
+
+ Using code from
+ [RoBERTa-base](https://huggingface.co/roberta-base), here is an example based on
+ PyTorch:
+
+ ```python
+ from transformers import pipeline
+
+ # Fill-mask pipeline on CPU (device=-1) that returns the 10 most likely tokens
+ model = pipeline("fill-mask", model="yanaiela/roberta-base-epoch_83", device=-1, top_k=10)
+ model("Hello, I'm the <mask> RoBERTa-base language model")
+ ```
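
Because the checkpoints are released to study training dynamics, it can be useful to run the same prompt through several of them. The sketch below is illustrative only. It assumes the other checkpoints follow the same naming pattern as this one (`yanaiela/roberta-base-epoch_<N>`, with `N` running from 0 for the random initialization to 83), which this README does not state explicitly. The helper names `checkpoint_ids` and `top_prediction` are hypothetical.

```python
from typing import List

# Prefix taken from the model ID used in this README.
REPO_PREFIX = "yanaiela/roberta-base-epoch_"
# 84 checkpoints in total; numbering them 0..83 is an assumption.
NUM_CHECKPOINTS = 84


def checkpoint_ids(step: int = 1) -> List[str]:
    """Return model IDs for every `step`-th checkpoint, always including the last one."""
    epochs = list(range(0, NUM_CHECKPOINTS, step))
    if epochs[-1] != NUM_CHECKPOINTS - 1:
        epochs.append(NUM_CHECKPOINTS - 1)
    return [f"{REPO_PREFIX}{epoch}" for epoch in epochs]


def top_prediction(model_id: str, text: str) -> str:
    """Load one checkpoint and return its most likely filler for the <mask> token."""
    # Imported lazily so that building the ID list does not require transformers.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id, device=-1, top_k=1)
    return fill(text)[0]["token_str"]


# Example (downloads each checkpoint on first use):
# for model_id in checkpoint_ids(step=20):
#     print(model_id, top_prediction(model_id, "The capital of France is <mask>."))
```

Sampling every 20th checkpoint keeps the number of downloads small. Lower `step` for a finer-grained view of how predictions change over training.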
+
+ ## Citation info
+
+ ```bibtex
+ @article{2207.14251,
+ Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
+ Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
+ Year = {2022},
+ Eprint = {arXiv:2207.14251},
+ }
+ ```