File size: 2,217 Bytes
dabf286
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
---
language: en
tags:
- roberta-base
- roberta-base-epoch_38
license: mit
datasets:
- wikipedia
- bookcorpus
---

# RoBERTa, Intermediate Checkpoint - Epoch 38

This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692), 
trained on Wikipedia and the Book Corpus only.
We train this model for almost 100K steps, corresponding to 83 epochs.
We provide the 84 checkpoints (including the randomly initialized weights before the training)
to provide the ability to study the training dynamics of such models, and other possible use-cases.

These models were trained in part of a work that studies how simple statistics from data,
such as co-occurrences affects model predictions, which are described in the paper
[Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).

This is RoBERTa-base epoch_38.

## Model Description

This model was captured during a reproduction of
[RoBERTa-base](https://huggingface.co/roberta-base), for English: it
is a Transformers model pretrained on a large corpus of English data, using the
Masked Language Modelling (MLM).

The intended uses, limitations, training data and training procedure for the fully trained model are similar
to [RoBERTa-base](https://huggingface.co/roberta-base). Two major
differences with the original model:

*   We trained our model for 100K steps, instead of 500K
*   We only use Wikipedia and the Book Corpus, as corpora which are publicly available.


### How to use

Using code from
[RoBERTa-base](https://huggingface.co/roberta-base), here is an example based on
PyTorch:

```
from transformers import pipeline

model = pipeline("fill-mask", model='yanaiela/roberta-base-epoch_83', device=-1, top_k=10)
model("Hello, I'm the <mask> RoBERTa-base language model")

```

## Citation info

```bibtex
@article{2207.14251,
Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
Year = {2022},
Eprint = {arXiv:2207.14251},
}
```