|
--- |
|
language: en |
|
tags: |
|
- roberta-base |
|
- roberta-base-epoch_72 |
|
license: mit |
|
datasets: |
|
- wikipedia |
|
- bookcorpus |
|
--- |
|
|
|
# RoBERTa, Intermediate Checkpoint - Epoch 72 |
|
|
|
This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692),
trained on Wikipedia and the Book Corpus only.
We trained this model for almost 100K steps, corresponding to 83 epochs.
We provide all 84 checkpoints (including the randomly initialized weights before training)
to enable the study of the training dynamics of such models, as well as other possible use cases.
|
|
|
These models were trained as part of a work that studies how simple data statistics,
such as co-occurrences, affect model predictions, as described in the paper
[Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).
|
|
|
This is RoBERTa-base epoch_72. |
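
For example, assuming the other checkpoints follow the same `yanaiela/roberta-base-epoch_N` naming pattern as this one, a minimal sketch for comparing the predictions of a few epochs on the same masked prompt could look like this (the prompt and epoch choices are only illustrative):

```python
from transformers import pipeline

prompt = "The capital of France is <mask>."

# Assumed naming pattern: yanaiela/roberta-base-epoch_{N}, with N = 0..83.
for epoch in (0, 36, 72):
    filler = pipeline(
        "fill-mask",
        model=f"yanaiela/roberta-base-epoch_{epoch}",
        device=-1,
        top_k=3,
    )
    # Collect the top predicted tokens for the masked position at this epoch.
    top_tokens = [pred["token_str"].strip() for pred in filler(prompt)]
    print(f"epoch {epoch}: {top_tokens}")
```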
|
|
|
## Model Description |
|
|
|
This model was captured during a reproduction of
[RoBERTa-base](https://huggingface.co/roberta-base), for English: it
is a Transformers model pretrained on a large corpus of English data with the
Masked Language Modeling (MLM) objective.
|
|
|
The intended uses, limitations, training data and training procedure for the fully trained model are similar
to [RoBERTa-base](https://huggingface.co/roberta-base). Two major
differences from the original model:

* We trained our model for 100K steps instead of 500K.
* We use only Wikipedia and the Book Corpus, which are publicly available corpora.
|
|
|
|
|
### How to use |
|
|
|
Using code from
[RoBERTa-base](https://huggingface.co/roberta-base), here is a PyTorch-based example:
|
|
|
```python
from transformers import pipeline

model = pipeline("fill-mask", model="yanaiela/roberta-base-epoch_72", device=-1, top_k=10)
model("Hello, I'm the <mask> RoBERTa-base language model")
```
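
If you prefer working with the raw logits instead of the pipeline output, here is a minimal sketch (not the authors' evaluation code) that loads the tokenizer and masked-LM head directly:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yanaiela/roberta-base-epoch_72")
model = AutoModelForMaskedLM.from_pretrained("yanaiela/roberta-base-epoch_72")
model.eval()

text = "Hello, I'm the <mask> RoBERTa-base language model"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and print its top-10 token predictions.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(10).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```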
|
|
|
## Citation info |
|
|
|
```bibtex |
|
@article{2207.14251,
  Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
  Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
  Year = {2022},
  Eprint = {arXiv:2207.14251},
}
|
``` |
|
|