ArthurZ HF staff committed on
Commit
3159c78
1 Parent(s): e71e5ba

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -88,10 +88,10 @@ Roller et al. (2021)
  - CCNewsV2 containing an updated version of the English portion of the CommonCrawl News
  dataset that was used in RoBERTa (Liu et al., 2019b)
 
- * The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally
+ The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally
  to each dataset’s size in the pretraining corpus.
 
- * The dataset might contains offensive content as parts of the dataset are a subset of
+ The dataset might contains offensive content as parts of the dataset are a subset of
  public Common Crawl data, along with a subset of public Reddit data, which could contain sentences
  that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.
 
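
Reviewer note: the README text touched by this hunk says the 200MB validation split was "sampled proportionally to each dataset's size in the pretraining corpus". As a minimal sketch of what proportional allocation could look like, the snippet below splits a validation budget across components by their share of the total. The function name `allocate_validation_budget` and the per-dataset names and sizes are hypothetical placeholders; only the 800GB total and the 200MB budget come from the README, and this is not the authors' actual sampling code.

```python
def allocate_validation_budget(sizes_gb, budget_mb):
    """Split a validation budget across datasets in proportion to their size."""
    total_gb = sum(sizes_gb.values())
    return {name: budget_mb * size / total_gb for name, size in sizes_gb.items()}


# Hypothetical component sizes (assumption): chosen only so they sum to the
# 800GB total stated in the README. The real per-dataset sizes are not given.
sizes_gb = {"dataset_a": 400.0, "dataset_b": 300.0, "dataset_c": 100.0}

# 200MB is the validation split size stated in the README.
allocation = allocate_validation_budget(sizes_gb, budget_mb=200.0)

for name, megabytes in allocation.items():
    print(f"{name}: {megabytes:.1f} MB")
```

With these placeholder sizes, a component that makes up half of the corpus receives half of the validation budget, which is the property the README sentence describes.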