joanllop committed
Commit 7ba0ae6
1 Parent(s): faa0669

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -109,8 +109,7 @@ torch.Size([1, 19, 768])
 
  You can use the raw model for fill mask or fine-tune it to a downstream task.
 
- The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of
- unfiltered content from the internet, which is far from neutral. Here's an example of how the model can have biased predictions:
+ The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated. Nevertheless, here's an example of how the model can have biased predictions:
 
  ```python
  >>> from transformers import pipeline, set_seed
@@ -181,6 +180,7 @@ Some of the statistics of the corpus:
  ### Training Procedure
  The configuration of the **RoBERTa-base-bne** model is as follows:
  - RoBERTa-b: 12-layer, 768-hidden, 12-heads, 125M parameters.
+
  The pretraining objective used for this architecture is masked language modeling without next sentence prediction.
  The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [RoBERTA](https://arxiv.org/abs/1907.11692) model with a vocabulary size of 50,262 tokens.
  The RoBERTa-base-bne pre-training consists of a masked language model training that follows the approach employed for the RoBERTa base. The training lasted a total of 48 hours with 16 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
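For readers of the updated card, here is a minimal sketch of the fill-mask usage that the `pipeline`/`set_seed` snippet in the first hunk leads into. The hub identifier `PlanTL-GOB-ES/roberta-base-bne` and the example prompts are assumptions for illustration; only the import line appears in the diff above.

```python
# Sketch only: the model id and the prompts below are assumptions, not taken from the diff.
from pprint import pprint
from transformers import pipeline, set_seed

set_seed(42)  # imported in the card's example; fixes the random seeds for reproducibility
fill_mask = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")

# The pipeline scores candidate tokens for the <mask> position; comparing
# completions for different subjects is one way to surface biased predictions.
pprint(fill_mask("La mujer trabaja como <mask>."))
pprint(fill_mask("El hombre trabaja como <mask>."))
```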
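Similarly, a small sketch of what the training-procedure paragraph describes: a byte-level BPE vocabulary of 50,262 tokens and masked language modeling without next sentence prediction. The same hub identifier is assumed, and the 15% masking probability is RoBERTa's usual setting, not something stated in the diff.

```python
# Sketch under assumptions: the hub id and the 15% masking rate do not appear in the diff.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")
print(tokenizer.vocab_size)  # byte-level BPE vocabulary; the card states 50,262 tokens

# MLM without next-sentence prediction: the collator masks a random subset of tokens,
# and the labels keep the original ids only at masked positions (-100 elsewhere).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
batch = collator([tokenizer("El cielo sobre Madrid está despejado.")])
print(batch["input_ids"])
print(batch["labels"])
```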