joanllop committed
Commit 7926f83 (1 parent: b333324)

Update README.md

Files changed (1): README.md (+1, -0)
README.md CHANGED
@@ -125,6 +125,7 @@ Some of the statistics of the corpus:
 ### Training Procedure
 The configuration of the **RoBERTa-large-bne** model is as follows:
 - RoBERTa-l: 24-layer, 1024-hidden, 16-heads, 355M parameters.
+
 The pretraining objective used for this architecture is masked language modeling without next sentence prediction.
 The training corpus was tokenized using the byte-level Byte-Pair Encoding (BPE) used in the original [RoBERTa](https://arxiv.org/abs/1907.11692) model, with a vocabulary size of 50,262 tokens.
 The RoBERTa-large-bne pretraining consists of masked language model training following the approach employed for RoBERTa base. Training lasted a total of 96 hours on 32 compute nodes, each with 4 NVIDIA V100 GPUs with 16 GB of VRAM.
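The change itself only adds a blank line to the README. For context, the masked-language model and byte-level BPE tokenizer described in this hunk can be loaded with the Hugging Face transformers fill-mask pipeline; a minimal sketch, assuming the checkpoint is published under the hub id `PlanTL-GOB-ES/roberta-large-bne` (that id does not appear in this diff):

```python
# Minimal sketch: load the RoBERTa-large-bne checkpoint described above and fill a masked token.
# The hub id "PlanTL-GOB-ES/roberta-large-bne" is an assumption; it is not stated in this diff.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "PlanTL-GOB-ES/roberta-large-bne"

tokenizer = AutoTokenizer.from_pretrained(model_id)      # byte-level BPE, ~50,262-token vocabulary
model = AutoModelForMaskedLM.from_pretrained(model_id)   # 24 layers, 1024 hidden size, 16 heads (~355M params)

# RoBERTa-style models use "<mask>" as the mask token.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("El Archivo Nacional alberga documentos <mask> de gran valor.")[0])
```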