gonzalez-agirre committed on
Commit a94e933
1 Parent(s): 5110c34

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -174,7 +174,7 @@ The dataset has the following language distribution:
 
 ## Training procedure
 
- The training corpus was tokenized using the byte-level version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model, with a vocabulary size of 50,262 tokens. Once the model had been initialized, we continued its pre-training on the three target languages: Catalan, Spanish, and English. We kept a small amount of English to avoid catastrophic forgetting. Training lasted a total of 96 hours on 8 NVIDIA H100 GPUs with 80GB of memory each.
+ The training corpus was tokenized using the byte-level version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model, with a vocabulary size of 50,257 tokens. Once the model had been initialized, we continued its pre-training on the three target languages: Catalan, Spanish, and English. We kept a small amount of English to avoid catastrophic forgetting. Training lasted a total of 96 hours on 8 NVIDIA H100 GPUs with 80GB of memory each.
 
 
 ### Training hyperparameters
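For reference, the corrected figure of 50,257 matches the vocabulary size of GPT-2's byte-level BPE tokenizer exactly. A minimal sketch of verifying this, assuming the Hugging Face `transformers` library and using the public `gpt2` checkpoint as a stand-in (the tokenizer of the model updated in this commit is not named in the diff):

```python
# Minimal sketch: inspect the byte-level BPE tokenizer the README
# paragraph refers to. "gpt2" is a stand-in public checkpoint, not
# the model updated in this commit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2's byte-level BPE vocabulary has exactly 50,257 entries,
# matching the corrected figure in this diff.
print(tokenizer.vocab_size)  # 50257

# Byte-level BPE encodes any UTF-8 string, so Catalan and Spanish
# accented characters never fall out of vocabulary.
print(tokenizer.tokenize("Bon dia, buenos días, good morning"))
```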