mapama247 committed on
Commit d917d6b
1 Parent(s): 0f53b7c

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -179,9 +179,9 @@ Note: A small amount of English data was kept to avoid catastrophic forgetting.
 
 ## Training procedure
 
-The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used
-in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,257 tokens.
-After training a new tokenizer and adapting [falcon-7b](https://huggingface.co/tiiuae/falcon-7b)'s embedding layer, we continued its pre-training in three target languages: Catalan, Spanish, and English.
+The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) with a vocabulary size of 50,257 tokens.
+After training a new tokenizer and adapting [falcon-7b](https://huggingface.co/tiiuae/falcon-7b)'s embedding layer, the model was
+further pre-trained in three target languages: Catalan, Spanish and English.
 The training lasted a total of 320 hours on 8 NVIDIA H100 GPUs with 80GB RAM.
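
The reworded passage describes the same procedure as before: train a new byte-level BPE tokenizer (vocabulary size 50,257), adapt falcon-7b's embedding layer to it, and continue pre-training on Catalan, Spanish and English data. The sketch below illustrates that setup with the standard Hugging Face `transformers` API; the corpus iterator is a placeholder, and the `resize_token_embeddings` call is an assumed simplification of the embedding-layer adaptation, not necessarily the exact method used for this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def corpus_iterator():
    # Placeholder for the Catalan/Spanish/English training corpus described in the
    # README; the real data pipeline is not reproduced here.
    for text in [
        "Frase d'exemple en català.",
        "Frase de ejemplo en español.",
        "Example sentence in English.",
    ]:
        yield text

# Train a new byte-level BPE tokenizer with the same vocabulary size (50,257),
# reusing falcon-7b's tokenizer configuration as the starting point.
base_tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
new_tokenizer = base_tokenizer.train_new_from_iterator(corpus_iterator(), vocab_size=50257)

# Load falcon-7b and resize its embedding layer to match the new vocabulary
# before continuing pre-training on the multilingual corpus.
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")
model.resize_token_embeddings(len(new_tokenizer))
```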