javier-ab-bsc committed ed68404 (parent: 1117366): Update README.md

README.md CHANGED
@@ -149,7 +149,7 @@ to be adapted before continuing its pre-training with data in the target languag
 1) We trained our own BPE tokenizer for Catalan, Spanish, and English, and replaced the original BLOOM tokenizer and vocabulary with it. This procedure implied a downsizing of BLOOM's original embedding layer and, therefore, a model compression from 7.1B parameters to 6.3B.
 2) The embeddings corresponding to tokens that are present in both the original and the target vocabulary (matching tokens) were used for initialization.
 3) The embeddings of tokens not present in BLOOM's original vocabulary were initialized as the average of all embeddings.
-4) The model was initialized with the weights from
+4) The model was initialized with the weights from BLOOM-7.1B, and with our adapted tokenizer (step 1) and embeddings (steps 2-3).
 5) The model was then trained on a corpus that contains a mixture of Catalan, Spanish, and English data.
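
The diff above completes step 4 of the vocabulary-adaptation procedure. Below is a minimal sketch, not the authors' released code, of how steps 1-4 could be reproduced with the Hugging Face `transformers` API: matching tokens keep their original BLOOM embeddings, unseen tokens get the mean embedding, and the embedding layer is shrunk to the new vocabulary. The tokenizer path `./cat-es-en-bpe` is a placeholder for the retrained BPE tokenizer from step 1.

```python
# Sketch of the vocabulary swap described in steps 1-4 (assumptions noted inline).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
new_tok = AutoTokenizer.from_pretrained("./cat-es-en-bpe")  # placeholder path (step 1)

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")
old_emb = model.get_input_embeddings().weight.detach().clone()

# Step 3: initialize every new token to the mean of all original embeddings.
mean_emb = old_emb.mean(dim=0)
new_emb = mean_emb.repeat(len(new_tok), 1).clone()

# Step 2: tokens present in both vocabularies keep their original embedding.
old_vocab = old_tok.get_vocab()  # token string -> old id
for token, new_id in new_tok.get_vocab().items():
    old_id = old_vocab.get(token)
    if old_id is not None:
        new_emb[new_id] = old_emb[old_id]

# Step 4: shrink the embedding layer to the new vocabulary and load the
# re-initialized matrix on top of the BLOOM-7.1B weights.
model.resize_token_embeddings(len(new_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
```

Because BLOOM ties its input and output embeddings, resizing the input embeddings also shrinks the LM head, which is presumably where the reported 7.1B to 6.3B parameter reduction comes from.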

### Training data