Commit 0584721 by nxphi47
1 Parent(s): 9fd721e

Update README.md

Files changed (1):
  1. README.md (+1 -1)
README.md CHANGED
@@ -84,7 +84,7 @@ We conduct pre-training in 4 different stages. Each stage serves a different spe
 
  As our goal is for Llama-2 to learn new languages with the least number of tokens and computing resources, we control an appropriate data mix of new (Vi, Id & Th) and old (En, Zh) languages so that the new vocabulary and knowledge are trained quickly, while relatively maintaining the performance of the original Llama-2 model and establishing a knowledge bridge between new and existing languages.
 
- We pre-train our SeaLLM-base in ~4 weeks on 32gpus, clocking ~150B tokens.
+ We pre-train our SeaLLM-base in ~4 weeks on 32gpus, clocking ~150B tokens. We use [Flash-attention-V2](https://github.com/Dao-AILab/flash-attention) as well as fusing many operations to achieve greater training throughput.
 
  ## Supervised Finetuning (SFT)
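For reference, the added line points to FlashAttention-2 as the source of the extra training throughput. The snippet below is a minimal sketch, not the SeaLLM pre-training code: it shows one common way to enable the FlashAttention-2 kernels when loading a Llama-2 checkpoint through Hugging Face Transformers. The model name and dtype are illustrative assumptions, and it requires the `flash-attn` package and a compatible GPU.

```python
# Minimal sketch: loading a Llama-2 model with FlashAttention-2 enabled via
# Hugging Face Transformers. Model name and dtype are placeholders, not the
# actual SeaLLM pre-training configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # half precision, typical for large-scale training
    attn_implementation="flash_attention_2",  # swap in the fused FlashAttention-2 kernels
)
```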