Update README.md
README.md CHANGED
@@ -84,7 +84,7 @@ We conduct pre-training in 4 different stages. Each stage serves a different spe
As our goal is for Llama-2 to learn new languages with the least number of tokens and computing resources, we control an appropriate data mix of new (Vi, Id & Th) and old (En, Zh) languages so that the new vocabulary and knowledge are trained quickly, while relatively maintaining the performance of the original Llama-2 model and establishing a knowledge bridge between new and existing languages.
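For illustration only, here is a minimal sketch of how such a language data mix could be sampled during continued pre-training; the mixing weights, corpus layout, and function names below are assumptions, not SeaLLM's actual configuration.

```python
import random

# Hypothetical mixing weights: emphasize the new languages (Vi, Id, Th) while
# keeping enough En/Zh to preserve the original Llama-2 capabilities.
LANGUAGE_MIX = {"vi": 0.30, "id": 0.20, "th": 0.15, "en": 0.25, "zh": 0.10}

def data_stream(corpora: dict[str, list[str]], seed: int = 0):
    """Yield (language, document) pairs, interleaving languages at the target ratio."""
    rng = random.Random(seed)
    langs, weights = zip(*LANGUAGE_MIX.items())
    while True:
        lang = rng.choices(langs, weights=weights, k=1)[0]
        yield lang, rng.choice(corpora[lang])
```

In practice, the ratio would typically be re-tuned for each pre-training stage rather than kept fixed.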
-We pre-train our SeaLLM-base in ~4 weeks on 32gpus, clocking ~150B tokens.
+We pre-train our SeaLLM-base in ~4 weeks on 32 GPUs, clocking ~150B tokens. We use [Flash-attention-V2](https://github.com/Dao-AILab/flash-attention) and fuse many operations to achieve greater training throughput.
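As a rough sketch (not the repository's actual training code), FlashAttention-2 can be enabled when loading a Llama-2-style model through Hugging Face `transformers`, provided the `flash-attn` package is installed; the checkpoint name below is an assumption for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # flash-attn kernels require fp16/bf16
    attn_implementation="flash_attention_2",  # route attention through FlashAttention-2
)
```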
## Supervised Finetuning (SFT)