Update README.md
README.md CHANGED
@@ -84,7 +84,7 @@ We conduct pre-training in 4 different stages. Each stage serves a different spe
As our goal is for Llama-2 to learn new languages with the least number of tokens and computing resources, we control an appropriate data mix of new (Vi, Id & Th) and old (En, Zh) languages so that the new vocabulary and knowledge are trained quickly, while relatively maintaining the performance of the original Llama-2 model and establishing a knowledge bridge between new and existing languages.
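For illustration only, here is a minimal sketch of how such a language data mix could be sampled during continued pre-training; the mixing weights, corpus layout, and function names below are assumptions, not SeaLLM's actual configuration.

```python
import random

# Hypothetical mixing weights: emphasize the new languages (Vi, Id, Th) while
# keeping enough En/Zh to preserve the original Llama-2 capabilities.
LANGUAGE_MIX = {"vi": 0.30, "id": 0.20, "th": 0.15, "en": 0.25, "zh": 0.10}

def data_stream(corpora: dict[str, list[str]], seed: int = 0):
    """Yield (language, document) pairs, interleaving languages at the target ratio."""
    rng = random.Random(seed)
    langs, weights = zip(*LANGUAGE_MIX.items())
    while True:
        lang = rng.choices(langs, weights=weights, k=1)[0]
        yield lang, rng.choice(corpora[lang])
```

In practice, the ratio would typically be re-tuned for each pre-training stage rather than kept fixed.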
-We pre-train our SeaLLM-base in ~4 weeks on 32gpus, clocking ~150B tokens.
+We pre-train our SeaLLM-base in ~4 weeks on 32 GPUs, clocking ~150B tokens. We use [Flash-attention-V2](https://github.com/Dao-AILab/flash-attention) and fuse many operations to achieve greater training throughput.
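As a rough sketch (not the repository's actual training code), FlashAttention-2 can be enabled when loading a Llama-2-style model through Hugging Face `transformers`, provided the `flash-attn` package is installed; the checkpoint name below is an assumption for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # flash-attn kernels require fp16/bf16
    attn_implementation="flash_attention_2",  # route attention through FlashAttention-2
)
```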
## Supervised Finetuning (SFT)