Please train a 750M model on these settings

#3 · opened by Tralalabs

HuggingFaceTB should train a 750M model from scratch with these settings:

- Pretraining datasets: HuggingFaceTB/smollm-corpus + FineWeb-Edu + FineMath + Stack-Edu + Cosmopedia-v2 + FineWeb 2 (spa_Latn subset)
- Post-training datasets: SmolTalk2 + OpenThoughts-114k + Smol-Smoltalk
- Context window: 16K (16,384 tokens), well suited to a 750M model
- Model name: SmolLM4-750M
- Instruction format: messages (chat format)
- Tokenizer vocab size: 49K (49,152 tokens)
Please train a 750M model with these settings.
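
For anyone picking this up, here is a minimal sketch of what the requested setup could look like with the `datasets` and `transformers` libraries. The sampling weights and the width/depth numbers are illustrative assumptions, not confirmed SmolLM settings; Stack-Edu and the post-training sets would be wired up the same way:

```python
from datasets import load_dataset, interleave_datasets
from transformers import LlamaConfig

# Weighted pretraining mixture, streamed from the Hub.
# The sampling probabilities below are hypothetical, not official.
pretrain = interleave_datasets(
    [
        load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True),
        load_dataset("HuggingFaceTB/finemath", "finemath-3plus",
                     split="train", streaming=True),
        load_dataset("HuggingFaceTB/smollm-corpus", "cosmopedia-v2",
                     split="train", streaming=True),
        load_dataset("HuggingFaceFW/fineweb-2", name="spa_Latn",
                     split="train", streaming=True),
    ],
    probabilities=[0.5, 0.2, 0.2, 0.1],  # hypothetical mixture weights
    seed=42,
)

# A Llama-style config that lands near 750M parameters with tied
# embeddings; hidden_size / layers / heads are guesses, only the
# vocab size and context window come from the request above.
config = LlamaConfig(
    vocab_size=49_152,               # 49K tokenizer vocab
    max_position_embeddings=16_384,  # 16K context window
    hidden_size=1536,
    intermediate_size=4096,
    num_hidden_layers=24,
    num_attention_heads=24,
    tie_word_embeddings=True,
)
```

Streaming plus `interleave_datasets` keeps the mixture reproducible without downloading the full corpora locally; the actual SmolLM training recipes may stage the data mixture differently.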
