
Mixsmol-4x400M-v0.1 by Ontocord

This is the third checkpoint (Epoch 3) of Mixsmol-4x400M-v0.1. Note that this model is an experiment in data mixing; we therefore trained it on only 50B tokens (95% English and 5% Vietnamese) to test the following:

  • Reasoning capabilities through pretraining on high-quality synthetic textbook data
  • Cross-lingual understanding through pretraining on machine translation and multilingual, multi-task data

After verifying our hypotheses with this run, we will schedule a second run with more data and compute so the model can reach its full capability.

Data

  • Synthetic Textbooks: 8M samples
  • RefinedWeb: 1M samples
  • RedPajama-v2: 500K samples
  • MathPile: Everything
  • ThePile: MiniPile Subset
  • GoodWiki
  • The Stack Smol XL
  • The Vault: train_small split
  • Instruction Pretraining: 250K samples
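
For illustration only, the sketch below turns the stated sample counts into count-proportional sampling weights. Sources without a stated count are omitted, and the actual mixing schedule used for training is not specified on this card.

```python
# Illustrative only: count-proportional sampling weights derived from the
# sample counts listed above. Sources without a stated count are omitted,
# and the authors' actual mixing schedule may differ.
sample_counts = {
    "Synthetic Textbooks": 8_000_000,
    "RefinedWeb": 1_000_000,
    "RedPajama-v2": 500_000,
    "Instruction Pretraining": 250_000,
}

total = sum(sample_counts.values())
weights = {name: count / total for name, count in sample_counts.items()}

for name, weight in weights.items():
    print(f"{name}: {weight:.1%}")
```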
Model size: 1.77B params (BF16, Safetensors)

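A minimal usage sketch, assuming the checkpoint is published as vilm/Mixsmol-4x400M-v0.1-epoch3 (the repository this card belongs to) and loads through the standard Transformers causal-LM classes:

```python
# Minimal usage sketch. Assumptions: the checkpoint is published as
# "vilm/Mixsmol-4x400M-v0.1-epoch3" and loads through the standard
# Transformers causal-LM interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vilm/Mixsmol-4x400M-v0.1-epoch3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
)

prompt = "The Pythagorean theorem states that"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```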