
Training

This model was trained on the two datasets listed below.

  • Skylion007/openwebtext: 1,000,000 examples at a batch size of 32-4096 (1 epoch)
  • Locutusque/TM-DATA: all examples at a batch size of 12,288 (3 epochs)

Training took approximately 500 GPU hours on a single Titan V.

Metrics

You can look at the training metrics here: https://wandb.ai/locutusque/TinyMistral-V2/runs/g0rvw6wc

🔥 This model performed excellently on TruthfulQA, outperforming models more than 720x its size. These models include: mistralai/Mixtral-8x7B-v0.1, tiiuae/falcon-180B, berkeley-nest/Starling-LM-7B-alpha, upstage/SOLAR-10.7B-v1.0, and more. 🔥
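Usage

A minimal sketch of loading this model for inference with the 🤗 Transformers library. The repo id comes from this model card; the prompt and generation settings below are illustrative assumptions, not recommended values from the card:

```python
# Load TinyMistral-248M-v2 and generate text with it.
# Assumes the `transformers` (and `torch`) packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Locutusque/TinyMistral-248M-v2"

# Download (or load from cache) the tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt; sampling settings here are arbitrary choices.
prompt = "The history of machine learning begins with"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```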

