smol_llama: 220M GQA

Model card is a work in progress; more details to come.

A small decoder model with 220M parameters in total. This is the first version of the model.

  • 1024 hidden size, 10 layers
  • GQA (32 query heads, 8 key-value heads), context length 2048
  • trained from scratch on one GPU :)
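
To make the GQA numbers above concrete, here is a minimal sketch of the shape arithmetic. It assumes the usual Llama convention that the per-head dimension is the hidden size divided by the number of query heads, and that cached keys and values are stored as 2-byte BF16 values; neither is stated explicitly in this card, and the helper name `kv_cache_bytes` is purely illustrative.

```python
# Values taken from the card.
HIDDEN_SIZE = 1024
NUM_LAYERS = 10
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 2048

# Assumption: BF16 storage, i.e. 2 bytes per cached value.
BYTES_PER_VALUE = 2

# Assumption: head_dim = hidden_size / num_query_heads (Llama convention).
head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS

# With GQA, each key-value head is shared by a group of query heads.
queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_kv_heads: int, seq_len: int = CONTEXT_LENGTH) -> int:
    """Bytes needed to cache keys and values for one sequence (hypothetical helper)."""
    # Factor 2 covers the separate key and value tensors in every layer.
    return 2 * NUM_LAYERS * num_kv_heads * head_dim * seq_len * BYTES_PER_VALUE


gqa_bytes = kv_cache_bytes(NUM_KV_HEADS)      # 8 key-value heads (this model)
mha_bytes = kv_cache_bytes(NUM_QUERY_HEADS)   # hypothetical full multi-head baseline

print(f"head_dim={head_dim}, query heads per KV head={queries_per_kv_head}")
print(f"GQA KV cache at full context: {gqa_bytes / 2**20:.0f} MiB")
print(f"MHA KV cache at full context: {mha_bytes / 2**20:.0f} MiB")
```

Under these assumptions, sharing each key-value head across four query heads cuts the per-sequence KV cache to a quarter of what a full multi-head layout with 32 key-value heads would need.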

Links

Here are some fine-tunes we did, but there are many more possibilities out there!

  • instruct
    • openhermes - link
    • open-instruct - link
  • code
    • python (pypi) - link
  • zephyr DPO tune
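
As a starting point for further fine-tunes, the base checkpoint can probably be loaded with the standard Hugging Face `transformers` auto classes. The sketch below assumes the repository id `BEE-spoke-data/smol_llama-220M-GQA` and that `transformers` with a PyTorch backend is installed; the function name `generate_text` and its defaults are illustrative, not part of this model's documentation.

```python
# Assumption: this is the Hub repository id of the base model.
MODEL_ID = "BEE-spoke-data/smol_llama-220M-GQA"


def generate_text(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the base model and continue `prompt` (illustrative helper)."""
    # Imported lazily so that defining this function does not require
    # transformers to be installed or any weights to be downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Once upon a time")` downloads the weights on first use. Since this is a base model rather than an instruct tune, it continues text instead of following instructions.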

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                               Value
Avg.                                 29.44
AI2 Reasoning Challenge (25-Shot)    24.83
HellaSwag (10-Shot)                  29.76
MMLU (5-Shot)                        25.85
TruthfulQA (0-Shot)                  44.55
Winogrande (5-Shot)                  50.99
GSM8k (5-Shot)                        0.68
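
The reported average appears to be the unweighted mean of the six benchmark scores. A small sketch that checks this from the values in the table (the dictionary keys are just labels chosen here):

```python
# Scores copied from the results table above.
scores = {
    "ARC (25-shot)": 24.83,
    "HellaSwag (10-shot)": 29.76,
    "MMLU (5-shot)": 25.85,
    "TruthfulQA (0-shot)": 44.55,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.68,
}

# Unweighted mean over all six benchmarks, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```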

Safetensors checkpoint: 218M params, tensor type BF16
