smol_llama: 220M GQA

model card WIP, more details to come

A small decoder-only model with 220M total parameters. This is the first version of the model.

  • 1024 hidden size, 10 layers
  • GQA (32 heads, 8 key-value), context length 2048
  • trained from scratch on one GPU :)
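
To make the GQA numbers above concrete, here is a minimal sketch of what 32 query heads sharing 8 key-value heads means for the KV cache. It uses only the figures listed above; the class name `AttentionShape`, the 16-bit cache element size, batch size 1, and the usual convention that head dimension equals hidden size divided by the number of query heads are assumptions for illustration, not details confirmed by this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical container for the figures listed in the model card."""
    hidden_size: int = 1024
    num_layers: int = 10
    num_heads: int = 32       # query heads
    num_kv_heads: int = 8     # shared key-value heads (GQA)
    context_length: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_heads (common Llama convention).
        return self.hidden_size // self.num_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads that share one key-value head.
        return self.num_heads // self.num_kv_heads

    def kv_cache_elements(self, kv_heads: int) -> int:
        # K and V tensors per layer, each [context_length, kv_heads, head_dim],
        # for a single sequence (batch size 1) at full context.
        return 2 * self.num_layers * self.context_length * kv_heads * self.head_dim


shape = AttentionShape()
BYTES_PER_ELEMENT = 2  # assumption: 16-bit cache entries

gqa = shape.kv_cache_elements(shape.num_kv_heads)  # cache with 8 KV heads
mha = shape.kv_cache_elements(shape.num_heads)     # hypothetical full multi-head cache

print("head_dim:", shape.head_dim)
print("query heads per KV head:", shape.queries_per_kv_head)
print("GQA cache MiB:", gqa * BYTES_PER_ELEMENT / 2**20)
print("MHA cache MiB:", mha * BYTES_PER_ELEMENT / 2**20)
```

Under these assumptions, each key-value head serves 4 query heads, so the full-context cache is a quarter of what the same model would need with one key-value head per query head.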

Links

Here are some fine-tunes we did, but there are many more possibilities out there!

  • instruct
    • openhermes - link
    • open-instruct - link
  • code
    • python (pypi) - link
  • zephyr DPO tune

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                               Value
Avg.                                 29.44
AI2 Reasoning Challenge (25-shot)    24.83
HellaSwag (10-shot)                  29.76
MMLU (5-shot)                        25.85
TruthfulQA (0-shot)                  44.55
Winogrande (5-shot)                  50.99
GSM8k (5-shot)                        0.68
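
The reported average appears to be the unweighted mean of the six benchmark scores. A quick check, assuming that definition (the dictionary below just restates the values from the table):

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge": 24.83,
    "HellaSwag": 29.76,
    "MMLU": 25.85,
    "TruthfulQA": 44.55,
    "Winogrande": 50.99,
    "GSM8k": 0.68,
}

# Assumption: "Avg." is the plain arithmetic mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```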