blockblockblock/smol_llama-220M-GQA-bpw2.5


smol_llama: 220M GQA

Model card is a work in progress; more details to come.

A small decoder-only model with 220M parameters in total. This is the first version of the model.
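As a rough sanity check on the 220M total, the sketch below counts the weights of a Llama-style decoder using the hidden size, layer count and head counts from the specification list in this card. The card does not give the MLP width or the vocabulary size, so `intermediate_size = 4096` and `vocab_size = 32128` are hypothetical values, as are the gated three-matrix MLP, bias-free projections and untied output head. With those guesses the total comes out at about 218M, close to the stated figure.

```python
# Hypothetical parameter count for a Llama-style decoder.
# From the card: hidden size, layers, attention heads, key-value heads.
# ASSUMED (not stated in the card): intermediate_size, vocab_size, a gated
# MLP with three weight matrices, no biases, and an untied output head.
hidden_size = 1024
num_layers = 10
num_heads = 32
num_kv_heads = 8
intermediate_size = 4096  # hypothetical
vocab_size = 32128        # hypothetical

head_dim = hidden_size // num_heads

# Attention: q and o projections are hidden x hidden; k and v are hidden x (kv_heads * head_dim).
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * num_kv_heads * head_dim
mlp = 3 * hidden_size * intermediate_size  # gate, up and down projections
norms = 2 * hidden_size                    # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

embeddings = 2 * vocab_size * hidden_size  # input embedding + untied output head
total = num_layers * per_layer + embeddings + hidden_size  # + final norm

print(f"{total / 1e6:.1f}M parameters")
```

Most of the budget sits in the MLPs (about 126M across the 10 layers under these assumptions) and the two vocabulary matrices (about 66M).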

- hidden size 1024, 10 layers
- grouped-query attention (GQA): 32 attention heads, 8 key-value heads
- context length 2048
- trained from scratch on a single GPU :)
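The GQA settings above determine how many key/value vectors each layer has to store. The sketch below derives the per-head dimension, the query-to-KV-head grouping and the KV-cache size at the full 2048-token context. It assumes `head_dim = hidden_size / num_heads` (the usual Llama convention), contiguous grouping of query heads, and a 16-bit cache for a single sequence; the MHA figure is a hypothetical comparison with one KV head per query head.

```python
# Derived attention sizes for the listed configuration.
# Assumptions: head_dim = hidden_size // num_heads (Llama-style convention),
# contiguous query-head grouping, and a 16-bit KV cache for one sequence.
HIDDEN_SIZE = 1024
NUM_LAYERS = 10
NUM_HEADS = 32
NUM_KV_HEADS = 8
CONTEXT_LEN = 2048
BYTES_PER_VALUE = 2  # fp16 / bf16 (assumption)

head_dim = HIDDEN_SIZE // NUM_HEADS     # 32
group_size = NUM_HEADS // NUM_KV_HEADS  # 4 query heads share each KV head

# Query head q reads keys and values from KV head q // group_size.
kv_head_for_query = [q // group_size for q in range(NUM_HEADS)]


def kv_cache_bytes(num_kv_heads: int, seq_len: int) -> int:
    """Bytes to cache one K and one V vector per KV head, per layer, per token."""
    return 2 * num_kv_heads * head_dim * NUM_LAYERS * seq_len * BYTES_PER_VALUE


gqa = kv_cache_bytes(NUM_KV_HEADS, CONTEXT_LEN)
mha = kv_cache_bytes(NUM_HEADS, CONTEXT_LEN)  # hypothetical: one KV head per query head

print(f"head_dim={head_dim}, group_size={group_size}")
print(f"query heads 0-7 -> KV heads {kv_head_for_query[:8]}")
print(f"KV cache at {CONTEXT_LEN} tokens: GQA {gqa / 2**20:.0f} MiB, MHA {mha / 2**20:.0f} MiB")
```

With 8 KV heads instead of 32, the cache is a quarter of the size (20 MiB vs 80 MiB under these assumptions), which is the main inference-time memory saving GQA provides.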

Links

Here are some fine-tunes we did, but there are many more possibilities out there!

- instruct
  - openhermes - link
  - open-instruct - link
- code
  - python (pypi) - link
- zephyr DPO tune
  - SFT - link
  - full DPO - link

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

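Benchmark	Metric	Split	Value (%)
Avg.	mean of the six scores below	n/a	29.44
AI2 Reasoning Challenge (25-shot)	normalized accuracy	test	24.83
HellaSwag (10-shot)	normalized accuracy	validation	29.76
MMLU (5-shot)	accuracy	test	25.85
TruthfulQA (0-shot)	mc2	validation	44.55
Winogrande (5-shot)	accuracy	validation	50.99
GSM8k (5-shot)	accuracy	test	0.68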
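The Avg. row appears to be the unweighted mean of the six benchmark scores, which the snippet below recomputes. It also compares three multiple-choice scores with random-guess baselines (4 candidate endings for HellaSwag, 4 answer choices for MMLU, 2 options for Winogrande); these baselines are added here for context and are not part of the leaderboard output.

```python
# Recompute the leaderboard average and compare scores with random-guess baselines.
scores = {
    "ARC (25-shot)": 24.83,
    "HellaSwag (10-shot)": 29.76,
    "MMLU (5-shot)": 25.85,
    "TruthfulQA (0-shot)": 44.55,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.68,
}

average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")

# Chance level = 100 / number of answer options (multiple-choice tasks only).
chance = {
    "HellaSwag (10-shot)": 100 / 4,  # 4 candidate endings
    "MMLU (5-shot)": 100 / 4,        # 4 answer choices
    "Winogrande (5-shot)": 100 / 2,  # 2 options
}

for task, baseline in chance.items():
    margin = scores[task] - baseline
    print(f"{task}: {scores[task]:.2f} vs chance {baseline:.2f} ({margin:+.2f})")
```

MMLU and Winogrande land within one point of chance, while HellaSwag sits about 4.8 points above it.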

