
smol_llama-81M-tied


A small 81M-parameter (total) decoder-only model, kept compact by tying the input and output embeddings. This is the first version of the model.

  • hidden size 768, 6 layers
  • standard multi-head attention (24 heads), context length 1024
  • input/output embeddings are tied (see the loading sketch below)
  • trained from scratch
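
The checkpoint uses the standard Llama architecture, so it loads with transformers like any other causal LM. Below is a minimal sketch, assuming the repo id BEE-spoke-data/smol_llama-81M-tied and a recent transformers release; the printed config values should match the bullets above.

```python
# Minimal sketch: load the checkpoint and confirm the architecture / tied embeddings.
# Assumes the repo id BEE-spoke-data/smol_llama-81M-tied and a recent transformers release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/smol_llama-81M-tied"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

cfg = model.config
print(cfg.hidden_size, cfg.num_hidden_layers, cfg.num_attention_heads)  # 768 6 24 (per this card)
print(cfg.max_position_embeddings)                                      # 1024 context length
print(cfg.tie_word_embeddings)                                          # True: embeddings are tied

# With tied embeddings, the input embedding and lm_head share the same storage:
emb = model.get_input_embeddings().weight
head = model.get_output_embeddings().weight
print(emb.data_ptr() == head.data_ptr())
```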

Notes

This checkpoint is the 'raw' pre-trained model and has not been tuned to any specific task; in most cases it should be fine-tuned before use. A quick generation sketch follows the links below.

  • For a slightly larger (101M param) GQA pretrained version, see here
  • For the chat-tuned version of this model, see here
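
Even without fine-tuning, the raw checkpoint can be sampled from directly, e.g. to sanity-check that it produces fluent (if generic) continuations. A minimal sketch using the transformers text-generation pipeline, assuming the same repo id as above:

```python
# Minimal sketch: sample from the raw pretrained checkpoint.
# Output quality will be limited; this model is intended to be fine-tuned first.
from transformers import pipeline

generator = pipeline("text-generation", model="BEE-spoke-data/smol_llama-81M-tied")
out = generator(
    "The smallest language models are useful because",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(out[0]["generated_text"])
```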

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 24.52 |
| ARC (25-shot)       | 22.18 |
| HellaSwag (10-shot) | 29.33 |
| MMLU (5-shot)       | 24.06 |
| TruthfulQA (0-shot) | 43.97 |
| Winogrande (5-shot) | 49.25 |
| GSM8K (5-shot)      | 0.23  |
| DROP (3-shot)       | 2.64  |
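
These scores come from the Open LLM Leaderboard, which runs EleutherAI's lm-evaluation-harness. A rough local reproduction sketch for one of the tasks, assuming lm-eval v0.4+ and its simple_evaluate API; the leaderboard pins specific harness and task versions, so locally reproduced numbers may differ slightly.

```python
# Rough sketch: evaluate the checkpoint on 25-shot ARC with lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+); exact leaderboard settings may differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=BEE-spoke-data/smol_llama-81M-tied",
    tasks=["arc_challenge"],
    num_fewshot=25,  # matches the 25-shot ARC setting reported above
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```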
