
A modified GPT-2 model with ScaledSinusoidal position embeddings, no biases, embedding layernorm, and a single shared MLP layer. With 94 million non-embedding parameters, it beats most similarly sized and slightly larger models (GPT-2-124m, Pythia-70m/160m, Cerebras-111m) on the Open LLM Leaderboard suite of benchmarks, despite being trained on only 8 billion tokens of text from SlimPajama.

You have to `pip install einops` before using this model, and because the repo contains custom modeling code it must be loaded with `trust_remote_code=True`.
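
A minimal loading sketch (not from the original card; it assumes the repo's custom code registers with the standard `AutoModelForCausalLM`/`AutoTokenizer` classes and that generation works as usual):

```python
# pip install transformers einops

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "crumb/cramped-94m-8btok"

# The repo ships custom modeling code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Simple greedy generation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```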


Open LLM Leaderboard results:

| Avg | ARC | HellaSwag | MMLU | TruthfulQA |
|-------|-------|-----------|-------|------------|
| 30.76 | 22.18 | 29.75 | 26.24 | 44.88 |
