A modified GPT-2 model with ScaledSinusoidal position embeddings, no biases, an embedding LayerNorm, and a single shared MLP layer. With 94 million non-embedding parameters, it beats most similarly sized and slightly larger models (GPT-2-124M, Pythia-70M/160M, Cerebras-111M) on the Open LLM Leaderboard benchmark suite, despite being trained on only 8 billion tokens of text from SlimPajama.
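For illustration, here is a minimal sketch of what a ScaledSinusoidal position embedding typically looks like: a fixed sinusoidal table multiplied by a single learned scale. This is an assumption based on the common formulation (e.g. the Cramming recipe); the repo's own custom modeling code may differ in details.

```python
import math
import torch
import torch.nn as nn

class ScaledSinusoidalEmbedding(nn.Module):
    # Fixed sinusoidal position table multiplied by one learned scalar.
    # Illustrative sketch only; the actual custom code in this repo may differ.
    def __init__(self, dim: int, max_positions: int = 2048):
        super().__init__()
        position = torch.arange(max_positions).unsqueeze(1)                 # (T, 1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        table = torch.zeros(max_positions, dim)
        table[:, 0::2] = torch.sin(position * div_term)
        table[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("table", table)                                # (T, D)
        self.scale = nn.Parameter(torch.tensor(1.0 / math.sqrt(dim)))       # learned scale

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        seq_len = hidden_states.size(1)
        return hidden_states + self.scale * self.table[:seq_len]
```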

You have to `pip install einops` before using this model!
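A minimal usage sketch, assuming the standard Hugging Face custom-code loading pattern (the prompt and generation settings below are just examples):

```python
# Minimal usage sketch; requires `pip install einops` and trust_remote_code=True
# because the repo ships custom modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/cramped-94m-8btok"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```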


| Average | ARC   | HellaSwag | MMLU  | TruthfulQA |
|--------:|------:|----------:|------:|-----------:|
|   30.76 | 22.18 |     29.75 | 26.24 |      44.88 |