Cramp(ed) Models
A collection of 3 smaller models trained locally on my 2xA6000 Lambda Vector.
A modified GPT-2 model with ScaledSinusoidal position embeddings, no biases, an embedding layernorm, and one shared MLP layer. With 94 million non-embedding parameters, it beats most similarly sized and slightly larger models (GPT-2-124M, Pythia-70M/160M, Cerebras-111M) on the Open LLM Leaderboard suite of benchmarks, despite being trained on only 8 billion tokens of text from SlimPajama.
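For reference, here is a minimal sketch of how a ScaledSinusoidal position embedding is typically implemented: a fixed sinusoidal table multiplied by a single learned scalar scale. The class name, initial scale value, and `max_len` below are assumptions for illustration, not necessarily what this checkpoint uses.

```python
import math
import torch
import torch.nn as nn

class ScaledSinusoidal(nn.Module):
    """Fixed sinusoidal position table times one learnable scalar scale (sketch)."""
    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        # Standard (untrained) sinusoidal table, as in the original Transformer.
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe, persistent=False)
        # Single learnable scale; the 1/sqrt(dim) init here is an assumption.
        self.scale = nn.Parameter(torch.tensor(1.0 / math.sqrt(dim)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings; add scaled positions.
        return x + self.scale * self.pe[: x.size(1)]
```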
You have to `pip install einops` before using this model!
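With einops installed, loading should look like any other Transformers causal LM. This is a minimal sketch assuming the checkpoint ships its custom architecture as remote code (hence `trust_remote_code=True`); the repo id below is a placeholder, substitute the actual model id from this collection.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/cramped-gpt2-94m"  # placeholder, not the real model id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```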
| Avg | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|
| 30.76 | 22.18 | 29.75 | 26.24 | 44.88 |