PocketGPT

PocketGPT is a 320M-parameter autoregressive language model trained from scratch on approximately 5 billion tokens of ClimbMix.

Model Details

| Property | Value |
|---|---|
| Parameters | 320M |
| Training Tokens | ~5B |
| Context Length | 1024 |
| Vocabulary Size | 49,152 |
| Architecture | Decoder-only Transformer |
| Positional Encoding | RoPE |
| Normalization | RMSNorm |
| Attention | Grouped Query Attention (16 Q Heads, 4 KV Heads) |
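
To make the Attention row concrete, here is a minimal sketch of how 16 query heads can share 4 KV heads. It assumes the common contiguous grouping (consecutive query heads share one KV head), which this card does not confirm for PocketGPT. The function names and the `head_dim` value are hypothetical and used only for illustration.

```python
# Head counts taken from the Model Details table.
N_Q_HEADS = 16
N_KV_HEADS = 4


def kv_head_for(q_head, n_q_heads=N_Q_HEADS, n_kv_heads=N_KV_HEADS):
    """Map a query head index to the KV head it reads from.

    Assumes contiguous grouping; PocketGPT's actual layout is not documented.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly across KV heads")
    if not 0 <= q_head < n_q_heads:
        raise IndexError("query head index out of range")
    group_size = n_q_heads // n_kv_heads  # 4 query heads per KV head here
    return q_head // group_size


def kv_cache_elements(seq_len, n_kv_heads, head_dim):
    """Elements cached per layer: one key and one value vector
    for every KV head at every position."""
    return 2 * seq_len * n_kv_heads * head_dim


if __name__ == "__main__":
    print([kv_head_for(h) for h in range(N_Q_HEADS)])

    head_dim = 64  # hypothetical; the card does not state the head dimension
    full_mha = kv_cache_elements(1024, N_Q_HEADS, head_dim)
    gqa = kv_cache_elements(1024, N_KV_HEADS, head_dim)
    print(gqa / full_mha)  # ratio depends only on the head counts
```

Whatever the real head dimension is, the per-layer KV cache under this scheme is 4/16 of what standard multi-head attention with 16 KV heads would need, since the cache size scales linearly with the number of KV heads.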

Benchmark Results

| Benchmark | GPT-2 Medium (355M) | Pythia 410M | PocketGPT (320M) |
|---|---|---|---|
| HellaSwag | 37.52 | 40.9 | 37.95 |
| WinoGrande | 52.5 | 53.7 | 53.35 |
| ARC Easy | 44.7 | 52.1 | 52.86 |
| ARC Challenge | 20.1 | 21.3 | 31.40 |
| OpenBookQA | 19.8 | n/a | 32.80 |
| MMLU | ~25 | 27.3 | 28.24 |
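
For a quick read of the table, the sketch below computes PocketGPT's margin over each baseline, in the same units as the table. The scores are copied from the rows above. The dictionary layout and the `margins` helper are illustrative, and the missing OpenBookQA entry and the approximate MMLU entry are treated as unavailable rather than guessed.

```python
# Scores copied from the benchmark table. None marks an entry that is
# missing (Pythia 410M on OpenBookQA) or only approximate (GPT-2 Medium on MMLU).
SCORES = {
    "HellaSwag":     {"gpt2_medium": 37.52, "pythia_410m": 40.9, "pocketgpt": 37.95},
    "WinoGrande":    {"gpt2_medium": 52.5,  "pythia_410m": 53.7, "pocketgpt": 53.35},
    "ARC Easy":      {"gpt2_medium": 44.7,  "pythia_410m": 52.1, "pocketgpt": 52.86},
    "ARC Challenge": {"gpt2_medium": 20.1,  "pythia_410m": 21.3, "pocketgpt": 31.40},
    "OpenBookQA":    {"gpt2_medium": 19.8,  "pythia_410m": None, "pocketgpt": 32.80},
    "MMLU":          {"gpt2_medium": None,  "pythia_410m": 27.3, "pocketgpt": 28.24},
}


def margins(baseline):
    """Return PocketGPT's score minus the baseline's for each benchmark
    where the baseline has an exact score."""
    result = {}
    for benchmark, row in SCORES.items():
        base = row[baseline]
        if base is not None:
            result[benchmark] = round(row["pocketgpt"] - base, 2)
    return result


if __name__ == "__main__":
    for baseline in ("gpt2_medium", "pythia_410m"):
        print(baseline, margins(baseline))
```

On these numbers PocketGPT is ahead of GPT-2 Medium on every benchmark with an exact score, and ahead of Pythia 410M on ARC Easy, ARC Challenge, and MMLU, while trailing it on HellaSwag and WinoGrande.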