PocketGPT
PocketGPT is a 320M-parameter autoregressive language model trained from scratch on approximately 5 billion tokens from the ClimbMix dataset.
Model Details
| Property | Value |
|---|---|
| Parameters | 320M |
| Training Tokens | ~5B |
| Context Length | 1024 |
| Vocabulary Size | 49,152 |
| Architecture | Decoder-only Transformer |
| Positional Encoding | RoPE |
| Normalization | RMSNorm |
| Attention | Grouped Query Attention (16 Q Heads, 4 KV Heads) |
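
To make the grouped query attention row concrete, here is a minimal sketch of how 16 query heads can share 4 key/value heads. The head counts and context length come from the table above. The class name `AttentionLayout`, the `head_dim` parameter, and the consecutive-heads grouping are illustrative assumptions, not details confirmed for PocketGPT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionLayout:
    """Hypothetical helper; only the default values come from the model card."""

    n_q_heads: int = 16
    n_kv_heads: int = 4
    context_length: int = 1024

    @property
    def group_size(self) -> int:
        # Number of query heads that share one key/value head.
        return self.n_q_heads // self.n_kv_heads

    def kv_head_for(self, q_head: int) -> int:
        # Assumption: consecutive query heads share a KV head, which is a
        # common convention. PocketGPT's actual ordering is not documented here.
        if not 0 <= q_head < self.n_q_heads:
            raise ValueError(f"q_head must be in [0, {self.n_q_heads})")
        return q_head // self.group_size

    def kv_cache_entries_per_layer(self, head_dim: int) -> int:
        # Keys and values (factor 2) for every position and every KV head.
        # head_dim is a caller-supplied assumption; the card does not state it.
        return 2 * self.context_length * self.n_kv_heads * head_dim


layout = AttentionLayout()
mapping = [layout.kv_head_for(q) for q in range(layout.n_q_heads)]
print(layout.group_size)  # query heads per KV head
print(mapping)            # KV head index used by each query head
```

Because only 4 of the 16 heads store keys and values, the per-layer KV cache is a quarter of what a standard multi-head layout with 16 KV heads would need, whatever the head dimension is.
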
Benchmark Results
| Benchmark | GPT-2 Medium (355M) | Pythia 410M | PocketGPT (320M) |
|---|---|---|---|
| HellaSwag | 37.52 | 40.9 | 37.95 |
| WinoGrande | 52.5 | 53.7 | 53.35 |
| ARC Easy | 44.7 | 52.1 | 52.86 |
| ARC Challenge | 20.1 | 21.3 | 31.40 |
| OpenBookQA | 19.8 | — | 32.80 |
| MMLU | ~25 | 27.3 | 28.24 |
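
The comparison in the table can be summarized with a short script. This is only a sketch over the numbers listed above: the names `SCORES` and `wins_against` are made up for illustration, the approximate GPT-2 Medium MMLU value (~25) is treated as 25.0, and the missing Pythia OpenBookQA score is skipped rather than guessed.

```python
# Scores copied from the table above, in the order
# (GPT-2 Medium, Pythia 410M, PocketGPT). None marks an unreported value.
SCORES = {
    "HellaSwag": (37.52, 40.9, 37.95),
    "WinoGrande": (52.5, 53.7, 53.35),
    "ARC Easy": (44.7, 52.1, 52.86),
    "ARC Challenge": (20.1, 21.3, 31.40),
    "OpenBookQA": (19.8, None, 32.80),
    "MMLU": (25.0, 27.3, 28.24),  # GPT-2 Medium value is approximate (~25)
}

POCKETGPT = 2  # column index of PocketGPT in each row


def wins_against(baseline_index: int) -> list[str]:
    """Return the benchmarks where PocketGPT scores above the given baseline."""
    wins = []
    for name, row in SCORES.items():
        baseline = row[baseline_index]
        # Skip benchmarks with no reported baseline score.
        if baseline is not None and row[POCKETGPT] > baseline:
            wins.append(name)
    return wins


beats_gpt2_medium = wins_against(0)
beats_pythia = wins_against(1)
print("Ahead of GPT-2 Medium on:", beats_gpt2_medium)
print("Ahead of Pythia 410M on:", beats_pythia)
```

On these numbers PocketGPT is ahead of GPT-2 Medium on all six benchmarks, and ahead of Pythia 410M on ARC Easy, ARC Challenge, and MMLU out of the five benchmarks where both have a reported score.
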