TinyBuddy-80K

🏆 RECORD ATTEMPT: The smallest functional English-speaking language model on Hugging Face. At 83,856 parameters (~84K) it is far larger than the 112-parameter NaA-IA/Small-ever, but unlike that model it is both tiny AND coherent.

Mission: Show that a language model with under 100K parameters can still learn English patterns and generate recognizable text. The goal is not the smallest model outright, but the smallest that works.

Model Details

Property	Value
Parameters	83,856 (~84K)
Layers	1
Hidden size	48
Attention heads	4 (query) / 2 (key-value) = GQA
FF intermediate size	192
Context length	128
Vocabulary	1,024 tokens (BPE)
Architecture	Llama-style: RMSNorm, RoPE, SiLU/SwiGLU, tied embeddings
Precision	float32

Parameter Breakdown

Component	Parameters
Token Embedding (tied)	49,152
Attention (Q/K/V/O)	6,912
FeedForward (Gate/Up/Down)	27,648
Normalization (3× RMSNorm)	144
Total	83,856
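
The total can be reproduced from the values in the Model Details table. The sketch below assumes bias-free linear layers, a head dimension equal to hidden size divided by the number of query heads, and a tied embedding that is counted once. The function and argument names are illustrative and are not taken from the repository.

```python
def count_parameters(vocab=1024, hidden=48, q_heads=4, kv_heads=2,
                     ffn=192, num_norms=3):
    """Count weights for a one-layer Llama-style model (assumed: no biases)."""
    head_dim = hidden // q_heads                      # 48 / 4 = 12
    embedding = vocab * hidden                        # shared with the LM head
    attention = (
        hidden * q_heads * head_dim                   # Q projection
        + 2 * hidden * kv_heads * head_dim            # K and V projections
        + q_heads * head_dim * hidden                 # O projection
    )
    feed_forward = 3 * hidden * ffn                   # gate, up, down
    norms = num_norms * hidden                        # one scale vector each
    total = embedding + attention + feed_forward + norms
    return {
        "embedding": embedding,
        "attention": attention,
        "feed_forward": feed_forward,
        "norms": norms,
        "total": total,
    }


counts = count_parameters()
for name, value in counts.items():
    print(f"{name:>12}: {value:,}")
print(f"float32 size: {counts['total'] * 4:,} bytes")
```

Under these assumptions the components sum to 83,856, and the float32 weights take 335,424 bytes.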

Architecture

TinyBuddy-80K uses a single transformer block with:

RMSNorm (pre-norm) — efficient normalization
Grouped Query Attention — 4 query heads, 2 KV heads (saves params)
RoPE (Rotary Position Embeddings) — relative position encoding
SwiGLU (SiLU-gated MLP) — modern activation
Tied embeddings — input and output share weights (saves ~49K params!)
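
Two of the items above, Grouped Query Attention and RoPE, can be illustrated without any framework. This is a minimal sketch, not the repository's code: it assumes query heads are grouped contiguously onto KV heads, and it assumes the interleaved pair convention with base 10000 for RoPE (some implementations pair the two halves of the vector instead). The helper names are hypothetical.

```python
import math


def kv_head_for(query_head, n_q_heads=4, n_kv_heads=2):
    """Return the KV head a query head reads from (contiguous groups assumed)."""
    group_size = n_q_heads // n_kv_heads
    return query_head // group_size


def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by an angle that grows with position."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# With 4 query heads and 2 KV heads, heads 0-1 share KV head 0, heads 2-3 share KV head 1.
print([kv_head_for(h) for h in range(4)])

# The attention score depends only on the distance between positions (2 in both cases).
q = [0.1 * i for i in range(1, 13)]    # head_dim = 12
k = [0.05 * i for i in range(12, 0, -1)]
near = dot(rope_rotate(q, 3), rope_rotate(k, 1))
far = dot(rope_rotate(q, 7), rope_rotate(k, 5))
print(abs(near - far) < 1e-9)
```

Sharing each K/V projection between two query heads is what halves the K and V weight matrices relative to standard multi-head attention.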

Input → Embedding → [RMSNorm → GQA Attention → +] → [RMSNorm → SwiGLU FFN → +] → RMSNorm → LM Head → Output
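
The pre-norm residual wiring in this diagram can be sketched in plain Python on a single token vector. This is an illustration under assumptions, not the repository's implementation: the epsilon value, the weight layout (one row per output unit), and all function names are hypothetical, attention is passed in as an opaque callable, and the final RMSNorm and LM head are omitted.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x to unit root-mean-square, then apply a learned per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """down( silu(gate(x)) * up(x) ), the SiLU-gated MLP."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def transformer_block(x, attention, ffn, norm1_weight, norm2_weight):
    """[RMSNorm -> attention -> +] then [RMSNorm -> FFN -> +]."""
    h = [a + b for a, b in zip(x, attention(rms_norm(x, norm1_weight)))]
    return [a + b for a, b in zip(h, ffn(rms_norm(h, norm2_weight)))]


# Toy run with hidden size 2 and FFN size 3; attention is a stand-in that returns zeros.
x = [3.0, 4.0]
ones = [1.0, 1.0]
w_gate = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
w_up = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.0]]
w_down = [[0.1, 0.2, 0.3], [-0.2, 0.1, 0.0]]
out = transformer_block(
    x,
    attention=lambda v: [0.0] * len(v),
    ffn=lambda v: swiglu_ffn(v, w_gate, w_up, w_down),
    norm1_weight=ones,
    norm2_weight=ones,
)
print(out)
```

Because each sub-layer only adds to the residual stream, a sub-layer that outputs zeros leaves the input unchanged.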

Training

Dataset: TinyStories (subset of ~5,000 stories)
Tokenizer: Byte-level BPE, 1,024 vocabulary (trained from scratch)
Optimizer: AdamW (lr=5e-3, weight_decay=0.1)
Schedule: Warmup (50 steps) + Cosine decay
Steps: 1,000 on CPU
Hardware: Single CPU core (the challenge!)
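
The schedule listed above can be written as a small function of the step index. This is a sketch under assumptions: the warmup is taken to be linear, the cosine is taken to decay to a floor of zero by step 1,000, and the function name and `min_lr` argument are hypothetical. The training script itself may differ in these details.

```python
import math


def learning_rate(step, peak_lr=5e-3, warmup_steps=50, total_steps=1000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 49, 50, 525, 1000):
    print(step, f"{learning_rate(step):.6f}")
```

A value computed this way would typically be assigned to each optimizer parameter group before every step, or wrapped in a scheduler that does the same.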

Usage

import json

import torch
from tokenizers import Tokenizer

from model import create_model  # model.py from this repository

# Load the architecture config
with open("config.json") as f:
    config = json.load(f)

# Build the model and load the trained weights on CPU
model = create_model(config)
model.load_state_dict(torch.load("output/model.pt", map_location="cpu"))
model.eval()

# Encode the prompt and prepend the BOS token (id 1)
tokenizer = Tokenizer.from_file("data/tokenizer.json")
prompt = "Once upon a time,"
ids = [1] + tokenizer.encode(prompt).ids
input_ids = torch.tensor([ids], dtype=torch.long)

# Sample up to 60 new tokens; no gradients are needed for inference
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=40)
print(tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True))
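
The `temperature` and `top_k` arguments are assumed here to follow the usual sampling recipe: divide the logits by the temperature, keep the `top_k` highest-scoring tokens, and sample from a softmax over those. The internals of `model.generate` are not shown in this card, so the following is a stand-alone sketch with a hypothetical function name rather than the repository's code.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Pick one token id from the top_k logits after temperature scaling."""
    scaled = [value / temperature for value in logits]
    k = min(top_k, len(scaled))
    # Indices of the k largest scaled logits
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the kept tokens (shifted by the max for stability)
    peak = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy logits over a 6-token vocabulary; token 2 is the most likely.
toy_logits = [0.1, 1.5, 3.0, -2.0, 0.7, 2.2]
rng = random.Random(0)
picks = [sample_top_k(toy_logits, temperature=0.8, top_k=3, rng=rng) for _ in range(10)]
print(picks)
```

Lower temperatures sharpen the distribution toward the top token, and `top_k=1` reduces to greedy decoding.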

Limitations

This model is extremely small: its 83,856 float32 weights occupy about 335 KB, and the parameter count is roughly the number of pixels in a 290×290 grayscale image.

What works:

Basic word patterns and short phrases
Recognizable English-like structure
Story-like opening sentences

What's broken:

Very limited coherence (1–2 sentences max)
High repetition
No factual knowledge or reasoning
Limited vocabulary diversity

This model exists purely to explore the lower bounds of language modeling. It proves that even at 84K parameters, a neural network can capture statistical patterns in English text.

The Record

Model	Parameters	Speaks English?
NaA-IA/Small-ever	112	❌ No
TinyBuddy-80K	83,856	✅ YES

TinyBuddy-80K is not the smallest model ever (NaA-IA/Small-ever has only 112 parameters), but it's the smallest that actually generates recognizable English text. That's the real achievement.

Citation

@misc{tinybuddy80k,
  title  = {TinyBuddy-80K: An 84K-parameter Llama-style model that speaks English},
  year   = {2026},
  note   = {Record attempt: smallest functional English text generator.}
}

LONG LIVE TINYBUDDY-80K 🚀
