TinyBuddy-80K
RECORD ATTEMPT: The smallest functional English-speaking language model on Hugging Face. 83,856 parameters (~84K), beating the NaA-IA/Small-ever record by being both tiny AND coherent.
Mission: Prove that under 100K parameters, a language model can still learn English patterns and generate recognizable text. This is not just the smallest; it's the smallest that works.
Model Details
| Property | Value |
|---|---|
| Parameters | 83,856 (~84K) |
| Layers | 1 |
| Hidden size | 48 |
| Attention heads | 4 (query) / 2 (key-value) = GQA |
| FF intermediate size | 192 |
| Context length | 128 |
| Vocabulary | 1,024 tokens (BPE) |
| Architecture | Llama-style: RMSNorm, RoPE, SiLU/SwiGLU, tied embeddings |
| Precision | float32 |
Parameter Breakdown
| Component | Parameters |
|---|---|
| Token Embedding (tied) | 49,152 |
| Attention (Q/K/V/O) | 6,912 |
| FeedForward (Gate/Up/Down) | 27,648 |
| Normalization (3× RMSNorm) | 144 |
| Total | 83,856 |
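The breakdown can be re-derived from the Model Details table. The following is a minimal sketch, assuming bias-free linear layers, a head dimension equal to hidden size divided by the number of query heads, and no learned parameters in RoPE (the card does not state any of these explicitly). The constant and function names are illustrative, not taken from the repository.

```python
# Values from the Model Details table.
VOCAB_SIZE = 1024
HIDDEN = 48
N_HEADS = 4       # query heads
N_KV_HEADS = 2    # key/value heads (GQA)
FFN = 192
N_NORMS = 3       # two pre-norms in the block plus the final norm


def count_parameters():
    """Count weights per component, assuming no bias terms anywhere."""
    head_dim = HIDDEN // N_HEADS        # assumed: 48 / 4 = 12
    kv_dim = N_KV_HEADS * head_dim      # 2 * 12 = 24

    embedding = VOCAB_SIZE * HIDDEN     # shared with the LM head (tied)
    attention = (
        HIDDEN * HIDDEN                 # Q projection
        + HIDDEN * kv_dim               # K projection
        + HIDDEN * kv_dim               # V projection
        + HIDDEN * HIDDEN               # O projection
    )
    feed_forward = 3 * HIDDEN * FFN     # gate, up and down projections
    norms = N_NORMS * HIDDEN            # one scale vector per RMSNorm

    return {
        "embedding": embedding,
        "attention": attention,
        "feed_forward": feed_forward,
        "norms": norms,
        "total": embedding + attention + feed_forward + norms,
    }


if __name__ == "__main__":
    for name, value in count_parameters().items():
        print(f"{name:>12}: {value:,}")
```

Under these assumptions the four attention projections come to 6,912 weights and the components sum to the stated total of 83,856. At float32 that is 335,424 bytes of weights.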
Architecture
TinyBuddy-80K uses a single transformer block with:
- RMSNorm (pre-norm): efficient normalization
- Grouped Query Attention: 4 query heads, 2 KV heads (saves params)
- RoPE (Rotary Position Embeddings): relative position encoding
- SwiGLU (SiLU-gated MLP): modern activation
- Tied embeddings: input and output share weights (saves ~49K params!)
Input → Embedding → [RMSNorm → GQA Attention → +] → [RMSNorm → SwiGLU FFN → +] → RMSNorm → LM Head → Output
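To make three of these building blocks concrete, here is a dependency-free sketch of RMSNorm, the SiLU gate used inside SwiGLU, and the way query heads share key/value heads under GQA. It is an illustration only, not the code in this repository: the function names, the `eps` value, and the consecutive-grouping convention for heads are assumptions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x so its root-mean-square is ~1, then apply a learned gain."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """Element-wise SwiGLU gating: silu(gate) * up.

    In the full FFN, `gate` and `up` are two linear projections of the
    same input and the result goes through a third (down) projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def kv_head_for_query_head(q_head, n_heads=4, n_kv_heads=2):
    """Map a query head to the KV head it shares (consecutive grouping)."""
    group_size = n_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(swiglu_gate([0.0, 1.0], [2.0, 2.0]))
    print([kv_head_for_query_head(h) for h in range(4)])
```

With 4 query heads and 2 KV heads, each KV head serves two query heads, so the K and V projections are half the width of the Q projection.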
Training
- Dataset: TinyStories (~5,000 stories)
- Tokenizer: Byte-level BPE, 1,024 vocabulary (trained from scratch)
- Optimizer: AdamW (lr=5e-3, weight_decay=0.1)
- Schedule: Warmup (50 steps) + Cosine decay
- Steps: 1,000 on CPU
- Hardware: Single CPU core (the challenge!)
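The learning-rate schedule listed above can be sketched as follows. The card only says "Warmup (50 steps) + Cosine decay", so the linear warmup shape, the decay floor of 0 (`min_lr`), and the function name `lr_at` are assumptions made for illustration.

```python
import math


def lr_at(step, peak_lr=5e-3, warmup_steps=50, total_steps=1000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Assumed linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    for step in (0, 49, 50, 525, 1000):
        print(step, lr_at(step))
```

In a PyTorch training loop, a function like this would typically be applied each step by writing the value into the optimizer's parameter groups, or wrapped in a `LambdaLR` scheduler.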
Usage
import json

import torch
from tokenizers import Tokenizer

from model import create_model

# Load config
with open("config.json") as f:
    config = json.load(f)

# Create model and load the trained weights on CPU
model = create_model(config)
model.load_state_dict(torch.load("output/model.pt", map_location="cpu"))
model.eval()

# Generate
tokenizer = Tokenizer.from_file("data/tokenizer.json")
prompt = "Once upon a time,"
encoded = tokenizer.encode(prompt)
ids = [1] + encoded.ids  # Add BOS
input_ids = torch.tensor([ids], dtype=torch.long)
output_ids = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=40)
print(tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True))
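The `temperature` and `top_k` arguments control how the next token is drawn from the model's output logits. The sketch below shows one common way to implement that step in plain Python. It is not the repository's `generate` method, whose internals are not shown here, and the function name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample a token id from the top_k highest logits.

    temperature must be > 0. Lower values sharpen the distribution,
    higher values flatten it.
    """
    k = min(top_k, len(logits))
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    example_logits = [0.1, 2.0, -1.0, 1.5]
    print(sample_top_k(example_logits, temperature=0.8, top_k=2))
```

With `top_k=1` this reduces to greedy decoding. With a 1,024-token vocabulary, `top_k=40` keeps roughly the top 4% of candidates at each step.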
Limitations
This model is extremely small: it has fewer parameters than a 290×290 grayscale image has pixels (84,100).
What works:
- Basic word patterns and short phrases
- Recognizable English-like structure
- Story-like opening sentences
What's broken:
- Very limited coherence (1-2 sentences max)
- High repetition
- No factual knowledge or reasoning
- Limited vocabulary diversity
This model exists purely to explore the lower bounds of language modeling. It proves that even at 84K parameters, a neural network can capture statistical patterns in English text.
The Record
| Model | Parameters | Speaks English? |
|---|---|---|
| NaA-IA/Small-ever | 112 | No |
| TinyBuddy-80K | 83,856 | Yes |
TinyBuddy-80K may not be the absolute smallest model ever, but it's the smallest that actually generates recognizable English text. That's the real achievement.
Citation
@misc{tinybuddy100k,
title = {TinyBuddy-100K: An 84K parameter Llama-style model that speaks English},
year = {2026},
note = {Record attempt: smallest functional English text generator.}
}
LONG LIVE TINYBUDDY-80K