TinyTalk — human-like small talk on a microcontroller

TinyTalk is a ~8.3M-parameter (≈1.6M non-embedding) GPT-Neo chatbot built to do one thing: hold short, friendly, human-sounding small-talk conversations on low-end hardware that can't run a normal LLM — think an ESP32-S3.

It is the model embedded in the Cardputer AI firmware, where it runs fully offline on the device in ~2 MB of flash after Q4_0 quantization. This repository hosts the full-precision PyTorch / safetensors weights so the model can be used, fine-tuned, or re-quantized on its own.
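
The parameter split quoted above can be sanity-checked from the architecture figures listed under "What it is, technically" (8 layers, hidden size 128, vocab 50257). The sketch below is back-of-the-envelope arithmetic, not the author's tooling. It assumes the standard GPT-Neo layout (bias-free query/key/value projections, an MLP four times the hidden size) and takes the position-table size as a parameter; the default of 2048 positions is an assumption, since this card does not state it.

```python
def gpt_neo_param_count(vocab=50257, hidden=128, layers=8, n_positions=2048):
    """Rough GPT-Neo parameter count with tied input/output embeddings.

    n_positions=2048 is an assumption; the card does not give the table size.
    """
    token_emb = vocab * hidden
    pos_emb = n_positions * hidden
    layer_norm = 2 * hidden                      # weight + bias
    attn = 3 * hidden * hidden                   # q, k, v projections, no bias
    attn += hidden * hidden + hidden             # output projection with bias
    inner = 4 * hidden                           # assumed MLP width
    mlp = (hidden * inner + inner) + (inner * hidden + hidden)
    per_layer = 2 * layer_norm + attn + mlp
    non_embedding = layers * per_layer + layer_norm  # plus the final layer norm
    return {
        "token_embedding": token_emb,
        "position_embedding": pos_emb,
        "non_embedding": non_embedding,
        "total": token_emb + pos_emb + non_embedding,
    }


counts = gpt_neo_param_count()
for name, value in counts.items():
    print(f"{name:>20}: {value / 1e6:.2f}M")
```

Under these assumptions the token embedding table accounts for about 6.4M parameters, which is why the non-embedding count is so much smaller than the headline figure.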

What it's for

Most chat models assume a datacenter GPU. TinyTalk asks the opposite question: how small can a model be and still feel like talking to someone? It trades away knowledge, reasoning, and long context to fit on a microcontroller, keeping only the ability to make warm, coherent small talk:

User: hey, how are you?
Bot: I am good! I played outside today. It was so much fun!
User: nice! what did you play?
Bot: I played with my ball. Do you want to play too?

Good fits: an offline conversational toy or companion on an ESP32 / handheld; a teaching example of an end-to-end on-device LLM; a tiny base to fine-tune for embedded chat. Not a fit: anything needing facts, instructions, reasoning, or safety guarantees.

What it is, technically

  • Architecture: GPT-Neo (GPTNeoForCausalLM) — 8 layers, hidden size 128, 16 heads, alternating global/local attention (window 256), learned position embeddings, tied input/output embeddings, GPT-2 byte-level BPE tokenizer (vocab 50257).
  • Base: roneneldan/TinyStories-Instruct-3M.
  • Fine-tune: ~70K filtered, simple-English dialogues from allenai/SODA, reformatted as User:/Bot: turns and mixed with a slice of TinyStoriesInstruct. The loss is computed only on the bot replies and story bodies, so the model is never trained to produce the user's turns.
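
Restricting the loss to bot replies, as described above, can be sketched as follows. This is an illustration, not the actual training code: it assumes the common convention of labelling ignored positions with -100 (the default `ignore_index` of PyTorch's cross-entropy loss), and `build_labels`, the segment structure, and the token ids are all made up for the example.

```python
IGNORE_INDEX = -100  # assumed convention: positions with this label add no loss


def build_labels(segments):
    """Flatten (token_ids, is_bot) segments into input ids and masked labels.

    Tokens from user turns get IGNORE_INDEX, so only bot tokens are trained on.
    """
    input_ids, labels = [], []
    for token_ids, is_bot in segments:
        input_ids.extend(token_ids)
        if is_bot:
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy token ids, invented for illustration (50256 is <|endoftext|>).
segments = [
    ([11, 12, 13], False),    # user turn plus the "Bot:" marker, masked
    ([21, 22, 50256], True),  # bot reply plus end-of-text, trained on
    ([14, 15], False),        # next user turn, masked
    ([23, 50256], True),      # next bot reply, trained on
]
input_ids, labels = build_labels(segments)
print(input_ids)
print(labels)
```

The model still sees the user's tokens as context; they are only excluded from the loss.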

Prompt format

Trained on this exact format, with <|endoftext|> (token 50256) between exchanges:

User: <message>
Bot: <reply><|endoftext|>
User: <message>
Bot:

To get a reply, feed the prompt "User: <message>\nBot:" and generate until the model emits <|endoftext|>.
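
For multi-turn use, the format above can be assembled from a conversation history. The helper below is a hypothetical sketch: `build_prompt` and the list-of-pairs history are not part of this repository. It assumes each past exchange is joined exactly as laid out above, with <|endoftext|> closing the bot reply and a newline before the next User: line (the newline is inferred from the layout shown, not confirmed).

```python
EOT = "<|endoftext|>"


def build_prompt(history, message):
    """Build a prompt from past (user, bot) exchanges and a new user message."""
    parts = [f"User: {user}\nBot: {bot}{EOT}\n" for user, bot in history]
    parts.append(f"User: {message}\nBot:")  # left open for the model to complete
    return "".join(parts)


history = [("hey, how are you?", "I am good! I played outside today.")]
prompt = build_prompt(history, "nice! what did you play?")
print(prompt)
```

With an empty history this reduces to the single-exchange prompt, so the same helper covers the first turn.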

Usage

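from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("TheREZOR/TinyTalk")
model = AutoModelForCausalLM.from_pretrained("TheREZOR/TinyTalk")

# One exchange in the training format; the model continues after "Bot:".
prompt = "User: hi, what's your name?\nBot:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(
    **inputs,  # passes input_ids together with the attention_mask
    max_new_tokens=40, do_sample=True, temperature=0.7, top_k=40,
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.eos_token_id,  # the GPT-2 tokenizer defines no pad token
)
# Decode only the newly generated tokens, not the prompt.
reply_ids = out[0, inputs["input_ids"].shape[1]:]
print(tok.decode(reply_ids, skip_special_tokens=True).strip())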

Honest limitations

  • Kindergarten English only. Short, simple sentences.
  • No world knowledge. Factual questions get friendly confabulation.
  • Short memory. Trained and served with a tiny context (~80 tokens on device, 256 max).
  • Not instruction-following, and not safe for any production use.
  • A toy/educational model — interesting because it fits on a microcontroller, not because it is good.
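
Given the short context, a caller would typically drop the oldest exchanges before building a prompt. The following is a minimal sketch under stated assumptions: `trim_history` is a hypothetical helper, the budget value is arbitrary, and `count_tokens` is supplied by the caller (for example, a function wrapping the tokenizer). The word-count stand-in used here is for demonstration only and does not match real BPE token counts.

```python
def trim_history(history, message, budget, count_tokens):
    """Keep the most recent (user, bot) exchanges that fit within the budget.

    The new message is always kept; the oldest exchanges are dropped first.
    """
    used = count_tokens(message)
    kept = []
    for user, bot in reversed(history):
        cost = count_tokens(user) + count_tokens(bot)
        if used + cost > budget:
            break
        kept.append((user, bot))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Stand-in counter: whitespace-separated words, NOT real BPE token counts.
def count_words(text):
    return len(text.split())


history = [
    ("hi", "hello there friend"),
    ("how are you", "I am good"),
    ("what did you do", "I played ball"),
]
recent = trim_history(history, "can I play too", budget=18, count_tokens=count_words)
print(recent)
```

The sketch ignores the tokens taken up by the User:/Bot: markers and <|endoftext|>, so a real budget would need some headroom for them.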

License & attribution

Released under CC BY 4.0, the binding term inherited from the SODA training data. You must retain the following attributions:

  • Base model: TinyStories-Instruct-3M — Ronen Eldan & Yuanzhi Li, TinyStories: How Small Can Language Models Be and Still Speak Coherent English? (arXiv:2305.07759). Published without an explicit license tag; the TinyStories dataset family is CDLA-Sharing-1.0, which places no restriction on trained models.
  • Fine-tune data: SODA (CC BY 4.0) — Kim et al., SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization (arXiv:2212.10465); and TinyStoriesInstruct (CDLA-Sharing-1.0).
  • Tokenizer: GPT-2 byte-level BPE — OpenAI GPT-2 (MIT).

See NOTICE.md for the full provenance.
