📄 TinyDoc-VLM-256M

The World's Smallest Document-Specialist VLM — by eulogik

256M parameters | <1GB VRAM | >100 tok/s on CPU | Runs on Raspberry Pi

Quick Usage

pip install tinydoc

from PIL import Image
from tinydoc import TinyDocExtractor

extractor = TinyDocExtractor(model_name_or_id="eulogik/TinyDoc-VLM-256M")

# Question answering
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer)

# Structured JSON extraction
result = extractor.extract(img, output_format="json")
print(result.fields)

# Table extraction
result = extractor.extract_table(img)
print(result.markdown)
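
The three calls above can be combined into one pass over a document. The sketch below is a hypothetical helper, not part of tinydoc: it relies only on the methods and result attributes shown above (ask(...).answer, extract(...).fields, extract_table(...).markdown) and assumes that fields is JSON-serializable, which is not confirmed here.

```python
import json


def summarize_document(extractor, image, question="What is the total?"):
    """Run QA, field extraction, and table extraction on one image.

    `extractor` is any object exposing the ask / extract / extract_table
    methods shown above; this helper is illustrative, not part of tinydoc.
    """
    return {
        "answer": extractor.ask(image, question).answer,
        "fields": extractor.extract(image, output_format="json").fields,
        "table_markdown": extractor.extract_table(image).markdown,
    }


def to_json(summary):
    # Assumes `fields` holds plain dicts, lists, strings, and numbers.
    return json.dumps(summary, indent=2, ensure_ascii=False)
```

With the extractor and img objects from the snippet above, print(to_json(summarize_document(extractor, img))) would emit all three results as a single JSON object.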

Architecture

Vision Encoder: SigLIP-B/16 (93M params)
Token Connector: Pixel-Shuffle compression (scale=3, a 9× reduction): 576 → 64 visual tokens
Decoder: SmolLM2-135M (30 Llama-style layers, GQA with 9 query heads sharing 3 KV heads, 8192-token context)
Output Heads: Multi-task (JSON, KV, Table, OCR, QA)
Total: ~290M params
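
The connector's token arithmetic and the decoder's head grouping can be checked with a short sketch. The code below is a plain-Python illustration of pixel-shuffle (space-to-depth) compression, assuming the 576 tokens form a 24 × 24 patch grid (what a 384-pixel input with 16-pixel patches would give; the input resolution is an assumption, not stated above). The function names and the toy 2-dimensional embedding are hypothetical; the actual connector presumably operates on tensors and may order channels differently.

```python
def pixel_shuffle_compress(tokens, grid, scale):
    """Merge each scale x scale block of neighbouring patch tokens into one token.

    tokens: list of grid*grid embedding vectors (lists of floats), row-major.
    Returns (grid // scale) ** 2 tokens, each scale*scale times wider.
    """
    if len(tokens) != grid * grid:
        raise ValueError("expected grid * grid tokens")
    if grid % scale != 0:
        raise ValueError("grid must be divisible by scale")

    out = []
    for block_row in range(0, grid, scale):
        for block_col in range(0, grid, scale):
            merged = []
            # Concatenate the embeddings of the block's patches, row by row.
            for r in range(block_row, block_row + scale):
                for c in range(block_col, block_col + scale):
                    merged.extend(tokens[r * grid + c])
            out.append(merged)
    return out


def kv_head_for_query_head(q_head, n_q_heads=9, n_kv_heads=3):
    """Grouped-query attention: consecutive query heads share one KV head."""
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Toy input: 576 tokens with a 2-dimensional embedding (the real width is larger).
grid, scale, dim = 24, 3, 2
tokens = [[float(i), float(-i)] for i in range(grid * grid)]
compressed = pixel_shuffle_compress(tokens, grid, scale)

print(len(tokens), "->", len(compressed), "tokens")        # 576 -> 64 tokens
print("embedding width:", dim, "->", len(compressed[0]))   # 2 -> 18
print([kv_head_for_query_head(q) for q in range(9)])       # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The sequence gets 9× shorter while each token gets 9× wider, so no patch information is discarded at this step; a projection after the merge would normally map the widened tokens to the decoder's hidden size.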

Training

3-stage curriculum on 10K+ synthetic document types:

1. Layout pretraining
2. Document understanding
3. Instruction tuning

See the training notebook to train your own.

Citation

@software{eulogik_tinydoc_vlm_2025,
  author = {eulogik},
  title = {TinyDoc-VLM: The World's Smallest Document-Specialist VLM},
  year = {2025},
  url = {https://github.com/eulogik/TinyDoc-VLM}
}

Built by eulogik — AI infrastructure for document intelligence.

Safetensors | Model size: 0.3B params | Tensor type: F32