TinyDoc-VLM-256M
The World's Smallest Document-Specialist VLM, by eulogik
256M parameters | <1GB VRAM | >100 tok/s on CPU | Runs on Raspberry Pi
Quick Usage
pip install tinydoc
from PIL import Image
from tinydoc import TinyDocExtractor
extractor = TinyDocExtractor(model_name_or_id="eulogik/TinyDoc-VLM-256M")
# Question answering
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer)
# Structured JSON extraction
result = extractor.extract(img, output_format="json")
print(result.fields)
# Table extraction
result = extractor.extract_table(img)
print(result.markdown)
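When several fields are needed from the same document, the `ask` call shown above can be wrapped in a small loop. The sketch below is illustrative only: `ask_many` and `StubExtractor` are hypothetical names that are not part of the `tinydoc` package, and the stub exists solely so the example runs without downloading the model. It assumes nothing beyond what the usage above shows, namely that `extractor.ask(image, question)` returns an object with an `.answer` attribute.

```python
from types import SimpleNamespace


def ask_many(extractor, image, questions):
    """Ask each question about one image and return {question: answer}.

    Hypothetical helper: relies only on extractor.ask(image, question).answer.
    """
    return {q: extractor.ask(image, q).answer for q in questions}


class StubExtractor:
    """Stand-in for TinyDocExtractor so this sketch runs without the model."""

    def ask(self, image, question):
        return SimpleNamespace(answer=f"(stub answer to: {question})")


questions = ["What is the total?", "Who is the vendor?"]
answers = ask_many(StubExtractor(), image=None, questions=questions)
for question, answer in answers.items():
    print(question, "->", answer)
```

With the real package, passing the `extractor` and `img` objects from the usage above in place of the stub and `None` should work the same way, provided the API behaves as shown there.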
Architecture
- Vision Encoder: SigLIP-B/16 (93M params)
- Token Connector: Pixel-Shuffle compression (9× at scale=3): 576 → 64 tokens
- Decoder: SmolLM2-135M (30 LLaMA layers, GQA 9:3, 8192 context)
- Output Heads: Multi-task (JSON, KV, Table, OCR, QA)
- Total: ~290M params
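The 576 → 64 figure for the token connector follows from merging each 3 × 3 block of neighbouring patch tokens into a single token with 9 times as many features. The sketch below illustrates that idea in plain Python. It is not the project's implementation: the function name `pixel_shuffle_tokens`, the 24 × 24 grid (chosen because 24 × 24 = 576), the toy feature width of 4, and the order in which features are concatenated are all assumptions made for illustration.

```python
def pixel_shuffle_tokens(tokens, grid, scale):
    """Merge each scale x scale block of patch tokens into one wider token.

    tokens: list of grid * grid feature vectors (lists), in row-major order.
    Returns (grid // scale) ** 2 vectors, each scale * scale times longer.
    """
    if len(tokens) != grid * grid:
        raise ValueError("expected grid * grid tokens")
    if grid % scale != 0:
        raise ValueError("grid must be divisible by scale")

    out_grid = grid // scale
    merged = []
    for out_row in range(out_grid):
        for out_col in range(out_grid):
            features = []
            # Concatenate the features of every patch in this block.
            for dy in range(scale):
                for dx in range(scale):
                    row = out_row * scale + dy
                    col = out_col * scale + dx
                    features.extend(tokens[row * grid + col])
            merged.append(features)
    return merged


# Assumed 24 x 24 patch grid; a toy width of 4 stands in for the real hidden size.
grid, scale, dim = 24, 3, 4
patch_tokens = [[float(i)] * dim for i in range(grid * grid)]
compressed = pixel_shuffle_tokens(patch_tokens, grid, scale)

print(len(patch_tokens), "->", len(compressed))  # 576 -> 64
print(len(compressed[0]))                        # 36, i.e. 4 * 9
```

The sequence length drops by scale² = 9 while the per-token width grows by the same factor, so no information is discarded at this step. In a typical design a projection layer then maps the widened tokens to the decoder's hidden size.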
Training
3-stage curriculum on 10K+ synthetic document types:
- Layout pretraining
- Document understanding
- Instruction tuning
See the training notebook to train your own.
Citation
@software{eulogik_tinydoc_vlm_2025,
author = {eulogik},
title = {TinyDoc-VLM: The World's Smallest Document-Specialist VLM},
year = {2025},
url = {https://github.com/eulogik/TinyDoc-VLM}
}
Built by eulogik: AI infrastructure for document intelligence.