📄 TinyDoc-VLM-256M

The World's Smallest Document-Specialist VLM, by eulogik

256M parameters | <1GB VRAM | >100 tok/s on CPU | Runs on Raspberry Pi


Quick Usage

pip install tinydoc

from PIL import Image
from tinydoc import TinyDocExtractor

extractor = TinyDocExtractor(model_name_or_id="eulogik/TinyDoc-VLM-256M")

# Question answering
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer)

# Structured JSON extraction
result = extractor.extract(img, output_format="json")
print(result.fields)

# Table extraction
result = extractor.extract_table(img)
print(result.markdown)
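`extract_table` exposes the table as `result.markdown`. If that string is a GitHub-style pipe table (an assumption; the exact format isn't documented here), a small stdlib-only helper can turn it into row dictionaries for downstream processing. `markdown_table_to_rows` is a hypothetical name, not part of `tinydoc`, and the sample table is made up:

```python
from typing import Dict, List


def markdown_table_to_rows(markdown: str) -> List[Dict[str, str]]:
    """Hypothetical helper: parse a GitHub-style pipe table into dicts.

    Assumes the first table line is the header row and the second is the
    ---|--- separator. Escaped pipes inside cells are not handled.
    """
    lines = [ln.strip() for ln in markdown.splitlines() if "|" in ln]
    if len(lines) < 2:
        return []

    def split_row(line: str) -> List[str]:
        # Drop optional outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    headers = split_row(lines[0])
    rows: List[Dict[str, str]] = []
    for line in lines[2:]:  # lines[1] is the separator row
        cells = split_row(line)
        cells += [""] * (len(headers) - len(cells))  # pad short rows
        rows.append(dict(zip(headers, cells)))
    return rows


# Made-up sample in the assumed pipe-table format.
sample = """
| Item   | Qty | Price |
|--------|-----|-------|
| Widget | 2   | 9.99  |
| Gadget | 1   | 4.50  |
"""
for row in markdown_table_to_rows(sample):
    print(row)
```

With the extractor above, you would call `markdown_table_to_rows(result.markdown)` on the output of `extractor.extract_table(img)`. Cell values stay strings; converting quantities or prices is left to the caller.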

Try It

Open in HF Space

Architecture

  • Vision Encoder: SigLIP-B/16 (93M params)
  • Token Connector: Pixel-Shuffle compression (9× at scale=3), 576 → 64 tokens
  • Decoder: SmolLM2-135M (30 LLaMA layers, GQA with 9 query / 3 KV heads, 8192-token context)
  • Output Heads: Multi-task (JSON, KV, Table, OCR, QA)
  • Total: ~290M params
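The connector's 576 → 64 reduction follows from grid arithmetic: 576 tokens form a 24 × 24 patch grid (which a patch-16 encoder such as SigLIP-B/16 produces for a 384 × 384 input), and merging each 3 × 3 block gives 8 × 8 = 64 tokens, each 9× wider. Below is a minimal pure-Python sketch of that space-to-depth step, assuming row-major token order and block-wise concatenation; `pixel_shuffle` is a hypothetical name, and the model's exact channel ordering isn't specified in this card:

```python
import math
from typing import List

Token = List[float]


def pixel_shuffle(tokens: List[Token], scale: int = 3) -> List[Token]:
    """Hypothetical space-to-depth merge of a square grid of patch tokens.

    Each scale x scale block of neighboring tokens is concatenated into one
    token, so the count drops by scale**2 and the width grows by scale**2.
    Assumes tokens are in row-major grid order.
    """
    n = len(tokens)
    side = math.isqrt(n)
    if side * side != n or side % scale != 0:
        raise ValueError("expected a square token grid whose side is divisible by scale")

    out_side = side // scale
    merged: List[Token] = []
    for block_row in range(out_side):
        for block_col in range(out_side):
            feature: Token = []
            # Concatenate the block's tokens in row-major order.
            for dr in range(scale):
                for dc in range(scale):
                    r = block_row * scale + dr
                    c = block_col * scale + dc
                    feature.extend(tokens[r * side + c])
            merged.append(feature)
    return merged


# 576 tokens = 24 x 24 grid; scale=3 gives 8 x 8 = 64 merged tokens.
grid = [[float(i)] for i in range(576)]
merged = pixel_shuffle(grid, scale=3)
print(len(merged), len(merged[0]))  # 64 9
```

With SigLIP-B's 768-dim patch features, each merged token would be 768 × 9 = 6912-dim. A connector of this kind typically projects that down to the decoder width (576 for SmolLM2-135M); the projection TinyDoc actually uses isn't described here.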

Training

A 3-stage curriculum over 10K+ synthetic document types:

  1. Layout pretraining
  2. Document understanding
  3. Instruction tuning

See the training notebook to train your own model.

Citation

@software{eulogik_tinydoc_vlm_2025,
  author = {eulogik},
  title = {TinyDoc-VLM: The World's Smallest Document-Specialist VLM},
  year = {2025},
  url = {https://github.com/eulogik/TinyDoc-VLM}
}

Built by eulogik: AI infrastructure for document intelligence.

Safetensors | Model size: 0.3B params | Tensor type: F32
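As a rough check on the memory figures, weight storage is parameter count × bytes per parameter. This sketch counts weights only (activations, KV cache, and runtime overhead come on top) and uses the ~290M total from the Architecture section. Which precision the "<1GB VRAM" figure assumes isn't stated in this card, so several common storage formats are compared:

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Activations, KV cache, and runtime overhead are NOT included.

BYTES_PER_PARAM = {
    "F32": 4.0,
    "F16/BF16": 2.0,
    "INT8": 1.0,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 290e6  # "~290M params" from the Architecture section

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>8}: {weight_memory_gb(TOTAL_PARAMS, nbytes):.2f} GB")
```

At the listed F32 tensor type, ~290M parameters come to about 1.16 GB of weights alone, and about 0.58 GB at 16-bit precision, so the inference precision largely determines whether the weights fit in a 1 GB budget.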