Unlimited OCR Works
Welcome the Era of One-shot Long-horizon Parsing.
Multi-device fork. This is a community fork of baidu/Unlimited-OCR (MIT) that also runs on Apple Silicon (MPS) and CPU, not just CUDA. The model weights are unchanged; only the inference code was made device-agnostic, plus a fix for a masked_scatter_ bug on the MPS backend and automatic fp32 on MPS (bf16 degrades there). See Transformers on Apple Silicon (MPS) for setup, or just run python demo/run_ocr.py image. Full credit for the model and research goes to the original authors.
Release
- [2026/06/22] 🚀 We present Unlimited-OCR, aiming to push Deepseek-OCR one step further.
- [2026/06/23] 🤝 Thanks to the ModelScope community for their support. Our model is now available at ModelScope.
Inference
Transformers
Inference using Hugging Face Transformers on NVIDIA GPUs. Requirements tested on Python 3.12.3 + CUDA 12.9:
torch==2.10.0
torchvision==0.25.0
transformers==4.57.1
Pillow==12.1.1
matplotlib==3.10.8
einops==0.8.2
addict==2.4.0
easydict==1.13
pymupdf==1.27.2.2
psutil==7.2.2
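Before loading the model, it can help to confirm that the active environment matches these pins. The following is a minimal sketch, not part of the repo: `find_mismatches` and `installed_version` are hypothetical helper names, and the choice to ignore a local build suffix (anything after `+`) is an assumption about how wheels may be tagged.

```python
from importlib import metadata

# Pins copied from the requirements list above.
PINS = {
    "torch": "2.10.0",
    "torchvision": "0.25.0",
    "transformers": "4.57.1",
    "Pillow": "12.1.1",
    "matplotlib": "3.10.8",
    "einops": "0.8.2",
    "addict": "2.4.0",
    "easydict": "1.13",
    "pymupdf": "1.27.2.2",
    "psutil": "7.2.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pinned, found)."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        # Assumption: a local build suffix after "+" is not part of the pin.
        if found is None or found.split("+")[0] != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.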
import os
import torch
from transformers import AutoModel, AutoTokenizer

model_name = 'baidu/Unlimited-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()

# ── Single image supports two configs: gundam or base ──
# gundam: base_size=1024, image_size=640, crop_mode=True
# base: base_size=1024, image_size=1024, crop_mode=False
model.infer(
    tokenizer,
    prompt='<image>document parsing.',
    image_file='your_image.jpg',
    output_path='your/output/dir',
    base_size=1024, image_size=640, crop_mode=True,
    max_length=32768,
    no_repeat_ngram_size=35, ngram_window=128,
    save_results=True,
)

# ── Multi page / PDF only uses base (image_size=1024) ──
model.infer_multi(
    tokenizer,
    prompt='<image>Multi page parsing.',
    image_files=['page1.png', 'page2.png', 'page3.png'],
    output_path='your/output/dir',
    image_size=1024,
    max_length=32768,
    no_repeat_ngram_size=35, ngram_window=1024,
    save_results=True,
)

# ── PDF (convert pages to images, then multi-page parsing) ──
import tempfile
import fitz  # PyMuPDF

def pdf_to_images(pdf_path, dpi=300):
    # Render each page to a PNG in a fresh temp directory and return the paths.
    doc = fitz.open(pdf_path)
    tmp_dir = tempfile.mkdtemp(prefix='pdf_ocr_')
    mat = fitz.Matrix(dpi / 72, dpi / 72)  # PDF units are 72 per inch
    paths = []
    for i, page in enumerate(doc):
        out = os.path.join(tmp_dir, f'page_{i+1:04d}.png')
        page.get_pixmap(matrix=mat).save(out)
        paths.append(out)
    doc.close()
    return paths

model.infer_multi(
    tokenizer,
    prompt='<image>Multi page parsing.',
    image_files=pdf_to_images('your_doc.pdf', dpi=300),
    output_path='your/output/dir',
    image_size=1024,
    max_length=32768,
    no_repeat_ngram_size=35, ngram_window=1024,
    save_results=True,
)
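The two single-image configurations named in the comments above (gundam and base) can be kept in one place so that a call site only picks a name. This is a small sketch under that assumption; `SINGLE_IMAGE_PRESETS` and `single_image_kwargs` are hypothetical names, and the remaining values are simply copied from the `model.infer(...)` call shown above.

```python
# Settings copied from the comments in the single-image example above.
SINGLE_IMAGE_PRESETS = {
    "gundam": {"base_size": 1024, "image_size": 640, "crop_mode": True},
    "base": {"base_size": 1024, "image_size": 1024, "crop_mode": False},
}


def single_image_kwargs(mode, image_file, output_path):
    """Build keyword arguments for model.infer(...) from a preset name."""
    if mode not in SINGLE_IMAGE_PRESETS:
        expected = sorted(SINGLE_IMAGE_PRESETS)
        raise ValueError(f"unknown mode {mode!r}; expected one of {expected}")
    return {
        "prompt": "<image>document parsing.",
        "image_file": image_file,
        "output_path": output_path,
        **SINGLE_IMAGE_PRESETS[mode],
        "max_length": 32768,
        "no_repeat_ngram_size": 35,
        "ngram_window": 128,
        "save_results": True,
    }


# Usage, with model and tokenizer loaded as in the example above:
# model.infer(tokenizer, **single_image_kwargs("base", "your_image.jpg", "your/output/dir"))
```

Rejecting unknown names early avoids silently running with a mix of gundam and base settings.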
Transformers on Apple Silicon (MPS)
The CUDA snippet above does not run as-is on a Mac — it hardcodes .cuda() and
torch.autocast("cuda", ...). The model code in this repo has been patched to be
device-agnostic, and this directory ships a pyproject.toml plus a ready-to-run
demo (demo/run_ocr.py). The SGLang path below
(--attention-backend fa3, FlashAttention-3) is CUDA-only and is not available
on MPS; use this Transformers path instead.
Setup (uv-managed virtualenv — same pins as the CUDA list; the macOS arm64 torch/torchvision wheels ship with MPS):
uv venv --python 3.12
source .venv/bin/activate
uv pip install -r pyproject.toml
Run (the weights in this directory are loaded directly — no Hub re-download):
python demo/run_ocr.py image # bundled sample image
python demo/run_ocr.py image your_image.jpg --mode gundam # or --mode base
python demo/run_ocr.py multi page1.png page2.png
python demo/run_ocr.py pdf your_doc.pdf --dpi 300
Or load it straight from the Hub with from_pretrained (downloads the weights and the
patched trust_remote_code files — no clone needed):
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")  # CPU fallback for ops without an MPS kernel
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sabafallah/Unlimited-OCR-Universal"
if torch.cuda.is_available():
    device, dtype = "cuda", torch.bfloat16  # bf16 is correct (and fast) on CUDA
elif torch.backends.mps.is_available():
    device, dtype = "mps", torch.float32  # use fp32 on MPS -- bf16 degrades the output there
else:
    device, dtype = "cpu", torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=dtype,
    attn_implementation="eager",  # only "mha_eager" is registered for this config
)
model = model.eval().to(device)

# Single image: gundam (crops; best for dense pages). For base: image_size=1024, crop_mode=False.
model.infer(
    tokenizer,
    prompt="<image>Free OCR.",  # or "<image>document parsing." for layout + boxes
    image_file="your_image.jpg",
    output_path="output",
    base_size=1024, image_size=640, crop_mode=True,
    max_length=32768,
    no_repeat_ngram_size=35, ngram_window=128,
    save_results=True,
)
Multi-page / PDF use model.infer_multi(...) exactly as in the CUDA example above
(just keep the same dtype/device selection and attn_implementation="eager").
Three things differ from CUDA and are handled for you by the scripts / patched code:
- Run in float32 on MPS, not bfloat16. On MPS, bf16 rounding is amplified by the MoE router (its top-6-of-64 expert selection flips under bf16), so the decode drifts into repeated garbage after a few tokens. fp32 reproduces the CUDA results exactly (~13 GB RAM). The scripts handle this automatically: they select torch.float32 on MPS/CPU and keep torch.bfloat16 on CUDA (the Hub snippet above does the same).
- attn_implementation="eager". With use_mla=false, the DeepseekV2 decoder only registers an mha_eager attention class, so any other backend raises a KeyError.
- masked_scatter_ is broken on MPS (torch 2.10 silently writes garbage), which left the image-token embeddings unfilled and made the model emit <eos> immediately. The repo code now injects image features with an index-assignment equivalent on MPS, while CUDA/CPU keep the original masked_scatter_.
PYTORCH_ENABLE_MPS_FALLBACK=1 (set by the scripts) lets any op without an MPS kernel fall back to CPU instead of crashing.
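To make the float32 point concrete, here is a toy illustration of how reduced precision can change a top-6-of-64 selection. It is a sketch with made-up router scores, not the model's router: `to_bf16` emulates bfloat16 rounding in pure Python, and breaking ties in favor of the lower index is an assumption made only for this example.

```python
import struct


def to_bf16(x):
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest even on the low 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent, and 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def top_k(scores, k):
    """Indices of the k largest scores; ties go to the lower index (an assumption)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


# Toy scores for 64 experts (hypothetical values, not taken from the model).
scores = [0.01] * 64
for idx, value in {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.6, 4: 0.55}.items():
    scores[idx] = value
scores[10] = 0.3003  # two near-tied candidates for the sixth slot
scores[20] = 0.3004

full = top_k(scores, 6)
rounded = top_k([to_bf16(s) for s in scores], 6)
print(full)     # expert 20 takes the sixth slot
print(rounded)  # 10 and 20 round to the same bf16 value, so the tie goes to 10
```

Scores that differ by less than the bf16 spacing become indistinguishable, so the chosen expert set can change even though the inputs are nearly identical.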
SGLang
SGLang (--attention-backend fa3, FlashAttention-3) is CUDA-only and does not run on MPS. This fork does not bundle the SGLang wheel (to keep the repo slim); grab sglang-0.0.0.dev11416+g92e8bb79e-py3-none-any.whl from the original baidu/Unlimited-OCR repo.
Set up the environment (uv-managed virtualenv). Install the SGLang wheel first,
then pin kernels==0.11.7 and install PyMuPDF for PDF-to-image conversion:
uv venv --python 3.12
source .venv/bin/activate
uv pip install sglang-0.0.0.dev11416+g92e8bb79e-py3-none-any.whl
uv pip install kernels==0.11.7
uv pip install pymupdf==1.27.2.2
Start the SGLang server:
python -m sglang.launch_server \
--model baidu/Unlimited-OCR \
--served-model-name Unlimited-OCR \
--attention-backend fa3 \
--page-size 1 \
--mem-fraction-static 0.8 \
--context-length 32768 \
--enable-custom-logit-processor \
--disable-overlap-schedule \
--skip-server-warmup \
--host 0.0.0.0 \
--port 10000
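Because the server is started with --skip-server-warmup, a client may want to wait until it answers before sending requests. The sketch below polls a URL until it returns HTTP 200. It assumes the server exposes the OpenAI-style /v1/models route on the port used above; that route and the helper names `http_ok` and `wait_until_ready` are assumptions, not something this repo documents.

```python
import time
import urllib.error
import urllib.request

# Assumed readiness URL: OpenAI-style model listing on the port used above.
READY_URL = "http://127.0.0.1:10000/v1/models"

# Ignore proxy environment variables for local requests.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_ok(url, timeout=5):
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url, attempts=60, delay=5.0, probe=http_ok, sleep=time.sleep):
    """Poll url until probe(url) succeeds; return False after all attempts fail."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage:
# if not wait_until_ready(READY_URL):
#     raise RuntimeError("server did not become ready in time")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.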
Send streaming requests to the OpenAI-compatible API:
import base64
import json
import os
import tempfile
import fitz
import requests
from sglang.srt.sampling.custom_logit_processor import DeepseekOCRNoRepeatNGramLogitProcessor

server_url = "http://127.0.0.1:10000"
session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

def pdf_to_images(pdf_path, dpi=300):
    # Render each page to a PNG in a fresh temp directory and return the paths.
    doc = fitz.open(pdf_path)
    tmp_dir = tempfile.mkdtemp(prefix="pdf_ocr_")
    mat = fitz.Matrix(dpi / 72, dpi / 72)
    image_paths = []
    for i, page in enumerate(doc):
        image_path = os.path.join(tmp_dir, f"page_{i + 1:04d}.png")
        page.get_pixmap(matrix=mat).save(image_path)
        image_paths.append(image_path)
    doc.close()
    return image_paths

def encode_image(image_path):
    # Wrap the file as a base64 data URL in an OpenAI-style image_url content part.
    ext = os.path.splitext(image_path)[1].lower()
    mime = "image/jpeg" if ext in (".jpg", ".jpeg") else f"image/{ext.lstrip('.')}"
    with open(image_path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{data}"}}

def build_content(prompt, image_paths):
    return [{"type": "text", "text": prompt}] + [encode_image(path) for path in image_paths]

def generate(prompt, image_paths, image_mode, ngram_window):
    payload = {
        "model": "Unlimited-OCR",
        "messages": [{"role": "user", "content": build_content(prompt, image_paths)}],
        "temperature": 0,
        "skip_special_tokens": False,
        "images_config": {"image_mode": image_mode},
        "custom_logit_processor": DeepseekOCRNoRepeatNGramLogitProcessor.to_str(),
        "custom_params": {
            "ngram_size": 35,
            "window_size": ngram_window,
        },
        "stream": True,
    }
    response = session.post(
        f"{server_url}/v1/chat/completions",
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
        timeout=1200,
        stream=True,
    )
    response.raise_for_status()
    chunks = []
    # Each streamed event arrives as a line of the form "data: {...}".
    for line in response.iter_lines(chunk_size=1, decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        event = json.loads(data)
        delta = event["choices"][0].get("delta", {}).get("content", "")
        if delta:
            print(delta, end="", flush=True)
            chunks.append(delta)
    print()
    return "".join(chunks)

# Single image supports two configs: gundam or base. Example below uses gundam.
generate("document parsing.", ["your_image.jpg"], image_mode="gundam", ngram_window=128)
# Multi image (base only)
generate("Multi page parsing.", ["page1.png", "page2.png"], image_mode="base", ngram_window=1024)
# PDF (base only)
generate("Multi page parsing.", pdf_to_images("your_doc.pdf", dpi=300), image_mode="base", ngram_window=1024)
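The stream-parsing loop inside generate can also be written as a standalone generator, which makes it possible to check the parsing against canned lines without a running server. This is a sketch of that idea: `iter_stream_text` is a hypothetical helper, and it assumes the same "data: {...}" line format with a final "data: [DONE]" marker that the client code above already relies on.

```python
import json


def iter_stream_text(lines):
    """Yield content deltas from OpenAI-style 'data: {...}' stream lines."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        if not line or not line.startswith("data: "):
            continue  # skip blank lines and comment lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        event = json.loads(data)
        choices = event.get("choices") or []
        if not choices:
            continue  # tolerate events that carry no choices
        delta = choices[0].get("delta", {}).get("content") or ""
        if delta:
            yield delta


# Canned example of what a stream might look like (hypothetical payloads).
sample = [
    "",
    'data: {"choices": [{"delta": {"role": "assistant", "content": null}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With a live response, the same generator could be fed from response.iter_lines(decode_unicode=True).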
Visualization
Acknowledgement
We would like to thank Deepseek-OCR, Deepseek-OCR-2, and PaddleOCR for their valuable models and ideas.
Citation
Coming soon!
Model tree for sabafallah/Unlimited-OCR-Universal
Base model
baidu/Unlimited-OCR