PaddleOCR-VL-1.6 — MLX 8-bit
MLX-quantized (8-bit, group_size=64) version of PaddlePaddle/PaddleOCR-VL-1.6 for Apple Silicon inference via mlx-vlm.
Model Details
- Base model: PaddleOCR-VL-1.6
- OmniDocBench v1.6 score: 96.33 (#1 on the leaderboard as of June 2026)
- Architecture: PaddleOCRVLForConditionalGeneration (18 LLM layers, 27 vision layers)
- Quantization: 8-bit affine, group_size=64
- Size: ~1.0 GB (vs ~2 GB bf16)
- Converted with: mlx-vlm >= 0.3.11
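The size figures above can be sanity-checked with a little arithmetic. The sketch below assumes that each group of 64 weights stores one 16-bit scale and one 16-bit bias alongside the 8-bit values, and that every weight is quantized; neither assumption is confirmed by this card, so treat the result as a rough estimate only. The function name is illustrative.

```python
def effective_bits_per_weight(q_bits: int = 8, group_size: int = 64,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits stored per weight when each group carries one scale and one bias.

    scale_bits and bias_bits are assumptions (16-bit metadata per group).
    """
    return q_bits + (scale_bits + bias_bits) / group_size


bits = effective_bits_per_weight()   # 8 + 32 / 64
ratio = bits / 16                    # relative to bf16 at 16 bits per weight
estimated_gb = 2.0 * ratio           # scale the card's "~2 GB bf16" figure
print(f"{bits} bits/weight, {ratio:.3f} of bf16 size, ~{estimated_gb:.2f} GB")
```

Under these assumptions the estimate lands close to the ~1.0 GB listed above; layers left unquantized would push the real file size slightly higher.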
Usage
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the quantized weights, the processor, and the model config
model_id = "olragon/PaddleOCR-VL-1.6-8bit"
model, processor = load(model_id)
config = load_config(model_id)

# Wrap the instruction in the model's chat template for a single image
prompt = apply_chat_template(
    processor, config,
    "OCR the text in this image.",
    num_images=1
)

# Run generation on one page image
result = generate(
    model, processor, prompt,
    image=["page.png"],
    max_tokens=6000,
    repetition_penalty=1.1,
    verbose=False,
)

# Some mlx-vlm versions return a result object, others a plain string
text = result.text if hasattr(result, "text") else str(result)
print(text)
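For multi-page documents it is usually worth loading the model once and reusing it for every page. The following is a minimal sketch of that pattern built on the same calls as the usage example; `ocr_pages` and `make_mlx_runner` are hypothetical helper names, not part of mlx-vlm, and the mlx-vlm imports are deferred so the loop itself can be exercised without the library installed.

```python
from pathlib import Path
from typing import Callable, Iterable


def ocr_pages(pages: Iterable[str], run_one: Callable[[str], str]) -> dict[str, str]:
    """Run `run_one` on each page path and collect stripped text keyed by file name."""
    results = {}
    for page in pages:
        results[Path(page).name] = run_one(page).strip()
    return results


def make_mlx_runner(model_id: str = "olragon/PaddleOCR-VL-1.6-8bit") -> Callable[[str], str]:
    """Load the model once and return a function that OCRs a single image."""
    # Imported lazily so ocr_pages() can be used and tested without mlx-vlm.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(model_id)
    config = load_config(model_id)
    prompt = apply_chat_template(
        processor, config, "OCR the text in this image.", num_images=1
    )

    def run_one(image_path: str) -> str:
        result = generate(
            model, processor, prompt,
            image=[image_path],
            max_tokens=6000,
            repetition_penalty=1.1,
            verbose=False,
        )
        # Handle both result objects and plain strings
        return result.text if hasattr(result, "text") else str(result)

    return run_one

# Example (requires Apple Silicon and mlx-vlm):
#   texts = ocr_pages(["page1.png", "page2.png"], make_mlx_runner())
```

Keying results by file name assumes page file names are unique; pages that share a name in different directories would overwrite each other.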
Or via CLI:
uv run --python 3.12 --with "mlx-vlm>=0.3.11" --with pillow \
  python3 -m mlx_vlm.generate \
  --model olragon/PaddleOCR-VL-1.6-8bit \
  --image page.png \
  --prompt "OCR the text in this image." \
  --max-tokens 6000
Conversion
Converted using:
python3 -m mlx_vlm.convert \
  --hf-path PaddlePaddle/PaddleOCR-VL-1.6 \
  --mlx-path ./PaddleOCR-VL-1.6-8bit \
  --quantize --q-bits 8
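To make "8-bit affine, group_size=64" concrete, here is a simplified pure-Python sketch of group-wise affine quantization: each group of weights is mapped to integers in [0, 255] with its own scale and bias, and reconstructed as `scale * q + bias`. This illustrates the general idea only; it is not MLX's actual kernel, and the function names are hypothetical.

```python
import math


def quantize_group(weights, bits=8):
    """Affine-quantize one group so that w is approximately scale * q + bias."""
    levels = (1 << bits) - 1                 # 255 for 8-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0        # avoid a zero scale for constant groups
    q = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo                      # bias is the group minimum


def dequantize_group(q, scale, bias):
    """Reconstruct approximate weights from integers, scale, and bias."""
    return [scale * v + bias for v in q]


def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Round-trip a small deterministic weight vector (two groups of 64)
weights = [math.sin(i) * 0.05 for i in range(128)]
groups = quantize(weights)
restored = [w for q, s, b in groups for w in dequantize_group(q, s, b)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
max_scale = max(s for _, s, _ in groups)
print(f"groups={len(groups)}, max error={max_error:.2e}, half step={max_scale / 2:.2e}")
```

Because each group has its own scale, the rounding error is bounded by half a quantization step of that group, which is why smaller groups trade extra metadata for lower error.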
Benchmarks
PaddleOCR-VL 1.6 improvements over 1.5:
- OmniDocBench: 94.50 → 96.33 (+1.83)
- Better polygon localization (quadrilateral → polygon shapes)
- Seal/stamp recognition
- Cross-page table merging
License
Apache 2.0 (same as the base model)
Model tree for olragon/PaddleOCR-VL-1.6-8bit
- Base model: baidu/ERNIE-4.5-0.3B-Paddle
- Finetuned: PaddlePaddle/PaddleOCR-VL-1.5
- Finetuned: PaddlePaddle/PaddleOCR-VL-1.6