# Surya OCR 2 MLX 8-bit G64
This repository contains an **experimental quantized** artifact derived from [datalab-to/surya-ocr-2](https://huggingface.co/datalab-to/surya-ocr-2).
This 8-bit MLX quant is the most useful Apple-side artifact from the current batch. On the mini slice it scores 100% on arxiv math, headers/footers, multi-column, old scans math, tables, and baseline checks, but it scores 0.0% on the old scans split and only 33.3% on long tiny text.
## What is included
- Source model: `datalab-to/surya-ocr-2`
- Runtime/format: MLX / mlx-vlm
- Quantization: 8-bit affine weight quantization, group size 64
- Vision weights included: Yes. The MLX checkpoint includes the model vision weights and processor assets.
- Processor/tokenizer assets: included
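
To make "8-bit affine weight quantization, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It illustrates the general technique only; it is not MLX's kernel or storage layout, and the helper names `quantize_group` and `dequantize_group` are hypothetical.

```python
import random

BITS = 8          # bits per quantized weight
GROUP_SIZE = 64   # number of weights sharing one scale and one bias


def quantize_group(weights):
    """Map one group of floats to integer codes in [0, 2**BITS - 1]."""
    lo, hi = min(weights), max(weights)
    levels = (1 << BITS) - 1
    # A constant group would give a zero scale, so fall back to 1.0.
    scale = (hi - lo) / levels or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats as scale * code + bias."""
    return [scale * q + bias for q in codes]


# Illustrative data: one weight row made of two groups of 64 values.
random.seed(0)
row = [random.uniform(-1.0, 1.0) for _ in range(2 * GROUP_SIZE)]

restored = []
for start in range(0, len(row), GROUP_SIZE):
    group = row[start:start + GROUP_SIZE]
    codes, scale, bias = quantize_group(group)
    restored.extend(dequantize_group(codes, scale, bias))

max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"max abs reconstruction error: {max_error:.6f}")
```

Each group keeps its own scale and bias, so the rounding error is bounded by half of that group's step size. Smaller groups track local weight ranges more closely at the cost of storing more scale/bias pairs.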
## Mini olmOCR-bench results
| Candidate | Overall | Arxiv math | Headers/footers | Long tiny text | Multi-column | Old scans | Old scans math | Tables | Baseline |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Source mini baseline | 91.0% ± 6.3% | 100.0% | 100.0% | 100.0% | 100.0% | 33.3% | 100.0% | 100.0% | 94.7% |
| Surya OCR 2 MLX 8-bit G64 | 79.2% ± 6.2% | 100.0% | 100.0% | 33.3% | 100.0% | 0.0% | 100.0% | 100.0% | 100.0% |
## How to read the benchmark table
This is an early quant release with transparent limitations. The table uses our local 40-test mini slice of `allenai/olmOCR-bench`: 3 samples from each named section plus the benchmark baseline checks. It is not the full public score, and it is not a claim of >98% parity.
The useful signal is the split behavior: this artifact is currently strong on clean academic/math, headers/footers, multi-column layouts, tables, old-scan math, and baseline OCR checks, but it should not be used for old degraded scans and is weak on long tiny text.
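
The Overall column appears consistent with an unweighted mean over the eight per-section columns. The sketch below checks that reading against the table. Treating 33.3% as 1 of 3 samples and 94.7% as 18 of 19 baseline checks is an inference from the stated slice size (7 named sections × 3 samples + 19 baseline checks = 40 tests), not something the benchmark output states directly.

```python
from fractions import Fraction

# Column order: Arxiv math, Headers/footers, Long tiny text, Multi-column,
# Old scans, Old scans math, Tables, Baseline.
# Pass rates are written as fractions inferred from the table (an assumption).
source = [1, 1, 1, 1, Fraction(1, 3), 1, 1, Fraction(18, 19)]
quant = [1, 1, Fraction(1, 3), 1, 0, 1, 1, 1]


def overall(rates):
    """Unweighted mean of per-section pass rates."""
    return float(sum(Fraction(r) for r in rates) / len(rates))


print(f"Source mini baseline: {overall(source):.1%}")       # 91.0%
print(f"Surya OCR 2 MLX 8-bit G64: {overall(quant):.1%}")   # 79.2%
```

Because every section carries equal weight under this reading, the two sections that regress (long tiny text and old scans) account for the whole drop from 91.0% to 79.2%.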
## Recommended use
Use this checkpoint for local experimentation and constrained OCR workloads whose documents resemble the passing sections above. Avoid using it as a production replacement for the original model on degraded historical scans, very small dense body text, or workloads requiring full benchmark parity.
## Loading
```python
from mlx_vlm import load, generate

# Download (or reuse the cached copy of) the quantized weights and processor assets.
model, processor = load("Reza2kn/surya-ocr-2-mlx-8bit-g64")
```

Pass images/documents through the same Surya/MLX-VLM prompting path used by your app.
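
A minimal end-to-end sketch follows, assuming the standard mlx-vlm chat-template flow works for this checkpoint. The helper name `run_ocr`, the image path, and the prompt text are placeholders; this card does not specify which prompt Surya expects, so substitute the one your pipeline already uses.

```python
REPO_ID = "Reza2kn/surya-ocr-2-mlx-8bit-g64"


def run_ocr(image_path, prompt, max_tokens=1024):
    """Run one image through the quantized checkpoint with mlx-vlm."""
    # Imported lazily: mlx-vlm needs Apple silicon and `pip install mlx-vlm`.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(REPO_ID)
    config = load_config(REPO_ID)

    # Wrap the raw prompt in the model's chat template for a single image.
    formatted = apply_chat_template(processor, config, prompt, num_images=1)
    return generate(
        model, processor, formatted, [image_path], max_tokens=max_tokens
    )


# Example call with placeholder inputs (both are assumptions):
# print(run_ocr("page.png", "Transcribe the text in this image."))
```

Depending on the installed mlx-vlm version, `generate` may return a plain string or a result object that wraps the generated text, so check the return type before post-processing.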
## Limitations
- This is not a full-parity release yet.
- Do **not** use this artifact for degraded old scans; the current mini split score is 0.0% there.
- Do **not** use this artifact for long tiny text unless you independently validate your data; the current mini split score is 33.3%.
- Math-heavy and table/layout-heavy mini examples looked good in this slice, but full olmOCR-bench is still pending.
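
For the "independently validate your data" advice, one lightweight approach is to compare OCR output against a few hand-checked transcriptions of your own documents. The sketch below uses `difflib` from the standard library as a rough similarity measure. It is not the olmOCR-bench metric, and the sample strings and the 0.95 threshold are made-up placeholders.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace and lowercase so layout noise is not penalized."""
    return " ".join(text.split()).lower()


def similarity(reference, predicted):
    """Character-level similarity in [0, 1] between two transcriptions."""
    matcher = SequenceMatcher(
        None, normalize(reference), normalize(predicted), autojunk=False
    )
    return matcher.ratio()


def validate(pairs, threshold=0.95):
    """Return (pass_rate, failures) for (name, reference, predicted) triples."""
    if not pairs:
        raise ValueError("need at least one sample")
    failures = []
    for name, reference, predicted in pairs:
        score = similarity(reference, predicted)
        if score < threshold:
            failures.append((name, score))
    return 1 - len(failures) / len(pairs), failures


# Placeholder samples: replace with your own references and model outputs.
samples = [
    ("clean_page", "The quick brown fox.", "The quick brown fox."),
    ("tiny_text", "Footnote 12: see appendix B.", "Footnote 1Z see apendix"),
]
pass_rate, failures = validate(samples)
print(f"pass rate: {pass_rate:.0%}")
for name, score in failures:
    print(f"below threshold: {name} ({score:.2f})")
```

Running the same samples through both this checkpoint and the source model gives a direct view of how much the quantization costs on your document types.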
## Provenance
Generated non-destructively from the original Hugging Face checkpoint. This is not a fine-tune. The goal of publishing this artifact now is transparency: the files are usable for the passing workload slices above, and the known failing slices are documented clearly.