# Surya OCR 2 MLX 8-bit G64

This repository contains an **experimental quantized** artifact derived from [datalab-to/surya-ocr-2](https://huggingface.co/datalab-to/surya-ocr-2).

This 8-bit MLX quant is the most useful Apple-side artifact from the current batch. On the mini slice it scores 100.0% on arxiv math, headers/footers, multi-column, old-scans-math, tables, and baseline checks, but it scores 0.0% on the old-scans split and only 33.3% on long tiny text.

## What is included

- Source model: `datalab-to/surya-ocr-2`
- Runtime/format: MLX / mlx-vlm
- Quantization: 8-bit affine weight quantization, group size 64
- Vision weights included: Yes. The MLX checkpoint includes the model vision weights and processor assets.
- Processor/tokenizer assets: included
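
As a rough illustration of what "8-bit affine weight quantization, group size 64" means, the sketch below quantizes a flat list of weights in groups of 64, storing one scale and one bias per group. It is a pure-Python teaching sketch, not the MLX kernel; the function names are hypothetical, and the 16-bit size assumed for each scale and bias is an assumption rather than something stated in this card.

```python
import random

def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1            # 255 for 8-bit
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo             # lo acts as the per-group bias

def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights: w_hat = scale * q + bias."""
    return [scale * q + bias for q in codes]

def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

def effective_bits_per_weight(bits, group_size, param_bits=16):
    """Storage cost per weight, assuming a `param_bits` scale and bias per group."""
    return bits + 2 * param_bits / group_size

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy weights, not model data
groups = quantize(weights)
restored = [w for codes, scale, bias in groups
            for w in dequantize_group(codes, scale, bias)]

max_error = max(abs(a - b) for a, b in zip(weights, restored))
max_step = max(scale for _, scale, _ in groups)
print(len(groups), max_error <= max_step / 2 + 1e-9)
print(effective_bits_per_weight(8, 64))
```

The round-trip error of each weight is bounded by half the quantization step of its group, which is why smaller groups (tighter min/max ranges) generally lose less precision at the cost of more per-group metadata.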

## Mini olmOCR-bench results

| Candidate | Overall | Arxiv math | Headers/footers | Long tiny text | Multi-column | Old scans | Old scans math | Tables | Baseline |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Source mini baseline | 91.0% ± 6.3% | 100.0% | 100.0% | 100.0% | 100.0% | 33.3% | 100.0% | 100.0% | 94.7% |
| Surya OCR 2 MLX 8-bit G64 | 79.2% ± 6.2% | 100.0% | 100.0% | 33.3% | 100.0% | 0.0% | 100.0% | 100.0% | 100.0% |

### How to read the benchmark table

This is an early quant release with transparent limitations. The table uses our local 40-test mini slice of allenai/olmOCR-bench, with 3 samples from each named section plus the benchmark baseline checks. It is not the full public score and it is not a claim of >98% parity.
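
The Overall column in the table above is numerically consistent with an unweighted mean of the eight per-section percentages (seven named sections plus the baseline checks). That reading is inferred from the reported numbers, not stated by the benchmark tooling here, so treat the sketch below as a sanity check rather than the official scoring code. The variable and function names are illustrative.

```python
# Per-section scores (percent), copied from the benchmark table.
source = {
    "Arxiv math": 100.0, "Headers/footers": 100.0, "Long tiny text": 100.0,
    "Multi-column": 100.0, "Old scans": 33.3, "Old scans math": 100.0,
    "Tables": 100.0, "Baseline": 94.7,
}
quant = {
    "Arxiv math": 100.0, "Headers/footers": 100.0, "Long tiny text": 33.3,
    "Multi-column": 100.0, "Old scans": 0.0, "Old scans math": 100.0,
    "Tables": 100.0, "Baseline": 100.0,
}

def macro_average(scores):
    """Unweighted mean over sections, regardless of how many tests each holds."""
    return sum(scores.values()) / len(scores)

# Sections where the quantized checkpoint scores below the source model.
regressed = [name for name in source if quant[name] < source[name]]

print(f"{macro_average(source):.1f}")  # 91.0
print(f"{macro_average(quant):.1f}")   # 79.2
print(regressed)                       # ['Long tiny text', 'Old scans']
```

Because each named section holds only 3 samples, a single failed test moves that section by about 33 points and the overall figure by about 4 points, which is worth keeping in mind when comparing the two rows.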

The useful signal is the split behavior: this artifact is currently strong on clean academic/math, headers/footers, multi-column layouts, tables, old-scan math, and baseline OCR checks, but it should not be used for old degraded scans and is weak on long tiny text.

## Recommended use

Use this checkpoint for local experimentation and constrained OCR workloads whose documents resemble the passing sections above. Avoid using it as a production replacement for the original model on degraded historical scans, very small dense body text, or workloads requiring full benchmark parity.
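
One way to apply this guidance in an application is to route documents by category and fall back to the original model for anything that did not pass in the mini slice. The sketch below assumes your pipeline already labels each document with one of the benchmark section names; the category labels, the 100% threshold, and the helper names are all hypothetical choices for illustration.

```python
QUANT_REPO = "Reza2kn/surya-ocr-2-mlx-8bit-g64"
SOURCE_REPO = "datalab-to/surya-ocr-2"

# Mini-slice scores (percent) for the quantized checkpoint, from the table above.
QUANT_MINI_SCORES = {
    "Arxiv math": 100.0, "Headers/footers": 100.0, "Long tiny text": 33.3,
    "Multi-column": 100.0, "Old scans": 0.0, "Old scans math": 100.0,
    "Tables": 100.0,
}

def recommended_for(category, min_score=100.0):
    """True only for categories that met `min_score` on the mini slice."""
    score = QUANT_MINI_SCORES.get(category)
    return score is not None and score >= min_score

def pick_checkpoint(category):
    """Use the quant where it passed; otherwise fall back to the source model."""
    return QUANT_REPO if recommended_for(category) else SOURCE_REPO

for category in ("Tables", "Old scans", "Handwritten notes"):
    print(category, "->", pick_checkpoint(category))
```

Unknown categories deliberately fall back to the source model, since the mini slice says nothing about them.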

## Loading

```python
from mlx_vlm import load, generate

model, processor = load("Reza2kn/surya-ocr-2-mlx-8bit-g64")
```

Pass images/documents through the same Surya/MLX-VLM prompting path used by your app.


## Limitations

- This is not a full-parity release yet.
- Do **not** use this artifact for degraded old scans; the current mini split score is 0.0% there.
- Do **not** use this artifact for long tiny text unless you independently validate your data; the current mini split score is 33.3%.
- Math-heavy and table/layout-heavy mini examples looked good in this slice, but full olmOCR-bench is still pending.

## Provenance

Generated non-destructively from the original Hugging Face checkpoint. This is not a fine-tune. The goal of publishing this artifact now is transparency: the files are usable for the passing workload slices above, and the known failing slices are documented clearly.
