surya-ocr-2-poneglyph-bbox

Surya OCR 2 fine-tuned to read One Piece manga bubble text and predict a bounding box for each bubble.

This model reads a full manga page and emits one line per dialogue bubble:

Text content [x1,y1,x2,y2]

Coordinates are normalized to [0, 1000] on the resized page image.
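Mapping a normalized box back to pixels is a matter of scaling by the image size. A minimal sketch, assuming axis-aligned `[x1,y1,x2,y2]` boxes and that rounding to the nearest pixel is acceptable; `to_pixels` is a hypothetical helper, not part of the model's code:

```python
def to_pixels(bbox, width, height, scale=1000):
    """Convert a [x1, y1, x2, y2] box on the 0..scale grid to pixel coordinates."""
    x1, y1, x2, y2 = bbox
    return [
        round(x1 * width / scale),
        round(y1 * height / scale),
        round(x2 * width / scale),
        round(y2 * height / scale),
    ]

# Example: a page that is 1000 px wide and 1500 px tall
print(to_pixels([100, 200, 500, 400], width=1000, height=1500))  # [100, 300, 500, 600]
```

If the resize kept the aspect ratio, the same call with the original page's width and height should place the box on the full-resolution image.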

Why Surya For BBox

The upstream Surya OCR 2 card documents three output paths that carry bounding boxes:

OCR output includes per-block polygon, axis-aligned bbox, confidence, and reading order.
surya_detect returns text-line bboxes and polygons.
surya_layout returns layout boxes, labels, reading order, and bbox values.

This fine-tune starts from the Hugging Face image-text-to-text Surya OCR 2 model and trains its generated text stream to follow the existing Poneglyph bbox contract, that is, one Text content [x1,y1,x2,y2] line per bubble.

Benchmark: Surya vs LightOn BBox Poneglyph

Metric	Surya OCR 2 fine-tuned	LightOn bbox Poneglyph	Winner
CER	2.62%	0.64%	LightOn
WER	4.70%	1.80%	LightOn
Mean IoU	92.03%	73.55%	Surya
Median IoU	93.65%	74.43%	Surya
F1 @ IoU=0.5	95.92%	77.71%	Surya
Precision @ 0.5	95.96%	77.31%	Surya
Recall @ 0.5	96.60%	78.68%	Surya
Detection Rate	97.57%	98.85%	LightOn
Combined Score	0.959	0.877	Surya
Avg Inference	9.38s/page	4.62s/page	LightOn
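CER and WER are conventionally an edit distance divided by the reference length, counted over characters and words respectively. The card does not state the text normalization or how scores are aggregated across bubbles and pages, so the following is only a sketch of the standard definition, with hypothetical helper names:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits per reference word (whitespace tokens)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


print(wer("JE SERAI LE ROI", "JE SERAI ROI"))  # 0.25 (one dropped word out of four)
```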

Surya Fine-Tuned Snapshot

Metric	Score
CER	2.62%
WER	4.70%
Mean IoU	92.03%
Median IoU	93.65%
F1 @ IoU=0.3	96.21%
F1 @ IoU=0.5	95.92%
F1 @ IoU=0.75	93.57%
Detection Rate	97.57%
Combined Score	0.959
Avg Inference	9.38s/page
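For readers unfamiliar with the box metrics, here is a rough sketch of IoU and of F1 at a given IoU threshold. The matching strategy (greedy, one-to-one) and the single-page scope are assumptions; the card does not describe how predictions are matched to references or how per-page scores are averaged, so this will not necessarily reproduce the numbers above.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at(preds, refs, threshold):
    """F1 with greedy matching: each reference box is matched at most once."""
    unmatched = list(refs)
    true_positives = 0
    for pred in preds:
        best = max(unmatched, key=lambda ref: iou(pred, ref), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(refs) if refs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


refs = [[0, 0, 100, 100], [200, 200, 300, 300]]
preds = [[0, 0, 100, 80], [200, 200, 300, 300], [500, 500, 600, 600]]
for threshold in (0.3, 0.5, 0.75):
    print(threshold, round(f1_at(preds, refs, threshold), 3))
```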

Combined score:

0.4 * (1 - CER) + 0.3 * F1@0.5 + 0.2 * MeanIoU + 0.1 * DetectionRate
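As a quick check, the formula above can be evaluated on the benchmark figures, with each percentage written as a fraction. The function name is illustrative only:

```python
def combined_score(cer, f1_at_05, mean_iou, detection_rate):
    """Weighted score from the formula above; all inputs are fractions in [0, 1]."""
    return 0.4 * (1 - cer) + 0.3 * f1_at_05 + 0.2 * mean_iou + 0.1 * detection_rate


surya = combined_score(0.0262, 0.9592, 0.9203, 0.9757)
lighton = combined_score(0.0064, 0.7771, 0.7355, 0.9885)
print(round(surya, 3), round(lighton, 3))  # 0.959 0.877
```

Both results agree with the Combined Score row of the benchmark table.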

Dataset

Source data comes from the Poneglyph Supabase bulles table, filtered to validated annotations, grouped at page level, and split by id_page to prevent page leakage.

Split	Pages	Bubbles
train	599	5415
val	128	1201
test	129	1141
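A page-level split of this kind can be sketched as follows. This is not the project's export code: the row layout (dicts with an `id_page` key), the seed, and the 70/15/15 ratios are assumptions, the ratios being inferred from the page counts in the table (599/128/129 of 856 pages).

```python
import random
from collections import defaultdict


def split_by_page(bubbles, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign whole pages to train/val/test so no page spans two splits."""
    by_page = defaultdict(list)
    for bubble in bubbles:
        by_page[bubble["id_page"]].append(bubble)

    # Shuffle page ids, not bubbles, so a page's bubbles always stay together.
    page_ids = sorted(by_page)
    random.Random(seed).shuffle(page_ids)

    n_train = round(len(page_ids) * ratios[0])
    n_val = round(len(page_ids) * ratios[1])
    groups = {
        "train": page_ids[:n_train],
        "val": page_ids[n_train:n_train + n_val],
        "test": page_ids[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: [bubble for page_id in ids for bubble in by_page[page_id]]
        for name, ids in groups.items()
    }
```

Splitting on bubbles instead would let bubbles from one page land in both train and test, which is the leakage the page-level grouping avoids.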

Preprocessing:

Full page image resized so that its longest side is 1540 px.
JPEG quality 95.
Bubble boxes normalized to [0, 1000].
Target order follows the stored manga reading order.
Target text uses one strict line per bubble.
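The geometric part of these steps can be sketched in plain Python. The helper names are hypothetical, and several details are assumptions rather than documented behavior: rounding to the nearest integer, clamping to the grid, collapsing line breaks inside a bubble, and whether pages smaller than 1540 px are upscaled.

```python
def resized_size(width, height, longest_side=1540):
    """Size that keeps the aspect ratio with the longest side at `longest_side`."""
    scale = longest_side / max(width, height)
    return round(width * scale), round(height * scale)


def normalize_bbox(bbox_px, width, height, scale=1000):
    """Map a pixel box [x1, y1, x2, y2] onto the 0..scale integer grid, clamped."""
    def norm(value, size):
        return min(scale, max(0, round(value * scale / size)))

    x1, y1, x2, y2 = bbox_px
    return [norm(x1, width), norm(y1, height), norm(x2, width), norm(y2, height)]


def format_target(bubbles, width, height):
    """One 'Text [x1,y1,x2,y2]' line per bubble, in the order given."""
    lines = []
    for text, bbox_px in bubbles:
        one_line = " ".join(text.split())  # assumption: fold internal line breaks
        x1, y1, x2, y2 = normalize_bbox(bbox_px, width, height)
        lines.append(f"{one_line} [{x1},{y1},{x2},{y2}]")
    return "\n".join(lines)


print(resized_size(2000, 3000))  # (1027, 1540)
print(format_target([("SALUT !", [200, 300, 1000, 1500])], 2000, 3000))
# SALUT ! [100,100,500,500]
```

Because the coordinates are relative to the page size, normalizing against the original or the resized image gives the same values as long as the aspect ratio is preserved.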

How To Use

pip install torch pillow transformers accelerate

import re
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Remidesbois/surya-ocr-2-poneglyph-bbox"
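# French prompt from the model card; keep it verbatim. In English:
# "Extract the text from the bubbles of this manga page in Japanese reading order,
# with their bboxes normalized between 0 and 1000. Strict format: Text [x1,y1,x2,y2]."
PROMPT = "Extrais le texte des bulles de cette page de manga dans l'ordre de lecture japonais, avec leurs bbox normalisees entre 0 et 1000. Format strict: Texte [x1,y1,x2,y2]."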

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

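# Load the page and shrink it in place so its longest side is at most 1540 px,
# as in the training preprocessing. thumbnail() keeps the aspect ratio and never upscales.
image = Image.open("page.jpg").convert("RGB")
image.thumbnail((1540, 1540), Image.Resampling.LANCZOS)

# With tokenize=False the chat template only renders the prompt text, so the image
# entry marks where the image goes; the pixels come from `images=[image]` further down.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "page.jpg"},
            {"type": "text", "text": PROMPT},
        ],
    }
]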

prompt = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt")
inputs = {
    k: v.to(model.device, dtype=torch.bfloat16) if v.is_floating_point() else v.to(model.device)
    for k, v in inputs.items()
}

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

generated = output_ids[0, inputs["input_ids"].shape[1]:]
text = processor.decode(generated, skip_special_tokens=True).strip()
print(text)

# Parse each "Text [x1,y1,x2,y2]" line into a dict; lines that do not match are skipped.
pattern = re.compile(r"(.+?)\s*\[(\d+),(\d+),(\d+),(\d+)\]")
bubbles = [
    {"text": m.group(1).strip(), "bbox": [int(m.group(i)) for i in range(2, 6)]}
    for line in text.splitlines()
    if (m := pattern.match(line.strip()))
]

Training

The training package used for this model lives in:

docker_scripts/finetune_surya_ocr_bbox

Pipeline:

python run_pipeline.py --dry-run --check-remote
python run_pipeline.py

The run exports the dataset, fine-tunes Surya OCR 2 with LoRA/DoRA, benchmarks the held-out test split, benchmarks Remidesbois/LightonOCR-2-1b-poneglyph-bbox on the same pages, writes this README, and uploads the final merged model when HF_TOKEN is available.

Limitations

Domain-specific: trained for One Piece manga pages.
Text language: French annotations.
Output is a generated text contract, so malformed lines are possible and should be parsed defensively.
The model returns normalized bbox coordinates, not pixel coordinates.
The LightOn comparison is only valid when both models are evaluated on the same exported test split.
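One way to parse the output defensively is to keep rejected lines for inspection instead of silently dropping them. The sketch below is an illustration, not the project's parser; the validity rules (values within 0..1000, x1 < x2, y1 < y2, nothing after the closing bracket) are assumptions about what counts as a well-formed line.

```python
import re

# Whole-line match: text, then a bracketed list of four integers, tolerant of spaces.
LINE_RE = re.compile(
    r"^(?P<text>.+?)\s*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*$"
)


def parse_output(raw, scale=1000):
    """Split model output into valid bubbles and rejected lines."""
    bubbles, rejected = [], []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if not match:
            rejected.append(line)
            continue
        x1, y1, x2, y2 = (int(match.group(i)) for i in range(2, 6))
        in_range = all(0 <= value <= scale for value in (x1, y1, x2, y2))
        if not in_range or x1 >= x2 or y1 >= y2:
            rejected.append(line)
            continue
        bubbles.append({"text": match.group("text").strip(), "bbox": [x1, y1, x2, y2]})
    return bubbles, rejected


raw = "BONJOUR ! [100,200,300,400]\nbroken line\nTOO BIG [0,0,1200,500]"
bubbles, rejected = parse_output(raw)
print(len(bubbles), len(rejected))  # 1 2
```

Logging the rejected lines makes it easier to tell a truncated generation (for example, hitting `max_new_tokens`) from a genuinely malformed box.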

Base Model

Fine-tuned from datalab-to/surya-ocr-2. The base model uses the Surya OCR 2 / Qwen3.5 image-text-to-text architecture.

Fine-tuned by Remidesbois.

Model size: 0.7B params (Safetensors)
Tensor type: BF16