Qwen3.5-2B-MathParser-pro

Model Summary

Qwen3.5-2B-MathParser-pro is a compact vision-language model for handwritten mathematical formula OCR. It is optimized to transcribe single-line and multi-line handwritten mathematical expressions into LaTeX, with a focus on local deployment.

This 2B release is intended for lower-memory local deployment. The companion release is Qwen3.5-4B-MathParser-pro.
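
As a rough guide to what "lower-memory" means here, the size of the weights alone can be estimated from the nominal parameter count and the storage type. The sketch below assumes exactly 2e9 and 4e9 parameters (the real checkpoints will differ somewhat) and BF16 storage at 2 bytes per parameter. It ignores activations, the KV cache, and framework overhead, so real memory use will be higher.

```python
# Rough weight-memory estimate. The parameter counts are nominal assumptions,
# not measured values for the released checkpoints.
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def weight_memory_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


for name, params in [("2B", 2e9), ("4B", 4e9)]:
    print(f"{name}: ~{weight_memory_gib(params):.1f} GiB of weights in BF16")
```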

Intended Use

  • Handwritten mathematical formula recognition
  • Multi-line LaTeX transcription
  • OCR for mathematical expressions and derivations
  • Research and application prototyping around handwritten math parsing

This model is not intended to be a general mathematical reasoning model. It should be used as an OCR/transcription model.

Training Recipe

The model follows a two-stage MathParser training recipe:

  1. Stage 1 SFT builds a stable handwritten mathematical OCR base and teaches direct LaTeX transcription.
  2. Stage 2 DPO v34 trains the model to prefer concise, stable, line-count-faithful transcriptions, which reduces malformed outputs, repetition, max-token runaway, and very low-similarity failures.
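
For readers unfamiliar with Stage 2, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code used for this release, and the value of `beta` and the example log-probabilities are made up for illustration. In this setting the "chosen" response would be a concise, line-count-faithful transcription and the "rejected" one a malformed or runaway output.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Illustrative numbers only (assumed, not taken from training logs).
loss_prefers_chosen = dpo_loss(-12.0, -30.0, -15.0, -20.0)
loss_prefers_rejected = dpo_loss(-30.0, -12.0, -20.0, -15.0)
print(loss_prefers_chosen < loss_prefers_rejected)
```

The loss shrinks as the policy raises the chosen response relative to the rejected one, measured against the reference model, which is how a preference for shorter, well-formed transcriptions can be learned.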

The released weights are fully merged model weights, not LoRA adapters.

Evaluation

Evaluation set: 756 multi-line handwritten mathematical formula samples.

Metrics:

  • Avg Sim / Median Sim: normalized edit similarity, higher is better.
  • Line Match: exact line-count match with ground truth.
  • Within +/-1: predicted line count differs from ground truth by at most one.
  • Runaway: max-token or obviously overlong/repetitive generations, lower is better.
  • Bad <0.50: samples with similarity below 0.50, lower is better.
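
| Model              | Samples | Avg Sim  | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|--------------------|---------|----------|------------|------------|-------------|---------|-----------|
| Qwen3.5-0.8B Base  | 756     | 0.544843 | 0.580742   | 149        | 235         | 108     | 262       |
| Qwen3.5-2B Base    | 756     | 0.599258 | 0.651649   | 252        | 392         | 19      | 236       |
| Qwen3.5-4B Base    | 756     | 0.534456 | 0.541674   | 264        | 368         | 5       | 295       |
| Qwen3.5-2B SFT     | 756     | 0.906516 | 0.952732   | 550        | 706         | 13      | 25        |
| Qwen3.5-2B SFT+DPO | 756     | 0.916060 | 0.951464   | 569        | 714         | 3       | 15        |
| Qwen3.5-4B SFT     | 756     | 0.942045 | 0.966546   | 612        | 730         | 0       | 2         |
| Qwen3.5-4B SFT+DPO | 756     | 0.942878 | 0.968560   | 611        | 730         | 0       | 1         |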
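
The metrics listed above can be sketched as follows. The card does not state the exact normalization, so this sketch assumes similarity = 1 - Levenshtein distance / length of the longer string, computed on raw characters, and counts lines as non-empty newline-separated segments. The evaluation script behind the table may differ on both points, and all function names here are hypothetical.

```python
import statistics


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(pred: str, ref: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(ref))
    return 1.0 if longest == 0 else 1.0 - levenshtein(pred, ref) / longest


def line_count(latex: str) -> int:
    """Assumed line counting: non-empty lines after splitting on newlines."""
    return sum(1 for line in latex.splitlines() if line.strip())


def summarize(pairs: list[tuple[str, str]]) -> dict:
    """Aggregate the table's metrics over (prediction, reference) pairs."""
    sims = [similarity(p, r) for p, r in pairs]
    diffs = [abs(line_count(p) - line_count(r)) for p, r in pairs]
    return {
        "avg_sim": sum(sims) / len(sims),
        "median_sim": statistics.median(sims),
        "line_match": sum(d == 0 for d in diffs),
        "within_1": sum(d <= 1 for d in diffs),
        "bad_lt_0_50": sum(s < 0.50 for s in sims),
    }


# Toy pairs for illustration only; they are not from the evaluation set.
toy_pairs = [
    ("x^2 + 1", "x^2 + 1"),          # exact match
    ("a = b\nc = d", "a = b"),       # one extra predicted line
    ("y", "\\frac{a}{b}"),           # unrelated output
]
print(summarize(toy_pairs))
```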

For this release, the main result is:

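| Release                   | Avg Sim  | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---------------------------|----------|------------|------------|-------------|---------|-----------|
| Qwen3.5-2B-MathParser-pro | 0.916060 | 0.951464   | 569        | 714         | 3       | 15        |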
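
Because the count columns are all out of 756 samples, they convert directly to rates. The short calculation below uses only the numbers reported for this release.

```python
SAMPLES = 756
counts = {"Line Match": 569, "Within +/-1": 714, "Runaway": 3, "Bad <0.50": 15}

# Divide each count by the evaluation set size to get a rate.
rates = {name: count / SAMPLES for name, count in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
# Line Match: 75.3%
# Within +/-1: 94.4%
# Runaway: 0.4%
# Bad <0.50: 2.0%
```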

Figures

[Figure: Overall average similarity]
[Figure: Error reduction]
[Figure: Bucket average similarity]
[Figure: Model size quality tradeoff]

Usage

from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "sugartai/Qwen3.5-2B-MathParser-pro"

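# Load the processor and the merged weights in bfloat16.
# device_map="auto" requires the accelerate package to be installed.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
).eval()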

image = Image.open("formula.png").convert("RGB")
messages = [
    {
        "role": "system",
        "content": "You are a handwritten mathematical OCR model. Return only the LaTeX transcription.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the handwritten mathematical formula into LaTeX only."},
        ],
    },
]

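# Render the messages to a prompt string. enable_thinking=False is forwarded
# to the chat template to request a direct answer without a reasoning block.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)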
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Stop on the EOS token, and also on the pad token when a distinct one is defined.
eos_ids = [processor.tokenizer.eos_token_id]
pad_id = processor.tokenizer.pad_token_id
if pad_id is not None and pad_id not in eos_ids:
    eos_ids.append(pad_id)

# Greedy decoding (no sampling, single beam) with a 1536-token budget.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1536,
        do_sample=False,
        num_beams=1,
        eos_token_id=eos_ids,
        pad_token_id=pad_id if pad_id is not None else eos_ids[0],
    )

# Drop the prompt tokens so only the newly generated transcription is decoded.
new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.decode(new_ids[0], skip_special_tokens=True))
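
Since runaway generations are one of the failure modes tracked in the evaluation, a caller may want to flag suspicious outputs before using them. The check below is a hypothetical heuristic, not the criterion used to produce the Runaway column: it flags an output that used the full token budget or whose trailing lines repeat. The function name and the repeat threshold are assumptions.

```python
def looks_like_runaway(text: str, num_new_tokens: int,
                       max_new_tokens: int = 1536,
                       max_repeats: int = 4) -> bool:
    """Heuristically flag max-token or repetitive generations."""
    # Using the whole budget usually means generation was cut off
    # rather than stopping on its own.
    if num_new_tokens >= max_new_tokens:
        return True
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Count how many times the final line repeats consecutively at the end.
    repeats = 0
    for line in reversed(lines):
        if line != lines[-1]:
            break
        repeats += 1
    return repeats >= max_repeats


# Toy inputs for illustration only.
print(looks_like_runaway("a = b\nc = d", num_new_tokens=20))
print(looks_like_runaway("\n".join(["x = 1"] * 5), num_new_tokens=50))
```

With the usage code above, `num_new_tokens` would be `new_ids.shape[1]` and `text` the decoded transcription.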

Limitations

  • The model is specialized for handwritten mathematical OCR and LaTeX transcription.
  • It is not a general reasoning or theorem-proving model.
  • Very noisy images, unusual notation, extreme layout variation, or out-of-distribution handwriting may degrade quality.
  • The reported metrics are from an internal 756-sample multi-line handwritten formula evaluation set.

License

This model is released under Apache 2.0, following the base model license of Qwen/Qwen3.5-2B.

Citation

If you use this model, please cite or link this model page and the Qwen3.5 base model.
