MM-ReCoder-SFT-Cold-Start

CVPR 2026  |  Project Page  |  arXiv  |  Code  |  Final RL Model

MM-ReCoder-SFT-Cold-Start is the supervised fine-tuned cold-start checkpoint released alongside the CVPR 2026 paper MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction. It is fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct to bootstrap the chart-to-code and self-correction behaviors before the multi-turn RL stages.

This is an intermediate checkpoint, not the final MM-ReCoder model. If you want the best chart-to-code performance, use cwbc/MM-ReCoder instead. This checkpoint is released for researchers who want to reproduce or ablate the RL stages of the paper.

Intended Use

This checkpoint is intended as the starting point for multi-turn RL training. The pipeline is:

  1. SFT cold-start (this checkpoint) โ€” Qwen2.5-VL-7B-Instruct fine-tuned on chart-to-code demonstrations.
  2. Multi-turn RL (GRPO), stage 1 โ€” shared-first-turn optimization, initialized from this checkpoint.
  3. Multi-turn RL (GRPO), stage 2 โ€” full-trajectory optimization, resumed from stage 1. The result is released as cwbc/MM-ReCoder.

Usage

To kick off RL from this cold-start checkpoint, clone the official repository and run the stage 1 training script (which references this checkpoint via REF_MODEL_PATH=cwbc/MM-ReCoder-SFT-Cold-Start):

git clone https://github.com/ZitianTang/MM-ReCoder.git
cd MM-ReCoder
# Follow the Installation section in the repo README, then launch the
# LLM-as-a-judge reward server (see the RL Training section).

# Stage 1: multi-turn GRPO with a shared first turn.
bash examples/mmrecoder/train/stage1-shared-first-turn.sh

# Stage 2: multi-turn GRPO on the full trajectory, resumed from stage 1.
bash examples/mmrecoder/train/stage2-full-trajectory.sh

Multi-Turn Inference with the Cold-Start Model

This checkpoint also supports the multi-turn self-correction inference loop from the repository โ€” useful for measuring the RL gains over the SFT-only baseline. Reuse the inference scripts and override the model path:

# Download the cold-start checkpoint.
hf download cwbc/MM-ReCoder-SFT-Cold-Start

# Two-turn self-correction on ChartMimic, using the cold-start model.
bash examples/mmrecoder/inference/chartmimic_2turns.sh \
    model.path=cwbc/MM-ReCoder-SFT-Cold-Start \
    data.output_path=generations/coldstart_chartmimic_2turns.json

The self-correction policy is sharpened by the RL stages, so the cold-start model will generally underperform cwbc/MM-ReCoder on multi-turn benchmarks; this is the intended baseline comparison.

Direct single-turn use

You can also load the checkpoint directly with transformers to inspect single-turn chart-to-code behavior:

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch

model_id = "cwbc/MM-ReCoder-SFT-Cold-Start"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

Citation

@inproceedings{tang2026mmrecoder,
    title={MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction},
    author={Zitian Tang and Xu Zhang and Jianbo Yuan and Yang Zou and Varad Gunjal and Songyao Jiang and Davide Modolo},
    booktitle={CVPR},
    year={2026}
}

License

Released under the Apache 2.0 License, inheriting from the base Qwen2.5-VL-7B-Instruct license.

Downloads last month
68
Safetensors
Model size
8B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for cwbc/MM-ReCoder-SFT-Cold-Start

Finetuned
(1106)
this model

Paper for cwbc/MM-ReCoder-SFT-Cold-Start