VRPRM-Qwen3VL-4B

VRPRM-Qwen3VL-4B is a visual process reward model from VRPRM: Process Reward Modeling via Visual Reasoning.

VRPRM is designed to evaluate intermediate reasoning steps for multimodal problems. The model is intended for visual process reward modeling, reasoning-step scoring, and Best-of-N selection for vision-language model outputs.

Model Details

  • Model family: VRPRM
  • Backbone family: Qwen3-VL 4B
  • Serialized architecture: Qwen3VLForConditionalGeneration
  • Model type: qwen3_vl
  • Weights format: sharded safetensors
  • Recommended library: transformers

Training Summary

The VRPRM paper trains the model with a two-stage recipe:

  1. Supervised fine-tuning cold start on high-quality CoT-PRM data. Open-sourced on VRPRM3.6K.
  2. Reinforcement learning scaling on lower-cost non-CoT PRM data.

Intended Use

This model is intended for research on:

  • Visual process reward modeling
  • Multimodal reasoning evaluation
  • Step-level scoring of visual question answering rationales
  • Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

Usage

Load the model with Hugging Face Transformers from the repository root:

from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-Qwen3VL-4B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For the complete inference and evaluation pipeline, use the VRPRM project code.

Citation

@misc{chen2026vrprmprocessrewardmodeling,
      title={VRPRM: Process Reward Modeling via Visual Reasoning}, 
      author={Xinquan Chen and Chongying Yue and Bangwei Liu and Xuhong Wang and Yingchun Wang and Chaochao Lu},
      year={2026},
      eprint={2508.03556},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.03556}, 
}
Downloads last month
36
Safetensors
Model size
5B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including two-tiger/Qwen3-VRPRM-4B

Paper for two-tiger/Qwen3-VRPRM-4B