---
license: apache-2.0
datasets:
- HuanjinYao/Mulberry-SFT
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---
# R1-VL-7B

R1-VL-7B is a multimodal (image-text-to-text) reasoning model built on Qwen2-VL-7B-Instruct and trained with Step-wise Group Relative Policy Optimization (StepGRPO).
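
The "group relative" part of the method's name refers to scoring each sampled response against the other responses drawn for the same prompt, rather than against a learned value model. The minimal sketch below shows only that normalization step, assuming each response already has a scalar reward. It does not reproduce StepGRPO's step-wise reward design from the paper. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not the authors' code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against its own sampling group.

    Assumption: rewards are scalars already computed per response; how
    StepGRPO derives them from reasoning steps is not shown here.
    """
    mu = mean(rewards)        # group baseline replaces a learned critic
    sigma = pstdev(rewards)   # population std (implementations vary)
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one image-question pair.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.414, 0.0, -1.414, 0.0]
```

Responses scored above the group mean receive a positive advantage and are reinforced, while those below it are discouraged. If all responses in a group receive the same reward, every advantage is zero and that group contributes no learning signal.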

### Paper: https://arxiv.org/pdf/2503.12937

### Github: https://github.com/jingyi0000/R1-VL

### Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct