---
license: apache-2.0
datasets:
- HuanjinYao/Mulberry-SFT
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---
R1-VL-7B
R1-VL-7B is a multimodal (image-text-to-text) reasoning model built on Qwen/Qwen2-VL-7B-Instruct and trained with Step-wise Group Relative Policy Optimization (StepGRPO).
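Group relative policy optimization samples several responses to the same prompt, scores each one, and normalizes every reward against the mean and standard deviation of its group instead of relying on a learned value critic. The sketch below shows only that group-relative normalization step. It assumes each sampled response has already been assigned a single scalar reward; the step-wise reward design that distinguishes StepGRPO is not reproduced here. `group_relative_advantages` is a hypothetical helper for illustration, not part of the R1-VL training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Return a GRPO-style advantage for each response in one sampled group.

    Assumption (hypothetical helper): `rewards` holds one scalar reward per
    response generated for the same prompt. How StepGRPO builds that reward
    from step-level signals is outside the scope of this sketch.
    """
    mu = mean(rewards)        # group baseline replaces a learned critic
    sigma = pstdev(rewards)   # population std over the group
    # eps keeps the division finite when every response got the same reward;
    # in that case all advantages are 0 and the group gives no learning signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four responses to one prompt, scored by some (assumed) reward function.
    print(group_relative_advantages([1.0, 0.5, 0.0, 0.5]))
```

Responses scoring above the group mean receive a positive advantage and are reinforced, while those below it are pushed down, so the update depends only on how a response compares with its siblings.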