--- license: apache-2.0 datasets: - HuanjinYao/Mulberry-SFT base_model: - Qwen/Qwen2-VL-7B-Instruct pipeline_tag: image-text-to-text library_name: transformers --- # R1-VL-7B R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO). ### Paper: https://arxiv.org/pdf/2503.12937 ### Github: https://github.com/jingyi0000/R1-VL ### Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct