R1-VL-7B / README.md
jingyiZ00's picture
Update README.md
3f150d4 verified
metadata
license: apache-2.0
datasets:
  - HuanjinYao/Mulberry-SFT
base_model:
  - Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers

R1-VL-7B

R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).

Paper: https://arxiv.org/pdf/2503.12937

Github: https://github.com/jingyi0000/R1-VL

Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct