jingyiZ00
/

R1-VL-7B

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions Community

R1-VL-7B / README.md

jingyiZ00's picture

Update README.md

3f150d4 verified 25 days ago

|

history blame contribute delete

495 Bytes

metadata

license: apache-2.0
datasets:
  - HuanjinYao/Mulberry-SFT
base_model:
  - Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers

R1-VL-7B

R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).

Paper: https://arxiv.org/pdf/2503.12937

Github: https://github.com/jingyi0000/R1-VL

Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct