MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Abstract
We present MM-Eureka, a multimodal reasoning model that successfully extends large-scale rule-based reinforcement learning (RL) to multimodal reasoning. While rule-based RL has shown remarkable success in improving LLMs' reasoning abilities in text domains, its application to multimodal settings has remained challenging. Our work reproduces key characteristics of text-based RL systems such as DeepSeek-R1 in the multimodal space, including steady increases in accuracy reward and response length, and the emergence of reflection behaviors. We demonstrate that both instruction-tuned and pre-trained models can develop strong multimodal reasoning capabilities through rule-based RL without supervised fine-tuning, showing superior data efficiency compared to alternative approaches. We open-source our complete pipeline to foster further research in this area, releasing all code, models, and data at https://github.com/ModalMinds/MM-EUREKA
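The "rule-based" rewards referenced above are typically simple deterministic checks rather than learned reward models. The sketch below illustrates the general idea with a hypothetical accuracy reward (exact-match on a final boxed answer) and format reward (a DeepSeek-R1-style `<think>...</think>` structure check); the concrete matching rules used in MM-Eureka may differ.

```python
import re

def accuracy_reward(response: str, ground_truth: str) -> float:
    """Rule-based accuracy reward: 1.0 if the model's last \\boxed{...}
    answer exactly matches the ground truth, else 0.0.
    Illustrative sketch only, not MM-Eureka's exact rule."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if not matches:
        return 0.0
    predicted = matches[-1].strip()
    return 1.0 if predicted == ground_truth.strip() else 0.0

def format_reward(response: str) -> float:
    """Rule-based format reward: 1.0 if the response contains a
    <think>...</think> reasoning block (an assumed convention
    borrowed from DeepSeek-R1-style training), else 0.0."""
    return 1.0 if re.search(r"<think>.*</think>", response, re.DOTALL) else 0.0
```

Because such rewards are computed directly from the model's text output, they require no human preference data or reward-model training, which is central to the data-efficiency claim in the abstract.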