Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Abstract
Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly in Chain-of-Thought (CoT) performance. To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, (1) on the data side, we design an automated preference data construction pipeline to create MMPR, a high-quality, large-scale multimodal reasoning preference dataset; and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance. Our approach demonstrates improved performance across multiple benchmarks, particularly in multimodal reasoning tasks. Notably, our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B. We hope this study will inspire further advancements in MLLMs. Code, data, and model will be publicly released.
Community
We introduce MMPR, a high-quality, large-scale multimodal reasoning preference dataset, and MPO, an effective preference optimization algorithm. The resulting model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B. Please refer to our paper, project page, and documentation for more details.
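As a rough illustration of what a "mixed" preference objective can look like, the sketch below combines a DPO-style preference loss on chosen/rejected response pairs with a standard generation (SFT) loss on the chosen response. This is a minimal, hedged sketch, not the paper's exact formulation: the component losses, the mixing weights `w_pref`/`w_gen`, and the function names here are illustrative assumptions; see the paper for the actual MPO objective.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style preference loss on sequence log-probabilities.

    pi_* are log-probs under the policy being trained; ref_* are log-probs
    under a frozen reference model. Loss is -log(sigmoid(beta * margin)).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def sft_loss(pi_chosen, num_tokens):
    """Generation loss: average negative log-likelihood of the chosen response."""
    return -pi_chosen / num_tokens

def mixed_preference_loss(pi_c, pi_r, ref_c, ref_r, num_tokens,
                          w_pref=1.0, w_gen=1.0):
    """Weighted mixture of a preference loss and a generation loss
    (illustrative; weights are hypothetical)."""
    return (w_pref * dpo_loss(pi_c, pi_r, ref_c, ref_r)
            + w_gen * sft_loss(pi_c, num_tokens))
```

The intuition is that the preference term teaches the model to rank good CoT responses above bad ones, while the generation term keeps the model anchored to actually producing the preferred responses rather than only separating them.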
Excellent paper!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning (2024)
- Improve Vision Language Model Chain-of-thought Reasoning (2024)
- Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback (2024)
- SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization (2024)
- Vision-Language Models Can Self-Improve Reasoning via Reflection (2024)
- MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models (2024)
- MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning (2024)
Models citing this paper 1
Datasets citing this paper 1
Spaces citing this paper 0