
## Introduction

- Paper: Paper
- GitHub: Github
- Page: Page
- SFT Dataset: OmniAlign-V
- DPO Dataset: OmniAlign-V-DPO
- MM-AlignBench: MM-AlignBench
- Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO

This is the official repo of LLaVANext-OmniAlign (OA)-7B, introduced in *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.

LLaVANext-OmniAlign-7B follows the LLaVA-Next architecture, with InternLM2.5-7B-chat as the language model.

It is trained on a combination of the LLaVA-Next-SFT-738k-multimodal and OmniAlign-V datasets, which significantly improves the alignment of MLLMs with human preference while also enhancing performance on common downstream tasks, especially MMVet and MMMU.

## Performance

Integrating the OmniAlign-V datasets into the Supervised Fine-Tuning (SFT) stage not only significantly improves the alignment of MLLMs with human preference, but also enhances their performance on common downstream tasks, especially MMVet and MMMU.

| Model | Data | LLM | MM-AlignBench | WildVision | MIA-Bench | MMVet | MMMU | MMBenchV1.1 | AI2D | OCRBench |
|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA | LLaVANext-778k | InternLM2.5-7B | 3.6 / -82.1 | 18.4 / -55.1 | 75.4 | 41.2 | 42.6 | 73.6 | 74.1 | 39.7 |
| LLaVA | OmniAlign-V_mix | InternLM2.5-7B | 50.0 / +3.8 | 28.2 / -34.6 | 85.4 | 43.5 | 43.3 | 73.7 | 74.7 | 41.3 |
| Δ | | | +46.4 / +85.9 | +9.8 / +20.5 | +10.0 | +2.3 | +0.7 | +0.1 | +0.6 | +1.6 |
| LLaVANext | LLaVANext-778k | InternLM2.5-7B | 20.6 / -42.7 | 23.4 / -45.0 | 76.9 | 41.8 | 44.1 | 75.1 | 74.7 | 56.2 |
| LLaVANext | OmniAlign-V_mix | InternLM2.5-7B | 57.1 / +11.1 | 29.6 / -31.3 | 86.7 | 47.7 | 46.8 | 74.9 | 77.5 | 58.9 |
| Δ | | | +36.5 / +53.8 | +6.2 / +13.7 | +9.8 | +5.9 | +2.7 | -0.2 | +2.8 | +2.7 |
| LLaVANext | LLaVANext-778k | Qwen2.5-32B | 26.6 / -29.0 | 25.2 / -41.3 | 86.0 | 47.7 | 55.2 | 79.3 | 79.6 | 55.9 |
| LLaVANext | OmniAlign-V_mix | Qwen2.5-32B | 62.3 / +19.4 | 40.2 / -14.9 | 89.6 | 56.9 | 60.7 | 80.6 | 81.7 | 55.9 |
| Δ | | | +35.7 / +48.4 | +15.0 / +26.4 | +3.6 | +9.2 | +5.5 | +1.3 | +2.1 | +0.0 |

For MM-AlignBench and WildVision, scores are reported as A / B, where A is the Winning Rate and B is the Reward.
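As a rough illustration of how a winning rate and a reward can be derived from pairwise judge verdicts, here is a minimal sketch. The function name and the exact weighting are assumptions for illustration only; the official benchmarks may weight verdict levels differently (see their repos for the exact formulas).

```python
def pairwise_scores(verdicts):
    """Illustrative scoring of pairwise judge verdicts (not the official formula).

    Each verdict is "win", "tie", or "lose" from the evaluated model's view.
    Winning rate = share of wins; reward = (wins - losses) / total.
    Both are returned in percent.
    """
    n = len(verdicts)
    wins = verdicts.count("win")
    losses = verdicts.count("lose")
    return 100.0 * wins / n, 100.0 * (wins - losses) / n


# Example: 2 wins, 1 tie, 1 loss out of 4 comparisons.
rate, reward = pairwise_scores(["win", "win", "tie", "lose"])
```

Under this simplified scheme, a model that loses most comparisons gets a strongly negative reward, which is why baseline rows in the table above show values like -82.1.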

## How to use

Please refer to our Github repository for details on training and evaluation.