### Introduction

- Paper: [Paper](https://arxiv.org/abs/2502.18411)
- GitHub: [GitHub](https://github.com/PhoenixZ810/OmniAlign-V)
- Page: [Page](https://phoenixz810.github.io/OmniAlign-V/)
- SFT Dataset: [OmniAlign-V](https://huggingface.co/datasets/PhoenixZ/OmniAlign-V)
- DPO Dataset: [OmniAlign-V-DPO](https://huggingface.co/datasets/PhoenixZ/OmniAlign-V-DPO)
- MM-AlignBench: [MM-AlignBench](https://github.com/open-compass/VLMEvalKit)
- Checkpoints: [LLaVANext-OA-7B](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-7B), [LLaVANext-OA-32B](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-32B), [LLaVANext-OA-32B-DPO](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-32B-DPO)
This is the official repository of LLaVANext-OmniAlign(OA)-7B from *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.
LLaVANext-OmniAlign-7B follows the [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT) architecture and uses [InternLM2.5-7B-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) as its language model.
It was trained on a combination of the LLaVA-Next-778k multimodal SFT data and the OmniAlign-V dataset.
### Performance
Integrating the OmniAlign-V dataset into the supervised fine-tuning (SFT) stage not only significantly improves the alignment of MLLMs with human preferences, but also enhances their performance on common downstream tasks, especially MMVet and MMMU.
| Model | Data | LLM | MM-AlignBench | WildVision | MIA-Bench | MMVet | MMMU | MMBenchV1.1 | AI2D | OCRBench |
|----------------|---------------------|----------------|---------------------------------|--------------------------------|--------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|
| LLaVA | LLaVANext-778k | InternLM2.5-7B | 3.6 / -82.1 | 18.4 / -55.1 | 75.4 | 41.2 | 42.6 | 73.6 | 74.1 | 39.7 |
| LLaVA | OmniAlign-V_mix | InternLM2.5-7B | 50.0 / +3.8 | 28.2 / -34.6 | 85.4 | 43.5 | 43.3 | 73.7 | 74.7 | 41.3 |
| | | | + 46.4 / 85.9 | + 9.8 / 20.5 | + 10.0 | + 2.3 | + 0.7 | + 0.1 | + 0.6 | + 1.6 |
| LLaVANext | LLaVANext-778k | InternLM2.5-7B | 20.6 / -42.7 | 23.4 / -45.0 | 76.9 | 41.8 | 44.1 | 75.1 | 74.7 | 56.2 |
| LLaVANext | OmniAlign-V_mix | InternLM2.5-7B | 57.1 / +11.1 | 29.6 / -31.3 | 86.7 | 47.7 | 46.8 | 74.9 | 77.5 | 58.9 |
| | | | + 36.5 / 53.8 | + 6.2 / 13.7 | + 9.8 | + 5.9 | + 2.7 | - 0.2 | + 2.8 | + 2.7 |
| LLaVANext | LLaVANext-778k | Qwen2.5-32B | 26.6 / -29.0 | 25.2 / -41.3 | 86.0 | 47.7 | 55.2 | 79.3 | 79.6 | 55.9 |
| LLaVANext | OmniAlign-V_mix | Qwen2.5-32B | 62.3 / +19.4 | 40.2 / -14.9 | 89.6 | 56.9 | 60.7 | 80.6 | 81.7 | 55.9 |
|                |                     |                | + 35.7 / 48.4 | + 15.0 / 26.4 | + 3.6 | + 9.2 | + 5.5 | + 1.3 | + 2.1 | + 0.0 |
For MM-AlignBench and WildVision, the paired values A / B denote Winning Rate / Reward. The row beneath each OmniAlign-V_mix entry gives its improvement over the corresponding LLaVANext-778k baseline.
### How to use
Please refer to our [GitHub](https://github.com/PhoenixZ810/OmniAlign-V) repository for details on training and evaluation.
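As a starting point, the checkpoint can be fetched from the Hub with `huggingface_hub`. This is a minimal sketch, assuming the standard Hub repository layout; actual inference requires the LLaVA-NeXT training/inference code from the official GitHub repository, and the `local_dir` argument here is optional.

```python
# Minimal sketch: download the LLaVANext-OmniAlign-7B checkpoint from
# the Hugging Face Hub. Running inference on the weights afterwards
# requires the official OmniAlign-V / LLaVA-NeXT codebase.
from typing import Optional

from huggingface_hub import snapshot_download


def fetch_checkpoint(repo_id: str = "PhoenixZ/LLaVANext-OmniAlign-7B",
                     local_dir: Optional[str] = None) -> str:
    """Download the model repository and return the local directory path."""
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    # Downloads several GB of weights on first call; subsequent calls
    # reuse the local Hub cache.
    path = fetch_checkpoint()
    print(path)
```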