---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: image-text-to-text
---

## Introduction

- Paper: Paper
- Github: Github
- Page: Page
- SFT Dataset: OmniAlign-V
- DPO Dataset: OmniAlign-V-DPO
- MM-AlignBench: MM-AlignBench
- Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO

This is the official repo of LLaVANext-OmniAlign (OA)-32B, introduced in *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.

LLaVANext-OmniAlign-32B follows the LLaVA-Next architecture, with Qwen2.5-32B-Instruct as the language model.
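As a rough illustration of that pairing, the sketch below builds a LLaVA-Next-style config around Qwen2.5-32B-Instruct using the standard transformers classes; the vision-tower defaults it prints are illustrative placeholders, not values read from this repo's actual config.

```python
from transformers import AutoConfig, LlavaNextConfig

# Language backbone: Qwen2.5-32B-Instruct, as stated above.
text_config = AutoConfig.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# LlavaNextConfig pairs a CLIP-style vision tower with the language model;
# the default vision settings used here are placeholders, not this repo's values.
config = LlavaNextConfig(text_config=text_config)
print(config.vision_config.model_type)  # "clip_vision_model" by default
print(config.text_config.model_type)    # "qwen2"
```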

By combining the LLaVA-Next-778k and OmniAlign-V datasets, we significantly improve the alignment of MLLMs with human preferences and enhance their performance on common downstream tasks, especially MMVet and MMMU.
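Concretely, the OmniAlign-V_mix data reported below can be thought of as a plain concatenation of the two sources. A minimal sketch, assuming both datasets ship as LLaVA-style JSON lists; the file names and the shuffling step are illustrative assumptions, not taken from the paper.

```python
import json
import random

# Hypothetical file names; both assumed to be LLaVA-style lists of
# {"id": ..., "image": ..., "conversations": [...]} records.
with open("llava_next_778k.json") as f:
    base = json.load(f)
with open("omnialign_v.json") as f:
    omnialign = json.load(f)

# Treat the mix as a plain union of the two sources, shuffled before SFT
# (the shuffle is an assumption, not a documented detail).
mixed = base + omnialign
random.shuffle(mixed)

with open("omnialign_v_mix.json", "w") as f:
    json.dump(mixed, f)
print(f"{len(base)} + {len(omnialign)} -> {len(mixed)} samples")
```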

## Performance

By integrating the OmniAlign-V dataset in the Supervised Fine-Tuning (SFT) stage, we not only significantly improve the alignment of MLLMs with human preferences, but also enhance their performance on common downstream tasks, especially MMVet and MMMU.

| Model | Data | LLM | MM-AlignBench | WildVision | MIA-Bench | MMVet | MMMU | MMBenchV1.1 | AI2D | OCRBench |
|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA | LLaVANext-778k | InternLM2.5-7B | 3.6 / -82.1 | 18.4 / -55.1 | 75.4 | 41.2 | 42.6 | 73.6 | 74.1 | 39.7 |
| LLaVA | OmniAlign-V_mix | InternLM2.5-7B | 50.0 / +3.8 | 28.2 / -34.6 | 85.4 | 43.5 | 43.3 | 73.7 | 74.7 | 41.3 |
| Δ | | | +46.4 / +85.9 | +9.8 / +20.5 | +10.0 | +2.3 | +0.7 | +0.1 | +0.6 | +1.6 |
| LLaVANext | LLaVANext-778k | InternLM2.5-7B | 20.6 / -42.7 | 23.4 / -45.0 | 76.9 | 41.8 | 44.1 | 75.1 | 74.7 | 56.2 |
| LLaVANext | OmniAlign-V_mix | InternLM2.5-7B | 57.1 / +11.1 | 29.6 / -31.3 | 86.7 | 47.7 | 46.8 | 74.9 | 77.5 | 58.9 |
| Δ | | | +36.5 / +53.8 | +6.2 / +13.7 | +9.8 | +5.9 | +2.7 | -0.2 | +2.8 | +2.7 |
| LLaVANext | LLaVANext-778k | Qwen2.5-32B | 26.6 / -29.0 | 25.2 / -41.3 | 86.0 | 47.7 | 55.2 | 79.3 | 79.6 | 55.9 |
| LLaVANext | OmniAlign-V_mix | Qwen2.5-32B | 62.3 / +19.4 | 40.2 / -14.9 | 89.6 | 56.9 | 60.7 | 80.6 | 81.7 | 55.9 |
| Δ | | | +35.7 / +48.4 | +15.0 / +26.4 | +3.6 | +9.2 | +5.5 | +1.3 | +2.1 | +0.0 |

For MM-AlignBench and WildVision, entries of the form A / B denote Winning Rate / Reward.
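To reproduce these two numbers from raw judge verdicts, the sketch below assumes the common pairwise-judging scheme (Much Better = +100, Better = +50, Tie = 0, Worse = -50, Much Worse = -100); the exact weights are an assumption, not something stated in this card.

```python
from collections import Counter

# Assumed verdict weights for the pairwise judge; not taken from this card.
WEIGHTS = {"much_better": 100, "better": 50, "tie": 0, "worse": -50, "much_worse": -100}

def winning_rate_and_reward(verdicts: list[str]) -> tuple[float, float]:
    counts = Counter(verdicts)
    n = len(verdicts)
    # Winning Rate: fraction of comparisons the model wins, in percent.
    wins = counts["much_better"] + counts["better"]
    winning_rate = 100.0 * wins / n
    # Reward: weighted average of verdicts, ranging from -100 to +100.
    reward = sum(WEIGHTS[v] * c for v, c in counts.items()) / n
    return winning_rate, reward

# Toy example: 5 judged comparisons against the reference model.
verdicts = ["much_better", "better", "tie", "worse", "much_worse"]
print(winning_rate_and_reward(verdicts))  # (40.0, 0.0)
```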

## How to use

Please refer to our GitHub repository for more details about training and evaluation.
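For quick inference, here is a minimal sketch assuming the checkpoint is exported in the standard transformers LLaVA-Next format (consistent with the `image-text-to-text` pipeline tag above); if the weights are only available in the original training format, follow the GitHub instructions instead. The `model_id` placeholder and the sample image URL are illustrative.

```python
import requests
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

# Replace with the full Hub repo id of LLaVANext-OA-32B.
model_id = "LLaVANext-OA-32B"

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build the prompt with the processor's chat template so the image token
# placement matches the model's expected conversation format.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```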