
Model Card for RLAIF-V

GitHub | Paper

RLAIF-V-12B is a multimodal large language model (MLLM) that exhibits super GPT-4V trustworthiness. The model is built upon OmniLMM from the MiniCPM-V series.

We train the model with RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. The framework maximally exploits open-source feedback from two key perspectives: high-quality feedback data and an online feedback learning algorithm.
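
For intuition, the feedback learning in this line of work builds on direct preference optimization (DPO) over chosen/rejected response pairs labeled by AI feedback. Below is a minimal, self-contained sketch of a DPO-style loss on precomputed sequence log-probabilities; all tensor names are hypothetical, and this is illustrative rather than the released RLAIF-V training code (see the GitHub repository for that).

```python
# Minimal DPO-loss sketch on precomputed sequence log-probabilities.
# Tensor names are hypothetical; this is not the released RLAIF-V code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style preference loss.

    Each argument is a tensor of per-sequence log-probabilities, shape (B,).
    `beta` controls how strongly the policy is pushed away from the
    frozen reference model.
    """
    # Log-ratio of policy vs. reference model for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Prefer the chosen response: maximize the margin between the ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random numbers standing in for real log-probs.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```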


Model Details

Key Features

  • πŸ… Super GPT-4V Trustworthiness: By learning from open-source AI feedback, RLAIF-V-12B achieves super GPT-4V trustworthiness in both generative and discriminative tasks.
  • πŸ’ͺ Maintaining Well Performance on General Abilities: On benchmarks tested with the general abilities (e.g. LLaVA Bench, MMStar), RLAIF-V-12B also exhibits good performance.


Examples

(Qualitative example figures omitted here; see the GitHub repository.)

Model Description

RLAIF-V-12B has 11.6B parameters; the weights are released as safetensors in BF16 and F32.

Usage

Please refer to our GitHub repository for full usage details.
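
As a quick orientation, here is a minimal sketch of loading the model with Hugging Face transformers. It assumes the checkpoint ships remote code exposing a `chat` method, as other models in the MiniCPM-V series do; the exact interface may differ, so treat the GitHub instructions as authoritative.

```python
# Minimal sketch (not the official usage): assumes a MiniCPM-V-style
# `chat` method is provided by the checkpoint's remote code. Verify the
# interface against the GitHub repository before relying on it.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/RLAIF-V-12B",
    trust_remote_code=True,      # model code ships with the checkpoint
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/RLAIF-V-12B", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image in detail."}]

# `chat` signature follows the MiniCPM-V series convention (an assumption).
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)
```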

Citation

If you find our model/code/paper helpful, please consider citing our papers 📝:

@article{yu2023rlhf,
  title={RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
  journal={arXiv preprint arXiv:2312.00849},
  year={2023}
}

@article{yu2024rlaifv,
  title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness}, 
  author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024},
}