arxiv:2407.05131

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Published on Jul 6
· Submitted by richardxp888 on Jul 8

Abstract

The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on three medical VQA datasets, achieving an average improvement of 20.8% in factual accuracy. We publicly release our benchmark and code at https://github.com/richard-peng-xia/RULE.
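The first component, calibrated selection of the number of retrieved contexts, can be illustrated with a minimal sketch. This is not the procedure from the paper, whose details the abstract does not give. It assumes a generic risk-control recipe: for each candidate number of contexts k, measure 0/1 errors on a held-out calibration set, bound the true error rate with Hoeffding's inequality, and keep only the values of k whose bound stays below a tolerance. The function names, the Bonferroni correction, and the parameters alpha and delta are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence


def hoeffding_upper_bound(errors: Sequence[int], delta: float) -> float:
    """Upper bound on the true error rate that holds with probability >= 1 - delta.

    `errors` is a list of 0/1 outcomes (1 = factually wrong answer) measured
    on a held-out calibration set. Assumes the samples are i.i.d.
    """
    n = len(errors)
    empirical_risk = sum(errors) / n
    return empirical_risk + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_context_counts(
    calib_errors: Dict[int, Sequence[int]], alpha: float, delta: float
) -> List[int]:
    """Return every candidate k whose risk bound is at most alpha.

    `calib_errors` maps a number of retrieved contexts k to the 0/1 errors
    observed on the calibration set when k contexts are retrieved.
    """
    # Bonferroni correction (an assumption of this sketch): split delta
    # across all candidates so the guarantee holds for all of them jointly.
    per_k_delta = delta / len(calib_errors)
    return sorted(
        k
        for k, errors in calib_errors.items()
        if hoeffding_upper_bound(errors, per_k_delta) <= alpha
    )
```

With such a set of admissible k values in hand, a system could pick any of them at inference time, for example the smallest one to keep prompts short. An empty result means no candidate meets the tolerance on the available calibration data.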

Community

Paper author Paper submitter

🔥 Enhanced Factual Accuracy: The proposed RULE framework significantly improves factual accuracy in Medical Large Vision Language Models (Med-LVLMs), achieving an average improvement of 20.8% across three medical VQA datasets.

🔥 Innovative Approach: RULE introduces a novel, provably effective strategy to control factuality risk by calibrating the selection of retrieved contexts, addressing the challenge of limited or excessive retrieval.

🔥 Balanced Dependence: By curating a preference dataset based on instances of over-reliance on retrieved contexts, RULE fine-tunes the model to balance its dependence on inherent knowledge and retrieved information, reducing the risk of incorrect answers.

🔥 Practical Application: The RULE framework offers a practical solution for enhancing the factual accuracy of Med-LVLMs, providing a transparent and efficient approach to integrating external knowledge without compromising the model's inherent capabilities.
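The preference-dataset idea in the "Balanced Dependence" point can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes each sample records the model's answer with and without retrieved contexts, treats a sample as an over-reliance case when the answer without retrieval matches the ground truth and the answer with retrieval does not, and uses exact string matching after normalization as a stand-in for whatever correctness check the paper uses. The Sample and PreferencePair types and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    question: str
    ground_truth: str
    answer_without_rag: str  # model output with no retrieved contexts
    answer_with_rag: str     # model output with retrieved contexts added


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # preferred response (the correct answer)
    rejected: str  # dispreferred response (the retrieval-misled answer)


def normalize(text: str) -> str:
    """Crude normalization so that 'Yes ' and 'yes' compare equal."""
    return text.strip().lower()


def build_preference_pairs(samples: List[Sample]) -> List[PreferencePair]:
    """Keep only samples where retrieval turned a correct answer into a wrong one."""
    pairs = []
    for s in samples:
        truth = normalize(s.ground_truth)
        correct_alone = normalize(s.answer_without_rag) == truth
        wrong_with_rag = normalize(s.answer_with_rag) != truth
        if correct_alone and wrong_with_rag:
            pairs.append(
                PreferencePair(
                    prompt=s.question,
                    chosen=s.ground_truth,
                    rejected=s.answer_with_rag,
                )
            )
    return pairs
```

Pairs in this chosen/rejected shape are the usual input to preference fine-tuning methods. The abstract does not say which method RULE uses, so the sketch stops at dataset construction.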

