mao1207/BioMed-VITAL-models

Overview

BioMed-VITAL is a multimodal foundation model specifically tuned for biomedical applications. It leverages visual and textual data to improve understanding and reasoning within the biomedical domain.

Model Training

The training of BioMed-VITAL involved two key stages, both incorporating clinician preferences to ensure the relevance and quality of the training data:

Data Generation: During this stage, the GPT-4V generator was prompted with a diverse set of clinician-selected demonstrations. This approach facilitated the generation of domain-specific, preference-aligned data candidates, tailored to reflect real-world clinical scenarios and preferences.
Data Selection: A separate selection model was trained to explicitly incorporate clinician and policy-guided preferences. This model employed a sophisticated rating function to evaluate and select the highest quality data for further tuning of BioMed-VITAL. This selection process was critical in refining the dataset to ensure that only the most relevant and accurate instructional data was used.

Performance and Evaluation

The effectiveness of BioMed-VITAL was demonstrated through significant improvements in two key areas:

Open Visual Chat: The model showed a relative improvement of 18.5%, indicating enhanced capabilities in engaging in visual dialogues pertinent to biomedical contexts.
Medical Visual Question Answering (VQA): BioMed-VITAL achieved a win rate of up to 81.73% in this domain, showcasing its superior performance in interpreting and responding to complex medical imagery and queries.