Model Card for MedGENIE-fid-flan-t5-base-medmcqa

MedGENIE comprises a collection of language models designed to use generated contexts, rather than retrieved ones, to answer multiple-choice open-domain questions in the medical field. Specifically, MedGENIE-fid-flan-t5-base-medmcqa is a fusion-in-decoder (FID) model based on flan-t5-base, trained on the MedMCQA dataset and grounded on artificial contexts generated by PMC-LLaMA-13B. Despite its small size, it performs comparably to much larger state-of-the-art (SOTA) models on both the MedMCQA and MMLU-Medical benchmarks.
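
As a rough illustration of the fusion-in-decoder idea, the sketch below pairs one question with each generated context separately; in FID, every such pair is encoded on its own and the decoder then attends over all encoder outputs at once. The prompt template and the helper names (`build_fid_inputs`, `fused_length`) are hypothetical and are not taken from the MedGENIE repository; only `n_context=5` and `text_maxlength=600` come from this card.

```python
from string import ascii_uppercase
from typing import List


def build_fid_inputs(question: str, options: List[str],
                     contexts: List[str], n_context: int = 5) -> List[str]:
    """Build one encoder input per context (hypothetical template)."""
    # Label the answer options A, B, C, ... as in a multiple-choice question.
    opts = " ".join(f"{ascii_uppercase[i]}. {o}" for i, o in enumerate(options))
    query = f"question: {question} options: {opts}"
    # Keep at most n_context contexts; each is paired with the same query.
    return [f"{query} context: {c}" for c in contexts[:n_context]]


def fused_length(n_context: int = 5, text_maxlength: int = 600) -> int:
    """Upper bound on the tokens the decoder can attend to after fusion."""
    return n_context * text_maxlength


inputs = build_fid_inputs(
    "Which vitamin deficiency causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
    [f"generated context {i}" for i in range(1, 7)],  # placeholder contexts
)
print(len(inputs), fused_length())
```

Because each pair is encoded independently, encoder cost grows linearly with the number of contexts, while the decoder still sees all of them jointly.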

Model description

Language(s) (NLP): English
License: MIT
Finetuned from model: google/flan-t5-base
Repository: https://github.com/disi-unibo-nlp/medgenie
Paper: To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering

Performance

At the time of release (February 2024), MedGENIE-fid-flan-t5-base-medmcqa outperforms many fine-tuned and few-shot 7B models on MedMCQA. On MMLU-Medical, a compilation of 9 medical subsets from MMLU, it is among the top-scoring models, surpassed only by Zephyr-β (7B), whose MedWiki-augmented variant scores highest.

In the table below, the Ground column indicates whether a model uses no context (∅), retrieved context (R), or generated context (G), with the context source in parentheses.

Model	Ground (Source)	Learning	Params	MedMCQA (% acc.)	MMLU-Medical (% acc.)	AVG (↓)
MEDITRON (Chen et al.)	∅	Fine-tuned	7B	59.2	55.6	57.4
VOD (Liévin et al. 2023)	R (MedWiki)	Fine-tuned	220M	58.3	56.8	57.6
Zephyr-β	R (MedWiki)	2-shot	7B	47.0	66.7	56.9
MedGENIE-FID-Flan-T5	G (PMC-LLaMA)	Fine-tuned	250M	52.1	59.9	56.0
PMC-LLaMA (Chen et al.)	∅	Fine-tuned	7B	51.4	59.7	55.6
LLaMA-2 (Chen et al.)	∅	Fine-tuned	7B	54.4	56.3	55.4
Zephyr-β (Chen et al.)	∅	2-shot	7B	43.4	60.7	52.1
Mistral-Instruct	R (MedWiki)	2-shot	7B	44.3	58.5	51.4
Mistral-Instruct (Chen et al.)	∅	3-shot	7B	40.2	55.8	48.0
LLaMA-2-chat	∅	2-shot	7B	35.0	49.3	42.2
LLaMA-2-chat	R (MedWiki)	2-shot	7B	37.2	52.0	44.6

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
n_context: 5
per_gpu_batch_size: 2
accumulation_steps: 2
total_steps: 182,816
eval_freq: 22,852
optimizer: AdamW
scheduler: linear
weight_decay: 0.01
warmup_ratio: 0.1
text_maxlength: 600
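
A few quantities follow directly from these values and can be checked with a short sketch. The shape of the schedule (linear warmup to the peak learning rate, then linear decay to zero) is an assumption about what "linear" means here, and the card does not say whether total_steps counts optimizer updates or forward passes, nor how many GPUs were used; `linear_schedule_lr` is a hypothetical helper, not code from the MedGENIE repository.

```python
TOTAL_STEPS = 182_816
EVAL_FREQ = 22_852
WARMUP_RATIO = 0.1
PEAK_LR = 5e-5
PER_GPU_BATCH_SIZE = 2
ACCUMULATION_STEPS = 2


def linear_schedule_lr(step: int, total_steps: int = TOTAL_STEPS,
                       warmup_ratio: float = WARMUP_RATIO,
                       peak_lr: float = PEAK_LR) -> float:
    """Assumed schedule: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


warmup_steps = int(TOTAL_STEPS * WARMUP_RATIO)        # steps spent warming up
num_evals = TOTAL_STEPS // EVAL_FREQ                  # evaluations during training
examples_per_update_per_gpu = PER_GPU_BATCH_SIZE * ACCUMULATION_STEPS

print(warmup_steps, num_evals, examples_per_update_per_gpu)
print(linear_schedule_lr(TOTAL_STEPS // 2))
```

Note that eval_freq divides total_steps exactly, so under these values the model is evaluated 8 times at evenly spaced points, the last one at the final step.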

Bias, Risk and Limitation

Our model is trained on artificially generated contextual documents, which might inadvertently magnify inherent biases and depart from clinical and societal norms. This could lead to the spread of convincing medical misinformation. To mitigate this risk, we recommend a cautious approach: domain experts should manually review any output before real-world use. This ethical safeguard is crucial to prevent the dissemination of potentially erroneous or misleading information, particularly within clinical and scientific circles.

Citation

If you find MedGENIE-fid-flan-t5-base-medmcqa useful in your work, please cite it with:

@misc{frisoni2024generate,
      title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering}, 
      author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
      year={2024},
      eprint={2403.01924},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
