---
license: apache-2.0
datasets:
- raidium/ECNQA_generated_questions
library_name: transformers
tags:
- medical
---

# Model Card for Raidium MQG model

The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".

Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized from [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets.

The questions were generated from prompts containing medical data from the textbooks. They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

MQG is designed to be fine-tuned for Medical Question Answering tasks.

## Model Details

### Model Description

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)

In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. Additionally, we introduce ECN-QA, a novel medical question answering dataset containing "progressive questions" composed of related sequential questions. We show the benefits of our training strategy on this dataset.
The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned.

- **Developed by:** Raidium
- **Model type:** Transformer
- **License:** Apache 2.0
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)

### Model Sources

- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
- **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

## Uses

### Direct Use

MQG is trained using next-token prediction on generated questions. Therefore, it can be used out of the box to generate potential answers for medical question answering tasks. However, the generated training questions may contain errors, so it is advised to fine-tune the model on your dataset and use the model to rank the potential answers.

### Downstream Use

MQG can be fine-tuned for Medical Question Answering tasks. For multiple-choice questions, a classification head should be appended at the end of the model to rank the different proposed answers.

### Out-of-Scope Use

This model should not be used on data or tasks outside the medical domain.

## Bias, Risks, and Limitations

There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.

## Training Details

### Training Data

The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

### Training Procedure

MQG is trained using next-token prediction on both datasets.

#### Training Hyperparameters

- **Training regime:** fp16 mixed-precision training.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

We tested the model on ECN-QA, a medical question answering dataset based on the French medical residency examination. It is composed of "single" and "progressive" questions (i.e., series of multiple related questions).
It is a multiple-choice question dataset containing 5 propositions for each question.

#### Metrics

We use accuracy to evaluate the model on Medical Question Answering.

### Results

See the paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

### Model Architecture and Objective

The model is based on BioMedLM's architecture, which is a modified GPT-2 architecture, and is trained with a next-token prediction objective.

### Compute Infrastructure

#### Hardware

The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs each.

#### Software

PyTorch, DeepSpeed

## Citation

**BibTeX:**

```
@article{khlaut2024efficient,
  title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
  author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
  journal={Clinical NLP Workshop, NAACL 2024},
  year={2024}
}
```

## Model Card Contact

julien.khlaut at raidium.fr
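
## Example: Ranking Candidate Answers by Likelihood

The Direct Use section suggests using the model to rank potential answers. The sketch below is minimal and dependency-free. It assumes you have already obtained the per-token log-probabilities of each candidate answer conditioned on the question. One way to get them is to run the model on the question followed by each candidate and take the log-softmax of the logits at the answer positions. The helper names and the toy numbers are hypothetical. Length normalization is one common choice, not necessarily what the authors used.

```python
import math
from typing import Dict, List, Tuple


def answer_score(token_logprobs: List[float], length_normalize: bool = True) -> float:
    """Score one candidate answer from its per-token log-probabilities."""
    if not token_logprobs:
        raise ValueError("a candidate answer must contain at least one token")
    total = math.fsum(token_logprobs)  # log p(answer | question)
    # Averaging per token avoids systematically favoring shorter answers.
    return total / len(token_logprobs) if length_normalize else total


def rank_answers(
    candidates: Dict[str, List[float]], length_normalize: bool = True
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least likely."""
    scored = [(label, answer_score(lps, length_normalize)) for label, lps in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy per-token log-probabilities for three candidate answers (illustrative, not model outputs).
candidates = {
    "A": [-0.5],
    "B": [-1.5, -2.0, -0.9],
    "C": [-0.1, -0.3, -0.2],
}

for label, score in rank_answers(candidates):
    print(label, round(score, 3))
```

With length normalization the ranking is C, A, B. With raw summed log-probabilities (`length_normalize=False`), the one-token answer A moves to the top, which illustrates why some form of normalization is usually applied.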
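
## Example: Multiple-Choice Classification Head

The Downstream Use section recommends appending a classification head to rank the proposed answers. The sketch below shows such a head in plain Python. It assumes each (question, proposition) pair has already been encoded by the model into one pooled hidden-state vector. The final token's hidden state is a common choice for GPT-2-style decoders, but that pooling choice is an assumption. The class name, the toy 3-dimensional vectors, and the fixed weights are hypothetical. In practice the head would be a learned linear layer (for example a PyTorch `nn.Linear(hidden_size, 1)`) trained jointly with MQG.

```python
import math
from typing import List, Sequence


class MultipleChoiceHead:
    """Hypothetical linear head mapping each option's pooled hidden state to one logit."""

    def __init__(self, weight: Sequence[float], bias: float = 0.0) -> None:
        # During fine-tuning these parameters would be learned together with the model.
        self.weight = list(weight)
        self.bias = bias

    def logits(self, option_hiddens: Sequence[Sequence[float]]) -> List[float]:
        """One scalar w . h + b per (question, option) pair."""
        out = []
        for hidden in option_hiddens:
            if len(hidden) != len(self.weight):
                raise ValueError("hidden size does not match head weight size")
            out.append(sum(w * h for w, h in zip(self.weight, hidden)) + self.bias)
        return out

    def probabilities(self, option_hiddens: Sequence[Sequence[float]]) -> List[float]:
        """Softmax across options; assumes exactly one proposition is correct."""
        scores = self.logits(option_hiddens)
        shift = max(scores)  # subtracting the max keeps exp() numerically stable
        exps = [math.exp(s - shift) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def rank(self, option_hiddens: Sequence[Sequence[float]]) -> List[int]:
        """Option indices sorted from highest to lowest logit."""
        scores = self.logits(option_hiddens)
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional "hidden states" for the five propositions A-E of one question.
option_hiddens = [
    [0.1, 0.2, 0.0],  # A
    [0.9, 0.1, 0.4],  # B
    [0.0, 0.5, 0.2],  # C
    [0.3, 0.3, 0.0],  # D
    [0.2, 0.0, 0.2],  # E
]
head = MultipleChoiceHead(weight=[1.0, -1.0, 0.5])
labels = "ABCDE"
print([labels[i] for i in head.rank(option_hiddens)])
print([round(p, 3) for p in head.probabilities(option_hiddens)])
```

The softmax treats the five propositions as mutually exclusive. If a question can have several correct propositions, the usual alternative is to apply an independent sigmoid to each logit.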
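
## Example: Next-Token Prediction Objective

The Training Procedure section states that MQG is trained using next-token prediction on both the textbook corpus and the generated questions. For reference, this is the standard causal language-modeling objective. The notation below is ours, not taken from the paper. For a tokenized training sequence $x_1, \dots, x_T$ (a textbook passage or a generated question) and model parameters $\theta$:

$$
\mathcal{L}(\theta) = -\frac{1}{T-1}\sum_{t=2}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
$$

This is the average cross-entropy of each token given all previous ones. The first token has no prefix and is not scored, which matches the usual "shift logits by one position" implementation in GPT-2-style models.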
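
## Example: fp16 Mixed Precision with DeepSpeed

The Training Hyperparameters and Software sections mention fp16 mixed-precision training with DeepSpeed, on nodes with 4 A100 GPUs each. The sketch below shows a hypothetical DeepSpeed configuration that enables fp16 with dynamic loss scaling. The `fp16` keys are standard DeepSpeed options. The batch sizes and the node count are placeholders, not the values used to train MQG.

```python
import json

# Hypothetical DeepSpeed configuration: the batch sizes below are placeholders,
# not the values used to train MQG.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {
        "enabled": True,            # mixed precision: fp16 compute, fp32 master weights
        "loss_scale": 0,            # 0 selects dynamic loss scaling
        "initial_scale_power": 16,  # dynamic loss scale starts at 2**16
    },
}


def global_batch_size(config: dict, num_nodes: int, gpus_per_node: int = 4) -> int:
    """Sequences per optimizer step: micro-batch x accumulation steps x number of GPUs."""
    world_size = num_nodes * gpus_per_node
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * world_size
    )


print(json.dumps(ds_config, indent=2))
print(global_batch_size(ds_config, num_nodes=2))  # 4 * 8 * (2 * 4) = 256
```

A dictionary like this, or a path to an equivalent JSON file, can be passed to the `deepspeed` argument of transformers' `TrainingArguments`.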
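
## Example: Computing Accuracy

The Metrics section reports accuracy. The sketch below is minimal. It assumes each question is scored by exact match between the predicted and reference answers, with results broken down by the "single" and "progressive" question types described under Testing Data. The record format and the per-type breakdown are our assumptions, not necessarily the paper's evaluation protocol. For questions with several correct propositions, answers could be stored as frozensets so the same exact-match comparison still applies.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def accuracy_by_type(records: Iterable[Tuple[str, Hashable, Hashable]]) -> Dict[str, float]:
    """Exact-match accuracy per question type, plus an overall figure.

    Each record is (question_type, predicted_answer, gold_answer).
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for question_type, predicted, gold in records:
        for key in (question_type, "overall"):
            total[key] += 1
            correct[key] += int(predicted == gold)
    return {key: correct[key] / total[key] for key in total}


# Toy predictions (illustrative only, not results from the paper).
records = [
    ("single", "B", "B"),
    ("single", "A", "C"),
    ("progressive", "D", "D"),
    ("progressive", "E", "E"),
]
print(accuracy_by_type(records))
```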