---
license: apache-2.0
datasets:
- raidium/ECNQA_generated_questions
library_name: transformers
tags:
- medical
---

# Model Card for Raidium MQG model

The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".

Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

MQG is a transformer language model pre-trained on a corpus of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets. The questions were generated from prompts containing medical data from the textbooks. They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

MQG is designed to be fine-tuned for medical question answering tasks.

## Model Details

### Model Description

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)

In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. We show the benefits of our training strategy on a medical question answering dataset. The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned.

- **Developed by:** Raidium
- **Model type:** Transformer
- **License:** Apache 2.0
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)

### Model Sources

- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
- **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

## Uses

### Direct Use

MQG is trained using next-token prediction on generated questions. Therefore, it can be used out of the box to generate potential answers for medical question answering tasks. However, the generated questions might contain some errors, so it is advised to fine-tune the model on your own dataset and to use the model to rank the potential answers.

### Downstream Use

MQG can be fine-tuned for medical question answering tasks. For multiple-choice questions, a classification head should be appended at the end of the model to rank the different proposed answers (see the ranking sketch after the training details below).

### Out-of-Scope Use

This model should not be used for datasets outside medical tasks.

## Bias, Risks, and Limitations

There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, not in clinical care.

## Training Details

### Training Data

The model is trained on a corpus of medical textbooks, then further pre-trained on the generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

### Training Procedure

MQG is trained using next-token prediction on both datasets.

#### Training Hyperparameters

- **Training regime:** fp16 mixed-precision training.
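As a rough illustration of this next-token-prediction stage, here is a minimal sketch using the Hugging Face `Trainer`. The paths, sequence length, batch size, and epoch count are illustrative placeholders, not the values used in the paper; the `"text"` column name is an assumption about the dataset schema.

```python
# Sketch of continued pre-training with a causal LM objective in fp16.
# Hyperparameters below are placeholders, not the paper's settings.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/BioMedLM")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained("stanford-crfm/BioMedLM")

dataset = load_dataset("raidium/ECNQA_generated_questions", split="train")

def tokenize(batch):
    # Assumes the dataset exposes a "text" column; adapt to the actual schema.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mqg-pretraining",
        fp16=True,  # fp16 mixed precision, as stated in the hyperparameters above
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False yields the causal (next-token-prediction) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```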
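For the multiple-choice setting described under Downstream Use, one possible setup is a sequence-classification head that scores each (question, proposition) pair, with the five propositions then ranked by score. This is a sketch of that idea, not the paper's exact head: `"raidium/MQG"` is a hypothetical checkpoint id, and the head must be fine-tuned on your question answering data before the scores are meaningful.

```python
# Sketch: rank multiple-choice propositions with a scoring head.
# "raidium/MQG" is a hypothetical checkpoint id; the head is assumed
# to have been fine-tuned on the downstream QA dataset beforehand.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "raidium/MQG"  # hypothetical repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

question = "Which vitamin deficiency causes scurvy?"
propositions = ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D", "Vitamin K"]

# Score each (question, proposition) pair in one padded batch.
inputs = tokenizer(
    [question] * len(propositions),
    propositions,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

# Rank the five propositions from most to least plausible.
ranking = scores.argsort(descending=True)
print([propositions[i] for i in ranking])
```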
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

We tested the model on ECN-QA, a medical question answering dataset based on the French medical residency examination. It is composed of "single" and "progressive" questions (i.e., a series of multiple related questions). It is a multiple-choice question dataset, containing 5 propositions for each question.

#### Metrics

We use accuracy to evaluate the model on medical question answering.

### Results

See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

### Model Architecture and Objective

The model is based on BioMedLM's architecture, which is itself a modified version of the GPT-2 architecture.

### Compute Infrastructure

#### Hardware

The model was trained on the Jean Zay supercomputer, on multiple nodes with 4 A100 GPUs each.

#### Software

PyTorch, DeepSpeed

## Citation

**BibTeX:**

```
@article{khlaut2024efficient,
  title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
  author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
  journal={Clinical NLP Workshop, NAACL 2024},
  year={2024}
}
```

## Model Card Contact

julien.khlaut at raidium.fr