Model Card for Raidium MQG model

The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".

Paper: https://arxiv.org/abs/2405.14654

MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with BioMedLM, then further pre-trained on those datasets.

The questions were generated from prompts containing medical data from the textbooks. They are available here: ECNQA_generated_questions.

MQG is designed to be fine-tuned for Medical Question Answering tasks.

Model Details

Model Description

In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. We show the benefits of our training strategy on a medical question answering dataset.

Using the model

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the causal language model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("raidium/MQG")
model = AutoModelForCausalLM.from_pretrained("raidium/MQG")
  • Developed by: Raidium
  • Model type: Transformer
  • License: Apache 2.0
  • Finetuned from model: BioMedLM

Uses

Direct Use

MQG is trained with next-token prediction on generated questions, so it can be used out of the box to generate candidate answers for medical question answering tasks. However, the generated questions may contain errors, so we advise fine-tuning the model on your own dataset and using it to rank the candidate answers.
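
As a rough sketch of the out-of-the-box use described above, the snippet below formats a question as text and lets the model continue it. The prompt layout produced by `build_prompt` and the `generate_answer` helper are illustrative assumptions, not the format used to train MQG; only the `raidium/MQG` checkpoint name comes from this card.

```python
def build_prompt(question, propositions):
    """Format a question and its propositions as plain text.

    The layout is a hypothetical choice made for illustration only.
    """
    lines = [f"Question: {question}"]
    for letter, text in zip("ABCDE", propositions):
        lines.append(f"{letter}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(prompt, max_new_tokens=32):
    """Greedily continue the prompt with MQG and return only the new text."""
    # Imported here so build_prompt can be used without the heavy dependencies.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("raidium/MQG")
    model = AutoModelForCausalLM.from_pretrained("raidium/MQG")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer(build_prompt(question, propositions))` downloads the full checkpoint, so it is best run on a machine with enough memory for the model.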

Downstream Use

MQG can be fine-tuned for Medical Question Answering tasks. For multiple-choice questions, a classification head should be added on top of the model to rank the proposed answers.
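
One possible way to add such a head is sketched below. It is a minimal illustration under stated assumptions, not the setup used in the paper: the `AnswerRanker` class, the `build_answer_ranker` factory, the last-token pooling, and the `rank_propositions` helper are all hypothetical, and the pooling assumes right-padded inputs.

```python
def rank_propositions(scores):
    """Return proposition indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def build_answer_ranker(backbone, hidden_size):
    """Wrap a transformer backbone with a linear scoring head (hypothetical design).

    torch is imported lazily so that rank_propositions works without it.
    """
    import torch
    from torch import nn

    class AnswerRanker(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = backbone
            # One scalar score per encoded (question, proposition) text.
            self.head = nn.Linear(hidden_size, 1)

        def forward(self, input_ids, attention_mask):
            hidden = self.backbone(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state
            # Assumes right padding: take the hidden state of the last real token.
            last = attention_mask.sum(dim=1) - 1
            rows = torch.arange(hidden.size(0), device=hidden.device)
            pooled = hidden[rows, last]
            return self.head(pooled).squeeze(-1)

    return AnswerRanker()
```

For example, `backbone` could be loaded with `AutoModel.from_pretrained("raidium/MQG")` and `hidden_size` read from `backbone.config.hidden_size`. Each proposition is then encoded together with its question, scored by the head, and the scores are passed to `rank_propositions`.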

Out-of-Scope Use

This model should not be used for non-medical tasks or datasets.

Bias, Risks, and Limitations

There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.

Training Details

Training Data

The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: ECNQA_generated_questions.

Training Procedure

MQG is trained using next-token prediction on both datasets.

Training Hyperparameters

  • Training regime: fp16 mixed-precision training.

Evaluation

Testing Data, Factors & Metrics

Testing Data

We tested the model on ECN-QA, a medical question answering dataset based on the French medical residency examination. It is composed of "single" and "progressive" questions (i.e., a series of multiple related questions). It is a multiple-choice dataset with 5 propositions per question.

Metrics

We use accuracy to evaluate the model on Medical Question Answering.
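
As a minimal sketch of the metric, accuracy is the fraction of predictions that match the reference labels. The `accuracy` helper below is hypothetical and assumes one predicted label per scored item; the exact scoring unit for ECN-QA (per question or per proposition) is defined in the paper and may differ.

```python
def accuracy(predictions, labels):
    """Fraction of predictions equal to their reference label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```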

Results

See paper: https://arxiv.org/abs/2405.14654

Model Architecture and Objective

The model is based on BioMedLM's architecture, which is a modified version of the GPT-2 architecture.

Compute Infrastructure

Hardware

The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs.

Software

PyTorch, DeepSpeed

Citation

BibTeX:

@article{khlaut2024efficient,
  title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
  author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
  journal={Clinical NLP Workshop, NAACL 2024},
  year={2024}
}

Model Card Contact

julien.khlaut at raidium.fr
