---
license: apache-2.0
datasets:
- raidium/ECNQA_generated_questions
library_name: transformers
tags:
- medical
---

# Model Card for Raidium MQG model


The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".

Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. Its weights are initialized from [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM) and then further pre-trained on these datasets.

The questions were generated by prompting GPT-4 with medical content from the textbooks.
They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

MQG is designed to be fine-tuned for Medical Question Answering tasks.

## Model Details

### Model Description

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)


In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. 
Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. 
In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. 
We first fine-tune the model on a corpus of medical textbooks. 
Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. 
Additionally, we introduce ECN-QA, a novel medical question answering dataset containing "progressive questions" composed of related sequential questions. 
We show the benefits of our training strategy on this dataset. 
The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned. 


- **Developed by:** Raidium
- **Model type:** Transformer
- **License:** Apache 2.0
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)

### Model Sources

- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
- **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

## Uses

### Direct Use

MQG is trained with next-token prediction on generated questions.
It can therefore be used out of the box to generate candidate answers for medical question answering tasks.
However, the generated questions may contain errors, so we advise fine-tuning the model on your own dataset and using it to rank candidate answers.
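
A minimal sketch of the ranking step, assuming each candidate answer is scored by its mean per-token log-probability given the question (one common choice; this card does not specify the scoring rule). `rank_answers` and the `token_logprobs` callable are hypothetical names. With `transformers`, `token_logprobs` would typically run the causal language model on the question followed by the option and read the log-softmax of the logits at the option's token positions; here a toy scorer stands in for the model.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: returns the log-probability of each token of
# `continuation` given `prompt`, as computed by a causal language model.
TokenLogProbFn = Callable[[str, str], Sequence[float]]


def rank_answers(question: str, options: Sequence[str], token_logprobs: TokenLogProbFn) -> List[int]:
    """Return option indices ordered from most to least likely."""
    scores = []
    for option in options:
        lps = token_logprobs(question, option)
        # Mean (not sum) so that longer options are not penalized for length alone.
        scores.append(sum(lps) / max(len(lps), 1))
    # sorted() is stable, so tied options keep their original order.
    return sorted(range(len(options)), key=lambda i: scores[i], reverse=True)


# Toy stand-in for the model: treats each whitespace-separated word as a token
# and assigns a high log-probability only to the word "beta".
def toy_logprobs(prompt: str, continuation: str) -> List[float]:
    return [-0.1 if word == "beta" else -2.0 for word in continuation.split()]


order = rank_answers("Which option?", ["alpha", "beta gamma", "delta"], toy_logprobs)
print(order)  # [1, 0, 2]
```

Averaging rather than summing log-probabilities is a design choice: summing tends to favour short options, which matters when the candidate answers differ in length.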

### Downstream Use

MQG can be fine-tuned for medical question answering tasks.
For multiple-choice questions, a classification head should be appended to the end of the model to score and rank the proposed answers.
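
A minimal sketch of how such a head can rank the options, assuming each (question, option) pair has already been encoded by the model into one pooled vector (for a GPT-style model, often the final hidden state of the last token; this card does not specify the pooling). The head here is a single linear layer producing one logit per option, trained with cross-entropy over the options. Plain Python lists stand in for tensors, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def option_logits(pooled: Sequence[Sequence[float]], weight: Sequence[float], bias: float) -> List[float]:
    """Linear head: map each option's pooled hidden vector to a scalar logit."""
    return [sum(h * w for h, w in zip(vec, weight)) + bias for vec in pooled]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_options(pooled, weight, bias, gold: int) -> Tuple[float, int]:
    """Return (cross-entropy loss against the gold option, predicted option index)."""
    probs = softmax(option_logits(pooled, weight, bias))
    loss = -math.log(probs[gold])
    pred = max(range(len(probs)), key=lambda i: probs[i])
    return loss, pred


# Five options per question (as in ECN-QA), with toy 2-dimensional pooled vectors.
pooled = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
loss, pred = score_options(pooled, weight=[1.0, 0.0], bias=0.0, gold=1)
print(pred)  # 1
```

During fine-tuning, `loss` would be backpropagated through both the head and the underlying model; at inference time only `pred` (or the full ranking of `probs`) is needed.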

### Out-of-Scope Use

This model should not be used for tasks outside the medical domain.

## Bias, Risks, and Limitations

There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.

## Training Details

### Training Data

The model is trained on a corpus of medical textbooks and then further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

### Training Procedure

MQG is trained with next-token prediction on both datasets.
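
Concretely, next-token prediction minimizes the average negative log-likelihood $-\frac{1}{T-1}\sum_{t=1}^{T-1}\log p_\theta(x_{t+1}\mid x_1,\dots,x_t)$ over a sequence of $T$ tokens. The sketch below computes this loss from per-position logits in plain Python, making the one-position shift between outputs and targets explicit; it is an illustration, not the training code used for MQG. In `transformers`, causal-LM classes such as `GPT2LMHeadModel` compute an equivalent shifted loss internally when `labels` are passed.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def next_token_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    logits[t] holds one score per vocabulary entry at position t. The last
    position has no next token, so it contributes no term.
    """
    if len(logits) != len(token_ids) or len(token_ids) < 2:
        raise ValueError("need >= 2 tokens and one logit vector per token")
    losses = [-log_softmax(logits[t])[token_ids[t + 1]] for t in range(len(token_ids) - 1)]
    return sum(losses) / len(losses)


# Uniform predictions over a 3-token vocabulary give a loss of log(3).
uniform = [[0.0, 0.0, 0.0]] * 3
print(next_token_loss(uniform, [0, 1, 2]))  # ≈ 1.0986
```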

#### Training Hyperparameters

- **Training regime:** fp16 mixed-precision training.
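
Because fp16 has a narrow dynamic range, small gradients can underflow to zero. Mixed-precision setups therefore usually multiply the loss by a scale factor before the backward pass, divide the gradients by the same factor before the optimizer step, and adjust the scale dynamically: halve it and skip the step when gradients overflow, and grow it after a run of stable steps. DeepSpeed and PyTorch's `torch.cuda.amp.GradScaler` both implement this idea. The class below is a simplified, hypothetical sketch of the mechanism; this card does not state which loss-scaling settings were used for MQG.

```python
import math
from typing import List, Optional, Sequence


class DynamicLossScaler:
    """Simplified dynamic loss scaling for fp16 mixed-precision training."""

    def __init__(self, init_scale: float = 2.0 ** 16, growth_interval: int = 2000):
        self.scale = init_scale  # the loss is multiplied by this before backward
        self.growth_interval = growth_interval
        self._stable_steps = 0

    def unscale_and_update(self, scaled_grads: Sequence[float]) -> Optional[List[float]]:
        """Return unscaled gradients, or None if this step must be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow (inf/nan): skip the optimizer step and back off.
            self.scale /= 2.0
            self._stable_steps = 0
            return None
        # Unscale with the same factor that was applied to the loss.
        grads = [g / self.scale for g in scaled_grads]
        self._stable_steps += 1
        if self._stable_steps == self.growth_interval:
            # No overflow for a while: try a larger scale.
            self.scale *= 2.0
            self._stable_steps = 0
        return grads


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.unscale_and_update([16.0, -8.0]))  # [2.0, -1.0]
print(scaler.unscale_and_update([8.0]), scaler.scale)  # [1.0] 16.0
print(scaler.unscale_and_update([float("inf")]), scaler.scale)  # None 8.0
```

Unscaling happens before the scale is updated, so gradients are always divided by the factor that was actually applied to the loss for that step.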

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

We tested the model on ECN-QA, a medical question answering dataset based on the French medical residency examination.
It is composed of "single" and "progressive" questions (i.e., series of multiple related questions).
It is a multiple-choice dataset with five answer options per question.

#### Metrics

We use accuracy to evaluate the model on medical question answering.
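
A minimal sketch of the metric, also broken down by question type since ECN-QA mixes single and progressive questions. It assumes each question has exactly one gold option and that predictions are option indices; this card does not say how questions with several correct options (if any) are scored, and the `(question_type, predicted, gold)` record format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_type(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Accuracy per question type plus an "overall" entry.

    records: (question_type, predicted_option, gold_option) triples.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for qtype, pred, gold in records:
        for key in (qtype, "overall"):
            total[key] += 1
            correct[key] += int(pred == gold)
    return {key: correct[key] / total[key] for key in total}


results = accuracy_by_type([
    ("single", 1, 1),
    ("single", 0, 2),
    ("progressive", 3, 3),
    ("progressive", 4, 4),
])
print(results)  # {'single': 0.5, 'overall': 0.75, 'progressive': 1.0}
```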

### Results

See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

### Model Architecture and Objective

The model is based on BioMedLM's architecture, which is a modified GPT-2 architecture.

### Compute Infrastructure

#### Hardware

The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs each.

#### Software

PyTorch, DeepSpeed

## Citation


**BibTeX:**
```bibtex
@article{khlaut2024efficient,
  title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
  author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
  journal={Clinical NLP Workshop, NAACL 2024},
  year={2024}
}
```

## Model Card Contact

julien.khlaut at raidium.fr