	---
	license: apache-2.0
	datasets:
	- raidium/ECNQA_generated_questions
	library_name: transformers
	tags:
	- medical
	---

	# Model Card for Raidium MQG model


	The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".

	Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

	MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with
	[BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets.

	The questions were generated from prompts containing medical data from the textbooks.
	They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

	MQG is designed to be fine-tuned for Medical Question Answering tasks.

	## Model Details

	### Model Description

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)

	In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain.
	Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind.
	In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach.
	We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model.
	We show the benefits of our training strategy on a medical question answering dataset.
	The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned.


	- Developed by: Raidium
	- Model type: Transformer
	- License: Apache 2.0
	- Finetuned from model: [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)

	### Model Sources

	- Repository: [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
	- Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

	## Uses

	### Direct Use

	MQG is trained with next-token prediction on generated questions.
	Therefore, it can be used out-of-the-box to generate potential answers for medical question answering tasks.
	However, the generated questions used for training might contain errors, so we advise fine-tuning the model on your own dataset and using it to rank the candidate answers.
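
As a minimal sketch of out-of-the-box generation, the model can be loaded as a causal language model with `transformers`. The repository id `raidium/MQG` and the prompt layout produced by `format_question` are assumptions made for illustration; this card does not document the prompt format used during training.

```python
def format_question(question: str, propositions: list[str]) -> str:
    """Build a plain-text prompt (hypothetical layout, not the documented training format)."""
    lines = [f"Question: {question}"]
    for letter, proposition in zip("ABCDE", propositions):
        lines.append(f"{letter}. {proposition}")
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(prompt: str, model_id: str = "raidium/MQG", max_new_tokens: int = 32) -> str:
    """Generate a continuation for `prompt`. `model_id` is an assumed Hub repository id."""
    # Imported lazily so the prompt helper can be used without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example (downloads the weights on first call):
# prompt = format_question("<your question>", ["<proposition 1>", "<proposition 2>"])
# print(generate_answer(prompt))
```

Generated text should be treated as a candidate answer only, for the reasons given above.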

	### Downstream Use

	MQG can be fine-tuned for Medical Question Answering tasks.
	For multiple choice questions, a classification head should be appended at the end of the model, to rank different proposed answers.
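
One way such a head could look is sketched below in PyTorch. This is an illustration under stated assumptions, not the head used in the paper: the class name `ChoiceRankingHead`, the choice of pooling on the last non-padding token, and the right-padding convention are all hypothetical. The head takes the backbone's hidden states for each (question, proposition) pair and returns one score per proposition.

```python
import torch
from torch import nn


class ChoiceRankingHead(nn.Module):
    """Hypothetical head: one scalar score per proposed answer."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, num_choices, seq_len, hidden)
        # attention_mask: (batch, num_choices, seq_len), 1 for real tokens (right padding assumed)
        last_idx = (attention_mask.long().sum(dim=-1) - 1).clamp(min=0)
        idx = last_idx[..., None, None].expand(-1, -1, 1, hidden_states.size(-1))
        # Pool each sequence on its last non-padding token.
        pooled = hidden_states.gather(2, idx).squeeze(2)  # (batch, num_choices, hidden)
        return self.score(pooled).squeeze(-1)  # (batch, num_choices)


# Usage with random tensors standing in for backbone outputs.
head = ChoiceRankingHead(hidden_size=8)
hidden = torch.randn(2, 5, 7, 8)   # 2 questions, 5 propositions, 7 tokens each
mask = torch.ones(2, 5, 7)
scores = head(hidden, mask)        # shape (2, 5)
ranking = scores.argsort(dim=-1, descending=True)
```

The scores can then be trained with a loss suited to the labels of your dataset and sorted to rank the propositions.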

	### Out-of-Scope Use

	This model should not be used for tasks outside the medical domain.

	## Bias, Risks, and Limitations

	There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.

	## Training Details

	### Training Data

	The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

	### Training Procedure

	MQG is trained with next-token prediction on both datasets.
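
For readers unfamiliar with the objective, next-token prediction can be sketched as a shifted cross-entropy loss. The function below is a generic illustration, not the authors' training code; the name `next_token_loss` and the use of `-100` to ignore padded positions are choices made for this example.

```python
import math

import torch
import torch.nn.functional as F


def next_token_loss(logits, input_ids, attention_mask=None):
    """Cross-entropy between the prediction at position t and the token at position t + 1."""
    # logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    if attention_mask is not None:
        # Do not compute a loss on padded target positions.
        shift_labels[attention_mask[:, 1:] == 0] = -100
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )


# Sanity check: uniform logits over a vocabulary of size V give a loss of log(V).
uniform_logits = torch.zeros(2, 4, 10)
token_ids = torch.randint(0, 10, (2, 4))
loss = next_token_loss(uniform_logits, token_ids)
assert abs(loss.item() - math.log(10)) < 1e-5
```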

	#### Training Hyperparameters

	- Training regime: fp16 mixed-precision training.
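
Since this card lists DeepSpeed as training software, fp16 mixed precision would typically be switched on in the DeepSpeed configuration. The sketch below shows what such a configuration might look like; the batch size and accumulation values are hypothetical placeholders, and the authors' actual configuration is not given in this card.

```python
import json

# Illustrative values only: the card states fp16 mixed precision and DeepSpeed,
# but not the batch size, accumulation steps, or any other setting.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # hypothetical
    "gradient_accumulation_steps": 8,     # hypothetical
    "fp16": {
        "enabled": True,  # fp16 mixed-precision training, as stated in the card
        "loss_scale": 0,  # 0 selects DeepSpeed's dynamic loss scaling
    },
}


def write_config(path: str) -> None:
    """Write the configuration to a JSON file that a DeepSpeed launcher can read."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
```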

	## Evaluation

	### Testing Data, Factors & Metrics

	#### Testing Data

	We tested the model on a medical question answering dataset, ECN-QA, based on the French medical residency examination.
	It is composed of "single" and "progressive" questions (i.e., a series of multiple related questions).
	It is a multiple-choice question dataset, containing 5 propositions for each question.

	#### Metrics

	We use accuracy to evaluate the model on Medical Question Answering.
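
Accuracy here is the fraction of items answered correctly. The sketch below assumes one gold label per item and an exact-match comparison; how ECN-QA items and labels are defined for scoring is described in the paper, so treat the label format as hypothetical.

```python
from typing import Hashable, Sequence


def accuracy(predictions: Sequence[Hashable], references: Sequence[Hashable]) -> float:
    """Fraction of predictions that exactly match their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Three of four predicted labels match the references.
print(accuracy([0, 2, 4, 1], [0, 1, 4, 1]))  # 0.75
```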

	### Results

	See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

	### Model Architecture and Objective

	The model is based on BioMedLM's architecture, which is a modified version of the GPT-2 architecture.

	### Compute Infrastructure

	#### Hardware

	The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs.

	#### Software

	PyTorch, DeepSpeed

	## Citation


	BibTeX:
	```
	@article{khlaut2024efficient,
	title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
	author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
	journal={Clinical NLP Workshop, NAACL 2024},
	year={2024}
	}
	```

	## Model Card Contact

	julien.khlaut at raidium.fr