Create README.md
#3
by
cdancette
- opened
README.md
CHANGED
@@ -0,0 +1,134 @@
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
datasets:
|
4 |
+
- raidium/ECNQA_generated_questions
|
5 |
+
library_name: transformers
|
6 |
+
tags:
|
7 |
+
- medical
|
8 |
+
base_model: stanford-crfm/BioMedLM
|
9 |
+
---
|
10 |
+
|
11 |
+
# Model Card for Raidium MQG model
|
12 |
+
|
13 |
+
|
14 |
+
The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".
|
15 |
+
|
16 |
+
Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)
|
17 |
+
|
18 |
+
MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with
|
19 |
+
[BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets.
|
20 |
+
|
21 |
+
The questions were generated from prompts containing medical data from the textbooks.
|
22 |
+
They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).
|
23 |
+
|
24 |
+
MQG is designed to be fine-tuned for Medical Question Answering tasks.
|
25 |
+
|
26 |
+
## Model Details
|
27 |
+
|
28 |
+
### Model Description
|
29 |
+
|
30 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)
|
31 |
+
|
32 |
+
In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain.
|
33 |
+
Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind.
|
34 |
+
In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach.
|
35 |
+
We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model.
|
36 |
+
We show the benefits of our training strategy on a medical question answering dataset.
|
37 |
+
The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned.
|
38 |
+
|
39 |
+
|
40 |
+
- **Developed by:** Raidium
|
41 |
+
- **Model type:** Transformer
|
42 |
+
- **License:** Apache 2.0
|
43 |
+
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)
|
44 |
+
|
45 |
+
### Model Sources
|
46 |
+
|
47 |
+
|
48 |
+
|
49 |
+
- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
|
50 |
+
- **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)
|
51 |
+
|
52 |
+
## Uses
|
53 |
+
|
54 |
+
### Direct Use
|
55 |
+
|
56 |
+
MQG is trained using next-token-prediction on generated questions.
|
57 |
+
Therefore, it can be used out-of-the-box to generate potential answers for medical question answering tasks.
|
58 |
+
However, the generated questions might contain errors, so we advise fine-tuning the model on your own dataset and using it to rank the potential answers.
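
As a rough illustration of out-of-the-box use, the sketch below formats a multiple-choice question and generates a continuation with the `transformers` library. The Hub id `raidium/MQG`, the prompt layout, and the helper names `format_question` and `generate_answer` are assumptions made for illustration; this model card does not specify a prompt format.

```python
def format_question(question, options):
    """Build a plain-text prompt. The layout is hypothetical, not the card's."""
    lines = [f"Question: {question}"]
    for i, option in enumerate(options):
        # Label the propositions A, B, C, ...
        lines.append(f"{chr(ord('A') + i)}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(prompt, model_id="raidium/MQG", max_new_tokens=32):
    """Greedy generation. `model_id` is an assumed Hub id."""
    # Imported here so format_question works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Keep only the tokens produced after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer(format_question(...))` downloads the weights on first use. The output is free text and should be checked, not trusted.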
|
59 |
+
|
60 |
+
### Downstream Use
|
61 |
+
|
62 |
+
MQG can be fine-tuned for Medical Question Answering tasks.
|
63 |
+
For multiple-choice questions, a classification head should be added on top of the model to rank the proposed answers.
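
One possible way to set this up is sketched below: each (question, proposition) pair is scored by a single-logit classification head, and the propositions are ranked by score. It uses `AutoModelForSequenceClassification` from `transformers`. The Hub id `raidium/MQG`, the pairing format, and the helper names are assumptions, and this is not necessarily the setup used in the paper. The head is randomly initialized, so the scores are meaningless until the model is fine-tuned.

```python
def pair_inputs(question, options):
    """Concatenate the question with each proposition (hypothetical format)."""
    return [f"{question} {option}" for option in options]


def rank_options(scores):
    """Return proposition indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def score_options(question, options, model_id="raidium/MQG"):
    """Score each proposition with a one-logit head. `model_id` is assumed."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # GPT-2 style tokenizers define no padding token; reuse end-of-text.
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=1
    )
    model.config.pad_token_id = tokenizer.pad_token_id
    batch = tokenizer(
        pair_inputs(question, options), return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (num_options, 1)
    return logits.squeeze(-1).tolist()
```

After fine-tuning, `rank_options(score_options(question, options))` gives the propositions in order of preference.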
|
64 |
+
|
65 |
+
### Out-of-Scope Use
|
66 |
+
|
67 |
+
This model should not be used for tasks outside the medical domain.
|
68 |
+
|
69 |
+
## Bias, Risks, and Limitations
|
70 |
+
|
71 |
+
There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.
|
72 |
+
|
73 |
+
## Training Details
|
74 |
+
|
75 |
+
### Training Data
|
76 |
+
|
77 |
+
The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).
|
78 |
+
|
79 |
+
### Training Procedure
|
80 |
+
|
81 |
+
MQG is trained using next-token-prediction on both datasets.
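
To make the objective concrete, here is a toy, framework-free sketch of the next-token-prediction loss: the average negative log-likelihood of each token given the tokens before it. The function name and the list-based inputs are illustrative assumptions; actual training computes the same quantity on model logits in PyTorch.

```python
import math


def next_token_loss(token_probs, token_ids):
    """Average negative log-likelihood of each token given its prefix.

    token_probs[t] is the predicted distribution over the vocabulary after
    reading token_ids[: t + 1]; it is scored against token_ids[t + 1].
    """
    targets = token_ids[1:]  # the first token has no prefix to predict from
    losses = [
        -math.log(token_probs[t][target]) for t, target in enumerate(targets)
    ]
    return sum(losses) / len(losses)
```

With a uniform distribution over a vocabulary of size V, the loss equals log V, which is a handy sanity check for an untrained model.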
|
82 |
+
|
83 |
+
#### Training Hyperparameters
|
84 |
+
|
85 |
+
- **Training regime:** fp16 mixed-precision training.
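
Since the model was trained with DeepSpeed, fp16 mixed precision would typically be switched on in the DeepSpeed configuration. The sketch below builds such a configuration as a Python dictionary. Only the use of fp16 comes from this card; the batch size, accumulation steps, and file name are placeholder assumptions, not the values used for MQG.

```python
import json

# Placeholder values: the card only states that fp16 mixed precision was used.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # assumed, not from the card
    "gradient_accumulation_steps": 1,      # assumed, not from the card
    "fp16": {
        "enabled": True,   # turn on fp16 mixed-precision training
        "loss_scale": 0,   # 0 selects dynamic loss scaling
    },
}


def write_config(path="ds_config.json"):
    """Write the configuration to a JSON file that DeepSpeed can read."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path
```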
|
86 |
+
|
87 |
+
## Evaluation
|
88 |
+
|
89 |
+
### Testing Data, Factors & Metrics
|
90 |
+
|
91 |
+
#### Testing Data
|
92 |
+
|
93 |
+
We tested the model on a medical question answering dataset, ECN-QA, based on the French medical residency examination.
|
94 |
+
It is composed of "single" and "progressive" questions (i.e., a series of related questions).
|
95 |
+
It is a multiple-choice question dataset, containing 5 propositions for each question.
|
96 |
+
|
97 |
+
#### Metrics
|
98 |
+
|
99 |
+
We use accuracy to evaluate the model on medical question answering.
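
As a minimal sketch, accuracy can be computed as the fraction of predictions that match the reference answers. The function below is illustrative. Whether ECN-QA is scored per question or per proposition is not stated in this card, so treat the unit of comparison as an assumption.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

For example, `accuracy(["A", "C", "B", "E"], ["A", "B", "B", "D"])` counts two matches out of four.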
|
100 |
+
|
101 |
+
### Results
|
102 |
+
|
103 |
+
See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)
|
104 |
+
|
105 |
+
### Model Architecture and Objective
|
106 |
+
|
107 |
+
The model is based on BioMedLM's architecture, which is a modified GPT-2 architecture.
|
108 |
+
|
109 |
+
### Compute Infrastructure
|
110 |
+
|
111 |
+
#### Hardware
|
112 |
+
|
113 |
+
The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs.
|
114 |
+
|
115 |
+
#### Software
|
116 |
+
|
117 |
+
PyTorch, DeepSpeed
|
118 |
+
|
119 |
+
## Citation
|
120 |
+
|
121 |
+
|
122 |
+
**BibTeX:**
|
123 |
+
```
|
124 |
+
@article{khlaut2024efficient,
|
125 |
+
title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
|
126 |
+
author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
|
127 |
+
journal={Clinical NLP Workshop, NAACL 2024},
|
128 |
+
year={2024}
|
129 |
+
}
|
130 |
+
```
|
131 |
+
|
132 |
+
## Model Card Contact
|
133 |
+
|
134 |
+
julien.khlaut at raidium.fr
|