updated readme
README.md
CHANGED
@@ -1,133 +1,9 @@

# medAlpaca: Finetuned Large Language Models for Medical Question Answering

## Project Overview

MedAlpaca expands upon both [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and
[AlpacaLoRA](https://github.com/tloen/alpaca-lora) to offer an advanced suite of large language
models specifically fine-tuned for medical question-answering and dialogue applications.
Our primary objective is to deliver an array of open-source language models, paving the way for
seamless development of medical chatbot solutions.

These models have been trained using a variety of medical texts, encompassing resources such as
medical flashcards, wikis, and dialogue datasets. For more details on the data utilized, please consult the data section.
## Getting Started

Create a new virtual environment, e.g. with conda:

```bash
# quote the version spec so the shell does not treat ">" as a redirect
conda create -n medalpaca "python>=3.9"
```

Install the required packages:

```bash
pip install -r requirements.txt
```

## Training of medAlpaca

<img width="256" alt="training your alpaca" src="https://user-images.githubusercontent.com/37253540/229250535-98f28e1c-0a8e-46e7-9e61-aeb98ef115cc.png">

### Memory Requirements

We have benchmarked the required GPU memory as well as the approximate duration per epoch
for finetuning LLaMA 7b on the Medical Meadow small dataset (~6000 Q/A pairs) on a single GPU:

| Model    | 8-bit training | LoRA  | fp16  | bf16  | VRAM used | Gradient checkpointing | Duration/epoch |
|----------|----------------|-------|-------|-------|-----------|------------------------|----------------|
| LLaMA 7b | True           | True  | True  | False | 8.9 GB    | False                  | 77:30          |
| LLaMA 7b | False          | True  | True  | False | 18.8 GB   | False                  | 14:30          |
| LLaMA 7b | False          | False | True  | False | OOM       | False                  | -              |
| LLaMA 7b | False          | False | False | True  | 79.5 GB   | True                   | 35:30          |
| LLaMA 7b | False          | False | False | False | OOM       | True                   | -              |

### Train medAlpaca based on LLaMA

If you have access to the [LLaMA](https://arxiv.org/abs/2302.13971) or [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html)
weights, you can finetune the model with the following command.
Just replace `PATH_TO_LLAMA_WEIGHTS` with the folder containing your LLaMA or Alpaca weights.

```bash
python medalpaca/train.py \
    --model PATH_TO_LLAMA_WEIGHTS \
    --data_path medical_meadow_small.json \
    --output_dir 'output' \
    --train_in_8bit True \
    --bf16 True \
    --tf32 False \
    --fp16 False \
    --global_batch_size 128 \
    --per_device_batch_size 8
```

By default, the script performs mixed precision training.
You can toggle 8-bit training with the `train_in_8bit` flag.
8-bit training currently only works with `use_lora True`; however, you can use
LoRA without 8-bit training.
The script can also be used to train other models, such as `facebook/opt-6.7b`.
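
After training, you will typically want to load the finetuned weights for inference. The snippet below is a minimal sketch rather than a script from this repository: it assumes the base LLaMA weights are available in Hugging Face format at `PATH_TO_LLAMA_WEIGHTS`, that the command above was run with LoRA enabled so an adapter was written to `output`, and that an Alpaca-style instruction prompt roughly matches the training template.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "PATH_TO_LLAMA_WEIGHTS"  # base weights in Hugging Face format (assumption)
adapter_path = "output"              # the --output_dir used during training

tokenizer = AutoTokenizer.from_pretrained(base_path)
base_model = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter produced during finetuning
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# Alpaca-style prompt; the exact template used during training may differ.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat are the symptoms of diabetes?\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If you finetuned the full model instead of a LoRA adapter, the `PeftModel` step should not be needed and the weights in `output` can be loaded directly with `AutoModelForCausalLM.from_pretrained`.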

## Data

<img width="256" alt="Screenshot 2023-03-31 at 09 37 41" src="https://user-images.githubusercontent.com/37253540/229244284-72b00e82-0da1-4218-b08e-63864306631e.png">

To ensure your cherished llamas and alpacas are well-fed and thriving,
we have diligently gathered high-quality biomedical open-source datasets
and transformed them into instruction tuning formats.
We have dubbed this endeavor **Medical Meadow**.
Medical Meadow currently encompasses roughly 1.5 million data points across a diverse range of tasks,
including openly curated medical data transformed into Q/A pairs with OpenAI's `gpt-3.5-turbo`
and a collection of established NLP tasks in the medical domain.
Please note that not all data is of the same quantity and quality, and you may need to subsample
the data for training your own model.
We will persistently update and refine the dataset, and we welcome everyone to contribute more 'grass' to Medical Meadow!

### Data Overview

| Name | Source | n | n included in training |
|----------------------|-------------------------------------------------------------------------|----------|-------------------------|
| Medical Flashcards | [medalpaca/medical_meadow_medical_flashcards](https://huggingface.co/datasets/medalpaca/medical_meadow_medical_flashcards) | 33955 | 33955 |
| Wikidoc | [medalpaca/medical_meadow_wikidoc](https://huggingface.co/datasets/medalpaca/medical_meadow_wikidoc) | 67704 | 10000 |
| Wikidoc Patient Information | [medalpaca/medical_meadow_wikidoc_patient_information](https://huggingface.co/datasets/medalpaca/medical_meadow_wikidoc_patient_information) | 5942 | 5942 |
| Stackexchange academia | [medalpaca/medical_meadow_stack_exchange](https://huggingface.co/datasets/medalpaca/medical_meadow_stackexchange) | 40865 | 40865 |
| Stackexchange biology | [medalpaca/medical_meadow_stack_exchange](https://huggingface.co/datasets/medalpaca/medical_meadow_stackexchange) | 27887 | 27887 |
| Stackexchange fitness | [medalpaca/medical_meadow_stack_exchange](https://huggingface.co/datasets/medalpaca/medical_meadow_stackexchange) | 9833 | 9833 |
| Stackexchange health | [medalpaca/medical_meadow_stack_exchange](https://huggingface.co/datasets/medalpaca/medical_meadow_stackexchange) | 7721 | 7721 |
| Stackexchange bioinformatics | [medalpaca/medical_meadow_stack_exchange](https://huggingface.co/datasets/medalpaca/medical_meadow_stackexchange) | 5407 | 5407 |
| USMLE Self Assessment Step 1 | [medalpaca/medical_meadow_usmle_self_assessment](https://huggingface.co/datasets/medalpaca/medical_meadow_usmle_self_assessment) | 119 | 92 (test only) |
| USMLE Self Assessment Step 2 | [medalpaca/medical_meadow_usmle_self_assessment](https://huggingface.co/datasets/medalpaca/medical_meadow_usmle_self_assessment) | 120 | 110 (test only) |
| USMLE Self Assessment Step 3 | [medalpaca/medical_meadow_usmle_self_assessment](https://huggingface.co/datasets/medalpaca/medical_meadow_usmle_self_assessment) | 135 | 122 (test only) |
| MEDIQA | [original](https://osf.io/fyg46/?view_only=), [preprocessed](https://huggingface.co/datasets/medalpaca/medical_meadow_mediqa) | 2208 | 2208 |
| CORD-19 | [original](https://www.kaggle.com/datasets/allen-institute-for-ai/CORD-19-research-challenge), [preprocessed](https://huggingface.co/datasets/medalpaca/medical_meadow_cord19) | 1056660 | 50000 |
| MMMLU | [original](https://github.com/hendrycks/test), [preprocessed](https://huggingface.co/datasets/medalpaca/medical_meadow_mmmlu) | 3787 | 3787 |
| Pubmed Health Advice | [original](https://aclanthology.org/D19-1473/), [preprocessed](https://huggingface.co/datasets/medalpaca/health_advice) | 10178 | 10178 |
| Pubmed Causal | [original](https://aclanthology.org/2020.coling-main.427/), [preprocessed](https://huggingface.co/datasets/medalpaca/medical_meadow_pubmed_causal) | 2446 | 2446 |
| ChatDoctor | [original](https://github.com/Kent0n-Li/ChatDoctor) | 215000 | 10000 |
| OpenAssistant | [original](https://huggingface.co/OpenAssistant) | 9209 | 9209 |
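
The preprocessed Medical Meadow datasets listed above are hosted on the Hugging Face Hub, so they can be pulled directly with the `datasets` library. A minimal sketch; the `train` split name and the Alpaca-style record fields are assumptions that may vary between datasets:

```python
from datasets import load_dataset

# Pull one of the Medical Meadow datasets listed above from the Hugging Face Hub.
flashcards = load_dataset("medalpaca/medical_meadow_medical_flashcards")

# Inspect a single record; instruction-tuning sets typically expose
# "instruction" / "input" / "output" style fields (assumed here).
print(flashcards["train"][0])
```
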
### Data description

Please refer to [DATA_DESCRIPTION.md](DATA_DESCRIPTION.md).

## Benchmarks

<img width="256" alt="benchmarks" src="https://user-images.githubusercontent.com/37253540/229249302-20ff8a88-95b4-42a3-bdd8-96a9dce9a92b.png">

We are benchmarking all models on the USMLE self assessment, which is available at this [link](https://www.usmle.org/prepare-your-exam).
Note that we removed all questions with images, as our models are not multimodal.

| **Model** | **Step 1** | **Step 2** | **Step 3** |
|--------------------------------------------------------------------------------------------|-------------------|------------------|------------------|
| [LLaMA 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) | 0.198 | 0.202 | 0.203 |
| [Alpaca 7b naive](https://github.com/tatsu-lab/stanford_alpaca) ([weights](https://huggingface.co/chavinlo/alpaca-native)) | 0.275 | 0.266 | 0.293 |
| [Alpaca 7b LoRA](https://github.com/tloen/alpaca-lora) | 0.220 | 0.138 | 0.252 |
| [MedAlpaca 7b](https://huggingface.co/medalpaca/medalpaca-7b) | 0.297 | 0.312 | 0.398 |
| [MedAlpaca 7b LoRA](https://huggingface.co/medalpaca/medalpaca-lora-7b-16bit) | 0.231 | 0.202 | 0.179 |
| [MedAlpaca 7b LoRA 8bit](https://huggingface.co/medalpaca/medalpaca-lora-7b-8bit) | 0.231 | 0.241 | 0.211 |
| [ChatDoctor](https://github.com/Kent0n-Li/ChatDoctor) (7b) | 0.187 | 0.185 | 0.148 |
| [LLaMA 13b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) | 0.222 | 0.248 | 0.276 |
| [Alpaca 13b naive](https://huggingface.co/chavinlo/alpaca-13b) | 0.319 | 0.312 | 0.301 |
| [MedAlpaca 13b](https://huggingface.co/medalpaca/medalpaca-13b) | ***0.473*** | ***0.477*** | ***0.602*** |
| [MedAlpaca 13b LoRA](https://huggingface.co/medalpaca/medalpaca-lora-13b-16bit) | 0.250 | 0.255 | 0.255 |
| [MedAlpaca 13b LoRA 8bit](https://huggingface.co/medalpaca/medalpaca-lora-13b-8bit) | 0.189 | 0.303 | 0.289 |
| [MedAlpaca 30b](https://huggingface.co/medalpaca/medalpaca-30b) (still training) | TBA | TBA | TBA |
| [MedAlpaca 30b LoRA 8bit](https://huggingface.co/medalpaca/medalpaca-lora-30b-8bit) | 0.315 | 0.327 | 0.361 |

We are continuously working on improving the training as well as our evaluation prompts.
Expect this table to change quite a bit.
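
If you want to run a comparable evaluation yourself, the sketch below shows one way to do it: render each multiple-choice question as a prompt, let the model return an option letter, and report the fraction of correct answers (the scores above are such fractions). The prompt template and the field names (`stem`, `options`, `answer`) are illustrative assumptions, not the exact evaluation prompts used here, which we are still refining.

```python
from typing import Callable

def format_question(stem: str, options: dict[str, str]) -> str:
    """Render a multiple-choice question as a simple zero-shot prompt (illustrative template)."""
    lines = [stem] + [f"({letter}) {text}" for letter, text in sorted(options.items())]
    lines.append("Answer with the letter of the single best option.")
    return "\n".join(lines)

def accuracy(questions: list[dict], answer_fn: Callable[[str], str]) -> float:
    """Fraction of questions answered correctly; `answer_fn` maps a prompt to an option letter."""
    correct = 0
    for q in questions:  # each record is assumed to carry "stem", "options" and "answer"
        predicted = answer_fn(format_question(q["stem"], q["options"]))
        correct += predicted.strip().upper() == q["answer"].strip().upper()
    return correct / len(questions)
```
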
@@ -142,15 +18,3 @@ extensive testing or validation, and their reliability cannot be guaranteed.
We kindly ask you to exercise caution when using these models,
and we appreciate your understanding as we continue to explore and develop this innovative technology.

## Paper

<img width="256" alt="chat-lama" src="https://user-images.githubusercontent.com/37253540/229261366-5cce9a60-176a-471b-80fd-ba390539da72.png">

```
@article{han2023medalpaca,
  title={MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data},
  author={Han, Tianyu and Adams, Lisa C and Papaioannou, Jens-Michalis and Grundmann, Paul and Oberhauser, Tom and L{\"o}ser, Alexander and Truhn, Daniel and Bressem, Keno K},
  journal={arXiv preprint arXiv:2304.08247},
  year={2023}
}
```

+# Amigo: Finetuned Large Language Models for Medical Question Answering