Xidong committed
Commit cfbfda6 • 1 Parent(s): 3e45d68

Upload README.md

Files changed (1)
  1. README.md +151 -0
README.md ADDED
@@ -0,0 +1,151 @@
+ ---
+ license: apache-2.0
+ ---
+ # Multilingual Medicine: Model, Dataset, Benchmark, Code
+
+ Covering English, Chinese, French, Hindi, Spanish, and Arabic so far
+
+
+ <p align="center">
+ 👨🏻‍💻<a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">GitHub</a> • 📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
+ <br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English </a>
+ </p>
+
+ ![Apollo](assets/apollo_medium_final.png)
+
+ ## 🌈 Update
+
+ * **[2024.03.07]** [Paper](https://arxiv.org/abs/2403.03640) released.
+ * **[2024.02.12]** <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> and <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a> are published! 🎉
+ * **[2024.01.23]** Apollo repo is published! 🎉
+
+
+ ## Results
+ 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-34B" target="_blank">Apollo-34B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-72B" target="_blank">Apollo-72B</a>
+
+ 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B-GGUF" target="_blank">Apollo-0.5B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B-GGUF" target="_blank">Apollo-2B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B-GGUF" target="_blank">Apollo-6B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B-GGUF" target="_blank">Apollo-7B-GGUF</a>
+
+
+ ![Apollo](assets/result.png)
+
+
+
+
+
+ ## Dataset & Evaluation
+
+ - Dataset
+   🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>
+
+   <details><summary>Click to expand</summary>
+
+   ![Apollo](assets/dataset.png)
+
+   - [Zip File](https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus/blob/main/ApolloCorpus.zip)
+   - [Data category](https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus/tree/main/train)
+     - Pretrain:
+       - data item:
+         - json_name: {data_source}_{language}_{data_type}.json
+         - data_type: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
+         - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
+         - data_type: qa (QA pairs generated from text)
+         - data_type==text: list of strings
+           ```
+           [
+             "string1",
+             "string2",
+             ...
+           ]
+           ```
+         - data_type==qa: list of QA pairs (each a list of strings)
+           ```
+           [
+             [
+               "q1",
+               "a1",
+               "q2",
+               "a2",
+               ...
+             ],
+             ...
+           ]
+           ```
+     - SFT:
+       - json_name: {data_source}_{language}.json
+       - data_type: code, general, math, medicalExam, medicalPatient
+       - data item: list of QA pairs (each a list of strings)
+         ```
+         [
+           [
+             "q1",
+             "a1",
+             "q2",
+             "a2",
+             ...
+           ],
+           ...
+         ]
+         ```
+
+
+ </details>
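
The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the project's own loader: the helper names (`parse_pretrain_name`, `to_qa_pairs`, `load_qa_file`) and the example file name `medicalBook_en_qa.json` are hypothetical, and it assumes that the language and data-type fields of a file name contain no underscores and that every qa item alternates question and answer strings.

```python
from __future__ import annotations

import json
from pathlib import Path


def parse_pretrain_name(filename: str) -> dict[str, str]:
    """Split '{data_source}_{language}_{data_type}.json' into its three fields."""
    stem = Path(filename).stem
    # Split from the right so a data_source that contains '_' stays intact
    # (assumption: language and data_type themselves contain no '_').
    data_source, language, data_type = stem.rsplit("_", 2)
    return {"data_source": data_source, "language": language, "data_type": data_type}


def to_qa_pairs(item: list[str]) -> list[tuple[str, str]]:
    """Turn a flat ['q1', 'a1', 'q2', 'a2', ...] list into (question, answer) tuples."""
    if len(item) % 2 != 0:
        raise ValueError("expected an even number of strings (question/answer turns)")
    return list(zip(item[0::2], item[1::2]))


def load_qa_file(path: str) -> list[list[tuple[str, str]]]:
    """Load a qa-style JSON file (list of flat q/a lists) and pair up each item."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    return [to_qa_pairs(item) for item in items]


if __name__ == "__main__":
    # Hypothetical file name that follows the pretrain naming pattern.
    print(parse_pretrain_name("medicalBook_en_qa.json"))
    print(to_qa_pairs(["q1", "a1", "q2", "a2"]))
```

Files with `data_type==text` need no pairing step: `json.load` already returns the list of strings.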
+
+
+
+ - Evaluation
+   🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
+
+   <details><summary>Click to expand</summary>
+
+   - EN:
+     - [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
+     - [MedMCQA](https://huggingface.co/datasets/medmcqa/viewer/default/test)
+     - [PubMedQA](https://huggingface.co/datasets/pubmed_qa): Not used in the paper because the results fluctuated too much.
+     - [MMLU-Medical](https://huggingface.co/datasets/cais/mmlu)
+       - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+   - ZH:
+     - [MedQA-MCMLE](https://huggingface.co/datasets/bigbio/med_qa/viewer/med_qa_zh_4options_bigbio_qa/test)
+     - [CMB-single](https://huggingface.co/datasets/FreedomIntelligence/CMB): Not used in the paper
+       - Randomly sample 2,000 single-answer multiple-choice questions.
+     - [CMMLU-Medical](https://huggingface.co/datasets/haonan-li/cmmlu)
+       - Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
+     - [CMExam](https://github.com/williamliujl/CMExam): Not used in the paper
+       - Randomly sample 2,000 multiple-choice questions.
+
+
+   - ES: [Head_qa](https://huggingface.co/datasets/head_qa)
+   - FR: [Frenchmedmcqa](https://github.com/qanastek/FrenchMedMCQA)
+   - HI: [MMLU_HI](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Hindi)
+     - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+   - AR: [MMLU_Ara](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Arabic)
+     - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+
+
+ </details>
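
For the subsets above that are built by randomly sampling 2,000 questions, a reproducible draw could look like the sketch below. It illustrates the idea only: the README does not state how the sampling was done, so the function name `sample_questions`, the seed value, and the choice of Python's `random` module are all assumptions.

```python
import random


def sample_questions(questions, k=2000, seed=0):
    """Draw k questions without replacement, reproducibly for a given seed."""
    pool = list(questions)
    if len(pool) <= k:
        # Fewer questions than requested: keep them all, in original order.
        return pool
    # A local Random instance keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(pool, k)


if __name__ == "__main__":
    # Stand-in question pool; real items would be multiple-choice records.
    pool = [f"question-{i}" for i in range(10_000)]
    subset = sample_questions(pool)
    print(len(subset))
```

Fixing the seed means that repeated runs evaluate on the same subset, which keeps scores comparable across models.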
+
+
+ ## Results reproduction
+ <details><summary>Click to expand</summary>
+
+ **Waiting for Update**
+
+
+
+ </details>
+
+
+
+
+ ## Citation
+ Please use the following citation if you intend to use our dataset for training or evaluation:
+
+ ```
+ @misc{wang2024apollo,
+    title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
+    author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
+    year={2024},
+    eprint={2403.03640},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+ }
+ ```