Xidong committed 33f66d1 (1 parent: 39fecc5)

Update README.md

Files changed (1): README.md (+164 -0). Before this commit, README.md contained only the YAML front matter `license: apache-2.0`; the full updated file is reproduced below.
---
pipeline_tag: text-generation
language: multilingual
license: apache-2.0
tags:
- "Multitask Language Understanding"
- "Multilingual"
widget:
- text: "In traditional Western medicine, which vitamin is commonly recommended to prevent scurvy? A) Vitamin A B) Vitamin B12 C) Vitamin C D) Vitamin D"
  example_title: "English"
- text: "在中医理论中,以下哪种药材不是治疗风湿病的常用药物? A) 独活 B) 秦艽 C) 甘草 D) 珍珠粉"
  example_title: "Chinese"
- text: "السؤال: ما هو العلاج الطبيعي الذي يستخدم تقليديًا في الطب العربي لتحسين الهضم؟ A) الزنجبيل B) النعناع C) القرفة D) الحلبة"
  example_title: "Arabic"
- text: "आयुर्वेद में, किस औषधि का उपयोग आमतौर पर जुकाम के इलाज के लिए किया जाता है? A) नीम B) तुलसी C) गिलोय D) अश्वगंधा"
  example_title: "Hindi"
- text: "En la medicina tradicional española, ¿qué alimento se considera beneficioso para la salud del hígado? A) Aceite de oliva B) Tomate C) Foie gras (hígado de ganso) D) Ajo"
  example_title: "Spanish"
- text: "Dans la tradition médicinale française, quel produit est réputé pour ses bienfaits sur la digestion ? A) Le vin rouge B) Le fromage C) Le foie gras D) Les herbes de Provence"
  example_title: "French"
---
# Multilingual Medicine: Model, Dataset, Benchmark, Code

Covering English, Chinese, French, Hindi, Spanish, and Arabic so far.

<p align="center">
👨🏻‍💻 <a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">GitHub</a> • 📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
<br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English </a>
</p>

![Apollo](assets/apollo_medium_final.png)

## 🌈 Update

* **[2024.03.07]** [Paper](https://arxiv.org/abs/2403.03640) released.
* **[2024.02.12]** <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> and <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a> are published! 🎉
* **[2024.01.23]** Apollo repo is published! 🎉

## Results
🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a>

![Apollo](assets/result.png)


## Dataset & Evaluation

- Dataset
🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>

<details><summary>Click to expand</summary>

![Apollo](assets/dataset.png)

- [Zip File](https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus/blob/main/ApolloCorpus.zip)
- [Data category](https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus/tree/main/train)
- Pretrain:
  - data item:
    - json_name: `{data_source}_{language}_{data_type}.json`
    - data_source: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
    - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
    - data_type: text, or qa (QA generated from the text)
  - data_type==text: list of strings
    ```
    [
      "string1",
      "string2",
      ...
    ]
    ```
  - data_type==qa: list of QA conversations, each a flat list of strings alternating question and answer
    ```
    [
      [
        "q1",
        "a1",
        "q2",
        "a2",
        ...
      ],
      ...
    ]
    ```
- SFT:
  - json_name: `{data_source}_{language}.json`
  - data_source: code, general, math, medicalExam, medicalPatient
  - data item: list of QA conversations, each a flat list of strings alternating question and answer
    ```
    [
      [
        "q1",
        "a1",
        "q2",
        "a2",
        ...
      ],
      ...
    ]
    ```
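
The naming scheme and item layouts above are regular enough to read with a short loader. The following is a minimal sketch, not the authors' tooling: it assumes every file name matches one of the two patterns exactly (a letters-only source name, a two-letter language code, no extra underscores), and the helper names `parse_name`, `qa_pairs`, `records`, and `load_file` are hypothetical. Each QA conversation is flattened into one record per (question, answer) turn.

```python
import json
import re
from pathlib import Path
from typing import Iterator

# Hypothetical patterns for the two naming schemes described above (assumption:
# real file names may deviate, e.g. by containing extra underscores).
PRETRAIN_NAME = re.compile(
    r"^(?P<source>[A-Za-z]+)_(?P<language>[a-z]{2})_(?P<data_type>text|qa)\.json$"
)
SFT_NAME = re.compile(r"^(?P<source>[A-Za-z]+)_(?P<language>[a-z]{2})\.json$")


def parse_name(filename: str) -> dict:
    """Split a corpus file name into source, language and data_type."""
    for pattern in (PRETRAIN_NAME, SFT_NAME):
        match = pattern.match(filename)
        if match:
            fields = match.groupdict()
            # SFT file names carry no data_type; their items are always QA conversations.
            fields.setdefault("data_type", "qa")
            return fields
    raise ValueError(f"unrecognised file name: {filename}")


def qa_pairs(conversation: list) -> Iterator[tuple]:
    """Turn a flat [q1, a1, q2, a2, ...] list into (question, answer) tuples."""
    if len(conversation) % 2:
        raise ValueError("a QA conversation must alternate question and answer")
    return zip(conversation[0::2], conversation[1::2])


def records(filename: str, items: list) -> Iterator[dict]:
    """Yield one flat record per text string, or per QA turn."""
    fields = parse_name(filename)
    for item in items:
        if fields["data_type"] == "text":
            yield {**fields, "text": item}
        else:
            for turn, (question, answer) in enumerate(qa_pairs(item)):
                yield {**fields, "turn": turn, "question": question, "answer": answer}


def load_file(path: Path) -> Iterator[dict]:
    """Read one JSON file of the corpus and yield its records."""
    items = json.loads(path.read_text(encoding="utf-8"))
    return records(path.name, items)
```

For example, `records("medicalBook_en_qa.json", [["q1", "a1", "q2", "a2"]])` yields two records, the second with `turn` 1, `question` "q2" and `answer` "a2". Keeping the parsing in `records`, separate from the file I/O in `load_file`, makes the format logic easy to check on in-memory data.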

</details>

- Evaluation
🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>

<details><summary>Click to expand</summary>

- EN:
  - [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
  - [MedMCQA](https://huggingface.co/datasets/medmcqa/viewer/default/test)
  - [PubMedQA](https://huggingface.co/datasets/pubmed_qa): Not used in the paper because the results fluctuated too much.
  - [MMLU-Medical](https://huggingface.co/datasets/cais/mmlu)
    - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
- ZH:
  - [MedQA-MCMLE](https://huggingface.co/datasets/bigbio/med_qa/viewer/med_qa_zh_4options_bigbio_qa/test)
  - [CMB-single](https://huggingface.co/datasets/FreedomIntelligence/CMB): Not used in the paper
    - Randomly sampled 2,000 single-answer multiple-choice questions.
  - [CMMLU-Medical](https://huggingface.co/datasets/haonan-li/cmmlu)
    - Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
  - [CMExam](https://github.com/williamliujl/CMExam): Not used in the paper
    - Randomly sampled 2,000 multiple-choice questions.
- ES: [Head_qa](https://huggingface.co/datasets/head_qa)
- FR: [FrenchMedMCQA](https://github.com/qanastek/FrenchMedMCQA)
- HI: [MMLU_HI](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Hindi)
  - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
- AR: [MMLU_Ara](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Arabic)
  - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
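
Two steps in the benchmark selection above are mechanical: restricting MMLU to its medical subjects, and drawing a fixed-size random sample of 2,000 questions (as for CMB-single and CMExam). The sketch below illustrates both under stated assumptions: the subject identifiers are the config names used by the `cais/mmlu` dataset, each row is assumed to be a dict with a `subject` key, and the seed is an arbitrary placeholder because the README does not say how the samples were drawn. `medical_subset`, `sample_questions`, and `format_prompt` are hypothetical helpers; `format_prompt` renders a question in the `question A) ... B) ...` style of the widget examples in the front matter.

```python
import random

# MMLU subject configs matching the "MMLU-Medical" subjects listed above
# (identifiers as used by the cais/mmlu dataset).
MMLU_MEDICAL = (
    "clinical_knowledge",
    "medical_genetics",
    "anatomy",
    "professional_medicine",
    "college_biology",
    "college_medicine",
)

OPTION_LETTERS = "ABCDEFGH"


def medical_subset(rows: list) -> list:
    """Keep rows whose 'subject' field (assumed key) is an MMLU-Medical subject."""
    return [row for row in rows if row.get("subject") in MMLU_MEDICAL]


def sample_questions(questions: list, k: int = 2000, seed: int = 0) -> list:
    """Draw k questions without replacement, reproducibly for a given seed.

    The default seed is a placeholder, not the value used for the benchmark.
    """
    if len(questions) <= k:
        return list(questions)
    return random.Random(seed).sample(questions, k)


def format_prompt(question: str, options: list) -> str:
    """Render 'question A) opt B) opt ...' like the widget examples."""
    if len(options) > len(OPTION_LETTERS):
        raise ValueError("too many options")
    choices = " ".join(
        f"{letter}) {option}" for letter, option in zip(OPTION_LETTERS, options)
    )
    return f"{question} {choices}"
```

For example, `format_prompt("Which vitamin prevents scurvy?", ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"])` returns `"Which vitamin prevents scurvy? A) Vitamin A B) Vitamin B12 C) Vitamin C D) Vitamin D"`. Sampling through a dedicated `random.Random(seed)` instance, rather than the module-level generator, keeps the drawn subset independent of any other code that touches global random state.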

</details>

## Results reproduction
<details><summary>Click to expand</summary>

**Waiting for Update**

</details>

## Citation
Please use the following citation if you use our dataset for training or evaluation:

```bibtex
@misc{wang2024apollo,
  title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
  author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
  year={2024},
  eprint={2403.03640},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```