Xidong committed on
Commit
2a71ddf
•
1 Parent(s): 91303f9

Update README.md

  1. README.md +86 -3
README.md CHANGED
@@ -7,7 +7,7 @@ Covering English, Chinese, French, Hindi, Spanish, Hindi, Arabic So far
  <p align="center">
- 👨🏻‍💻<a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">Github</a> • 📃 <a href="" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
  <br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English
  </p>
@@ -32,10 +32,93 @@ Covering English, Chinese, French, Hindi, Spanish, Hindi, Arabic So far
  ## Dataset & Evaluation

  - Dataset
- 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>

  - Evaluation
- 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>

  ## Results reproduction
 
  <p align="center">
+ 👨🏻‍💻<a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">Github</a> • 📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
  <br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English
  </p>
 
  ## Dataset & Evaluation

  - Dataset
+ 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>
+
+ <details><summary>Click to expand</summary>
+
+ ![Apollo](assets/dataset.png)
+
+ - [Zip File](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/blob/main/Medbase_data-datasets.zip)
+ - [Data category](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/tree/main/train)
+   - Pretrain:
+     - data item:
+       - json_name: {data_source}_{language}_{data_type}.json
+       - data_type: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
+       - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
+       - data_type: qa (QA pairs generated from text)
+       - data_type==text: list of strings
+         ```
+         [
+           "string1",
+           "string2",
+           ...
+         ]
+         ```
+       - data_type==qa: list of QA pairs (each item is a list of strings alternating question and answer)
+         ```
+         [
+           [
+             "q1",
+             "a1",
+             "q2",
+             "a2",
+             ...
+           ],
+           ...
+         ]
+         ```
+   - SFT:
+     - json_name: {data_source}_{language}.json
+     - data_type: code, general, math, medicalExam, medicalPatient
+     - data item: list of QA pairs (each item is a list of strings alternating question and answer)
+       ```
+       [
+         [
+           "q1",
+           "a1",
+           "q2",
+           "a2",
+           ...
+         ],
+         ...
+       ]
+       ```
+
+
+ </details>
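
The data layout described in this hunk can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: the helper names `parse_json_name` and `iter_qa_pairs` and the file name `medicalBook_en_qa.json` are hypothetical, chosen to match the `{data_source}_{language}_{data_type}.json` pattern and the flat `["q1", "a1", "q2", "a2", ...]` item format shown above. It is not code from the Apollo repository.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def parse_json_name(path: str) -> Dict[str, str]:
    """Split a pretrain file name of the form {data_source}_{language}_{data_type}.json."""
    stem = Path(path).stem
    # Split from the right so a data_source that itself contains "_" stays intact.
    data_source, language, data_type = stem.rsplit("_", 2)
    return {"data_source": data_source, "language": language, "data_type": data_type}


def iter_qa_pairs(item: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) tuples from one flat [q1, a1, q2, a2, ...] item."""
    if len(item) % 2 != 0:
        raise ValueError("expected an even number of strings (alternating q/a)")
    for i in range(0, len(item), 2):
        yield item[i], item[i + 1]


# Hypothetical file name, used only to exercise the naming pattern.
info = parse_json_name("train/medicalBook_en_qa.json")

# Inline stand-in for the contents of a qa-type file.
sample = json.loads('[["q1", "a1", "q2", "a2"], ["q3", "a3"]]')
pairs = [list(iter_qa_pairs(item)) for item in sample]
```

Splitting from the right is a design choice for robustness; whether real `data_source` values contain underscores is not stated in the README.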
89
+
90
 
91
+
92
  - Evaluation
93
+ πŸ€— <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
94
+
95
+ <details><summary>Click to expand</summary>
96
+
97
+ - EN:
98
+ - [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
99
+ - [MedMCQA](https://huggingface.co/datasets/medmcqa/viewer/default/test)
100
+ - [PubMedQA](https://huggingface.co/datasets/pubmed_qa): Because the results fluctuated too much, they were not used in the paper.
101
+ - [MMLU-Medical](https://huggingface.co/datasets/cais/mmlu)
102
+ - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
103
+ - ZH:
104
+ - [MedQA-MCMLE](https://huggingface.co/datasets/bigbio/med_qa/viewer/med_qa_zh_4options_bigbio_qa/test)
105
+ - [CMB-single](https://huggingface.co/datasets/FreedomIntelligence/CMB): Not used in the paper
106
+ - Randomly sample 2,000 multiple-choice questions with single answer.
107
+ - [CMMLU-Medical](https://huggingface.co/datasets/haonan-li/cmmlu)
108
+ - Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
109
+ - [CExam](https://github.com/williamliujl/CMExam): Not used in the paper
110
+ - Randomly sample 2,000 multiple-choice questions
111
+
112
+
113
+ - ES: [Head_qa](https://huggingface.co/datasets/head_qa)
114
+ - FR: [Frenchmedmcqa](https://github.com/qanastek/FrenchMedMCQA)
115
+ - HI: [MMLU_HI](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Hindi)
+   - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+ - AR: [MMLU_Ara](https://huggingface.co/datasets/FreedomIntelligence/MMLU_Arabic)
+   - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
+
+
+ </details>
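
Two of the benchmarks listed above are built by randomly sampling 2,000 multiple-choice questions. The README does not say how the sample was drawn, so the following is just one possible sketch: the function name `sample_questions`, the seed value, and the placeholder question pool are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_questions(questions: Sequence[T], k: int = 2000, seed: int = 0) -> List[T]:
    """Draw k questions without replacement; a fixed seed makes the draw repeatable."""
    if k > len(questions):
        raise ValueError(f"cannot sample {k} items from a pool of {len(questions)}")
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(questions, k)


# Placeholder pool standing in for a benchmark's multiple-choice questions.
pool = [{"id": i, "question": f"placeholder {i}"} for i in range(5000)]
subset = sample_questions(pool)
```

Fixing the seed means anyone rerunning the script on the same pool gets the same 2,000 questions, which matters when scores are compared across models.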

  ## Results reproduction