dmis-lab committed
Commit cf9b05e · verified · 1 Parent(s): 4a81cd7

Update README.md

Files changed (1): README.md +5 -5
README.md CHANGED
@@ -161,13 +161,13 @@ We tested models on seven medical benchmarks: [MedQA](https://arxiv.org/abs/2009
 
 | **Model** | **Average** | **MedQA** | **USMLE** | **Medbullets-4** | **Medbullets-5** | **MedMCQA** | **MMLU-Medical** |
 |:--------------------------------|:-----------:|:---------:|:---------:|:----------------:|:----------------:|:-----------:|:----------------:|
-| GPT-4 | 75.2 | 81.4 | 86.6 | 68.8 | 63.3 | 72.4 | 87.1 |
-| GPT-3.5 | 54.1 | 53.6 | 58.5 | 51.0 | 47.4 | 51.0 | 67.3 |
+| GPT-4 | 76.6 | 81.4 | 86.6 | 68.8 | 63.3 | 72.4 | 87.1 |
+| GPT-3.5 | 54.8 | 53.6 | 58.5 | 51.0 | 47.4 | 51.0 | 67.3 |
 | MediTron-70B (Ensemble, 5 runs) | - | 70.2 | - | - | - | 66.0 | 78.0 |
 |*Open-source (7B)*|
-| MediTron-7B | 50.8 | 50.2 | 44.6 | 51.1 | 45.5 | 57.9 | 56.7 |
-| BioMistral-7B | 54.4 | 54.3 | 51.4 | 52.3 | 48.7 | 61.1 | 64.6 |
-| Meerkat-7B | 62.4 | 70.6 | 70.3 | 58.7 | 52.9 | 60.6 | 70.5 |
+| MediTron-7B | 51.0 | 50.2 | 44.6 | 51.1 | 45.5 | 57.9 | 56.7 |
+| BioMistral-7B | 55.4 | 54.3 | 51.4 | 52.3 | 48.7 | 61.1 | 64.6 |
+| Meerkat-7B | 62.6 | 70.6 | 70.3 | 58.7 | 52.9 | 60.6 | 70.5 |
 | Meerkat-8B (**New**) | **67.3** | **74.0** | **74.2** | **62.3** | **55.5** | **62.7** | **75.2** |
 
 Please note that the scores in MMLU-Medical were calculated based on the average accuracies across six medical-related subjects in the original MMLU benchmark, and each result for a single subject is presented below.
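The diff updates only the Average column. A quick sanity check, assuming Average is the simple arithmetic mean of the six benchmark columns (a hypothesis the README does not state explicitly, but one that is consistent with the corrected GPT-4 and GPT-3.5 rows):

```python
# Recompute the Average column for two rows of the table above,
# assuming it is the simple mean of the six benchmark scores.
rows = {
    "GPT-4":   [81.4, 86.6, 68.8, 63.3, 72.4, 87.1],  # corrected Average: 76.6
    "GPT-3.5": [53.6, 58.5, 51.0, 47.4, 51.0, 67.3],  # corrected Average: 54.8
}

for model, scores in rows.items():
    avg = round(sum(scores) / len(scores), 1)
    print(f"{model}: {avg}")
```

Under that assumption the recomputed means match the `+` side of the diff (76.6 and 54.8), not the `-` side, which is presumably what this commit fixes.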