satyamt committed on
Commit e386947
1 Parent(s): 3b0be67

Update README.md

Files changed (1)
  1. README.md +28 -5
README.md CHANGED
@@ -21,11 +21,34 @@ tags:
21
 
22
  ### Model Evaluation Benchmark
23
 
24
- | Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
25
- | ------------------ | ----- | --------- | ---- | ---------- | ---------- | ----- |
26
- | Orca-2-7b | **78.4** | 76.1 | 53.7 | **52.4** | **74.2** | **47.2** |
27
- | LLAMA-2-7b | 43.2 | **77.1** | 44.4 | 38.7 | 69.5 | 16 |
28
- | MT7Bi (1 epoch) | 50.94 | 73.24 | - | 43.04 | 72.06 | 22.52 |
29
 
30
  ### ARC: 50.94%
31
  | Task |Version| Metric | Value | |Stderr|
 
21
 
22
  ### Model Evaluation Benchmark
23
 
24
+ | | | | | | | | | |
25
+ | -------- | ------ |----- |----- |----- |----- |----- |----- |------ |
26
+ |Category | MT7Bi | meditron-70b | llama-2-70b | med42-70b* | meditron-7b | llama-2-7b | PMC-llama-7b |
27
+ |Health | | 81.8 | 69.1 | 83.6 | 27.3 | 16.4 | 3.6 |
28
+ |Nutrition | | 77.9 | 68.8 | 62.5 | 31.1 | 12.5 | 6.3 |
29
+ |Psychology| | 47.4 | 36.8 | 52.6 | 21.1 | 10.5 | 0.0 |
30
+ |Science | | 77.8 | 44.4 | 33.3 | 33.3 | 11.1 | 0.0 |
31
+ |Avg | | 71.2 | 54.8 | 58.0 | 28.3 | 12.6 | 2.5 |
32
+ | | | | | | | | |
33
+
34
+
35
+ | | | | | | | |
36
+ | --- | ------ | ------ |----- |----- |----- |----- |
37
+ |Dataset| MT7Bi | meditron-70b | llama-2-70b | med42-70b* | clinical-camel-70b* |
38
+ |MMLU-Medical | 46.9 | 77.6 | 77.9 | 74.5 | 65.7 |
39
+ |PubMedQA | 65.2 | 81.6 | 80.0 | 61.2 | 67.0 |
40
+ |MedMCQA | 42.7 | 66.0 | 62.6 | 59.2 | 46.7 |
41
+ |MedQA | | 64.4 | 61.5 | 59.1 | 50.8 |
42
+ |MedQA-4-Option| 44.3 | 70.2 | 63.8 | 63.9 | 56.8 |
43
+ |Avg | | 72.0 | 69.2 | 63.6 | 57.4 |
44
+ | | | | | | | |
45
+
46
+
47
+ | Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
48
+ | ------------------ | -------- | --------- | ---- | ---------- | ---------- | -------- |
49
+ | Orca-2-7b | **78.4** | 76.1 | 53.7 | **52.4** | **74.2** | **47.2** |
50
+ | LLAMA-2-7b | 43.2 | **77.1** | 44.4 | 38.7 | 69.5 | 16 |
51
+ | MT7Bi (1 epoch) | 50.94 | 73.24 | - | 43.04 | 72.06 | 22.52 |
52
 
53
  ### ARC: 50.94%
54
  | Task |Version| Metric | Value | |Stderr|