---
datasets:
- xzuyn/chatdoctor-200k-stripped
- Technoculture/riddle_sense
- axiong/pmc_llama_instructions
- Open-Orca/SlimOrca-Dedup
language:
- en
tags:
- medical
---
This repository contains the [Technoculture/MD7b-alpha](https://huggingface.co/Technoculture/MD7b-alpha) adapter merged into its base model, Meditron 7B.
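
The card does not spell out the merge procedure, so the snippet below is only a minimal sketch of how such an adapter merge is typically done with 🤗 PEFT, assuming MD7b-alpha is a LoRA-style adapter. The base-model repo id (`epfl-llm/meditron-7b`) and the output directory are assumptions, not taken from this card.

```python
# Hypothetical sketch: merging the MD7b-alpha adapter into Meditron 7B with PEFT.
# Repo ids other than Technoculture/MD7b-alpha are assumptions, not from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "epfl-llm/meditron-7b"   # assumed repo id for the Meditron 7B base model
ADAPTER = "Technoculture/MD7b-alpha"  # adapter referenced above

# Load the base model in half precision, then attach the adapter weights.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)

# Fold the adapter weights into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()

# Save the standalone merged checkpoint together with the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
merged.save_pretrained("MT7Bi-merged")      # hypothetical output directory
tokenizer.save_pretrained("MT7Bi-merged")
```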

---

|                       Model                       | ARC |HellaSwag|           MMLU           |TruthfulQA|Winogrande|GSM8K|
|---------------------------------------------------|----:|--------:|--------------------------|---------:|---------:|----:|
|[MT7Bi](https://huggingface.co/Technoculture/MT7Bi)|50.94|    73.24|N/A (results file missing)|     43.04|     72.06|22.52|
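
The per-benchmark tables below use lm-evaluation-harness-style metric names (e.g. `acc,none`), so the scores can presumably be reproduced with that harness. The snippet below is a rough sketch only; the exact harness version, few-shot settings, and batch size used for this card are not stated and are assumed here.

```python
# Hypothetical sketch: re-running the benchmarks below with lm-evaluation-harness (v0.4-style API).
# Few-shot settings and batch size are assumptions, not taken from this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Technoculture/MT7Bi,dtype=float16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa", "winogrande", "gsm8k"],
    batch_size=8,
)

# Print the per-task metric dictionaries (acc, acc_norm, exact_match, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```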

### ARC
|    Task     |Version|       Metric       |    Value    |   |Stderr|
|-------------|-------|--------------------|-------------|---|------|
|arc_challenge|Yaml   |acc,none            |         0.48|   |      |
|             |       |acc_stderr,none     |         0.01|   |      |
|             |       |acc_norm,none       |         0.51|   |      |
|             |       |acc_norm_stderr,none|         0.01|   |      |
|             |       |alias               |arc_challenge|   |      |

Average: 50.94%

### HellaSwag
|  Task   |Version|       Metric       |  Value  |   |Stderr|
|---------|-------|--------------------|---------|---|------|
|hellaswag|Yaml   |acc,none            |     0.54|   |      |
|         |       |acc_stderr,none     |        0|   |      |
|         |       |acc_norm,none       |     0.73|   |      |
|         |       |acc_norm_stderr,none|        0|   |      |
|         |       |alias               |hellaswag|   |      |

Average: 73.24%

### MMLU

Average: N/A (the MMLU results file was missing, so no score could be computed)

### TruthfulQA
|     Task     |Version|        Metric         |      Value      |   |Stderr|
|--------------|-------|-----------------------|-----------------|---|------|
|truthfulqa    |N/A    |bleu_max,none          |            16.17|   |      |
|              |       |bleu_max_stderr,none   |             0.38|   |      |
|              |       |bleu_acc,none          |             0.36|   |      |
|              |       |bleu_acc_stderr,none   |                0|   |      |
|              |       |bleu_diff,none         |            -2.78|   |      |
|              |       |bleu_diff_stderr,none  |             0.26|   |      |
|              |       |rouge1_max,none        |            39.99|   |      |
|              |       |rouge1_max_stderr,none |             0.64|   |      |
|              |       |rouge1_acc,none        |             0.36|   |      |
|              |       |rouge1_acc_stderr,none |                0|   |      |
|              |       |rouge1_diff,none       |            -4.19|   |      |
|              |       |rouge1_diff_stderr,none|             0.45|   |      |
|              |       |rouge2_max,none        |            24.52|   |      |
|              |       |rouge2_max_stderr,none |             0.68|   |      |
|              |       |rouge2_acc,none        |             0.29|   |      |
|              |       |rouge2_acc_stderr,none |                0|   |      |
|              |       |rouge2_diff,none       |            -4.90|   |      |
|              |       |rouge2_diff_stderr,none|             0.55|   |      |
|              |       |rougeL_max,none        |            36.52|   |      |
|              |       |rougeL_max_stderr,none |             0.64|   |      |
|              |       |rougeL_acc,none        |             0.33|   |      |
|              |       |rougeL_acc_stderr,none |                0|   |      |
|              |       |rougeL_diff,none       |            -4.56|   |      |
|              |       |rougeL_diff_stderr,none|             0.45|   |      |
|              |       |acc,none               |             0.33|   |      |
|              |       |acc_stderr,none        |             0.05|   |      |
|              |       |alias                  |truthfulqa       |   |      |
|truthfulqa_gen|Yaml   |bleu_max,none          |            16.17|   |      |
|              |       |bleu_max_stderr,none   |             0.61|   |      |
|              |       |bleu_acc,none          |             0.36|   |      |
|              |       |bleu_acc_stderr,none   |             0.02|   |      |
|              |       |bleu_diff,none         |            -2.78|   |      |
|              |       |bleu_diff_stderr,none  |             0.51|   |      |
|              |       |rouge1_max,none        |            39.99|   |      |
|              |       |rouge1_max_stderr,none |             0.80|   |      |
|              |       |rouge1_acc,none        |             0.36|   |      |
|              |       |rouge1_acc_stderr,none |             0.02|   |      |
|              |       |rouge1_diff,none       |            -4.19|   |      |
|              |       |rouge1_diff_stderr,none|             0.67|   |      |
|              |       |rouge2_max,none        |            24.52|   |      |
|              |       |rouge2_max_stderr,none |             0.83|   |      |
|              |       |rouge2_acc,none        |             0.29|   |      |
|              |       |rouge2_acc_stderr,none |             0.02|   |      |
|              |       |rouge2_diff,none       |            -4.90|   |      |
|              |       |rouge2_diff_stderr,none|             0.74|   |      |
|              |       |rougeL_max,none        |            36.52|   |      |
|              |       |rougeL_max_stderr,none |             0.80|   |      |
|              |       |rougeL_acc,none        |             0.33|   |      |
|              |       |rougeL_acc_stderr,none |             0.02|   |      |
|              |       |rougeL_diff,none       |            -4.56|   |      |
|              |       |rougeL_diff_stderr,none|             0.67|   |      |
|              |       |alias                  | - truthfulqa_gen|   |      |
|truthfulqa_mc1|Yaml   |acc,none               |             0.28|   |      |
|              |       |acc_stderr,none        |             0.02|   |      |
|              |       |alias                  | - truthfulqa_mc1|   |      |
|truthfulqa_mc2|Yaml   |acc,none               |             0.43|   |      |
|              |       |acc_stderr,none        |             0.01|   |      |
|              |       |alias                  | - truthfulqa_mc2|   |      |

Average: 43.04%

### Winogrande
|   Task   |Version|    Metric     |  Value   |   |Stderr|
|----------|-------|---------------|----------|---|------|
|winogrande|Yaml   |acc,none       |      0.72|   |      |
|          |       |acc_stderr,none|      0.01|   |      |
|          |       |alias          |winogrande|   |      |

Average: 72.06%

### GSM8K
|Task |Version|           Metric            |Value|   |Stderr|
|-----|-------|-----------------------------|-----|---|------|
|gsm8k|Yaml   |exact_match,get-answer       | 0.23|   |      |
|     |       |exact_match_stderr,get-answer| 0.01|   |      |
|     |       |alias                        |gsm8k|   |      |

Average: 22.52%

Average score: not available (the MMLU result is missing)

Elapsed time: 03:56:55