---
datasets:
- xzuyn/chatdoctor-200k-stripped
- Technoculture/riddle_sense
- axiong/pmc_llama_instructions
- Open-Orca/SlimOrca-Dedup
language:
- en
tags:
- medical
---
The [Technoculture/MD7b-alpha](https://huggingface.co/Technoculture/MD7b-alpha) adapter merged with its base model, Meditron 7B.
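The merge script itself is not part of this card; the sketch below shows how such an adapter merge is typically done with the `peft` library, assuming MD7b-alpha is a standard PEFT (LoRA) adapter and that `epfl-llm/meditron-7b` is the base checkpoint.

```python
# Hedged sketch: merge the MD7b-alpha adapter into its Meditron 7B base model.
# Assumptions: the adapter is a PEFT (LoRA) checkpoint and the base weights are
# the epfl-llm/meditron-7b release; adjust the IDs if your copies differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "epfl-llm/meditron-7b"          # assumed Meditron 7B checkpoint
ADAPTER_ID = "Technoculture/MD7b-alpha"   # adapter referenced in this card

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# Attach the adapter, then fold its weights into the base model permanently.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
merged = model.merge_and_unload()

# Save a standalone merged checkpoint that loads without peft installed.
merged.save_pretrained("MT7Bi-merged")
tokenizer.save_pretrained("MT7Bi-merged")
```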
# Evaluations
## Open LLM Leaderboard
| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-----------------------------------------------------|------:|----------:|-----:|-----------:|-----------:|------:|
| [MT7Bi](https://huggingface.co/Technoculture/MT7Bi) | 50.94 | 73.24 | N/A | 43.04 | 72.06 | 22.52 |
MMLU is reported as N/A because its results file was missing (see the MMLU section below).
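The per-task tables that follow match the lm-evaluation-harness output format. As a hedged sketch (not the exact command used for this card), a single task such as ARC-Challenge could be re-run through the harness's Python API, assuming lm-eval ≥ 0.4 and the Open LLM Leaderboard's 25-shot ARC setting:

```python
# Hedged sketch: re-run one benchmark with lm-evaluation-harness (lm-eval >= 0.4).
# The 25-shot ARC setting mirrors the Open LLM Leaderboard convention; the exact
# harness version and arguments used for the numbers above are not documented.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Technoculture/MT7Bi,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)

# acc_norm is the figure reported as the ARC score in the table above.
print(results["results"]["arc_challenge"])
```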
### ARC
| Task | Version | Metric | Value | Stderr |
|---------------|---------|----------|------:|-------:|
| arc_challenge | Yaml | acc | 0.48 | 0.01 |
| | | acc_norm | 0.51 | 0.01 |
Average: 50.94%
### HellaSwag
| Task | Version | Metric | Value | Stderr |
|-----------|---------|----------|------:|-------:|
| hellaswag | Yaml | acc | 0.54 | 0.00 |
| | | acc_norm | 0.73 | 0.00 |
Average: 73.24%
### MMLU
Average: N/A (the MMLU results file was missing, so no score could be computed)
### TruthfulQA
| Task | Version | Metric | Value | Stderr |
|----------------|---------|-------------|-------:|-------:|
| truthfulqa | N/A | bleu_max | 16.17 | 0.38 |
| | | bleu_acc | 0.36 | 0.00 |
| | | bleu_diff | -2.78 | 0.26 |
| | | rouge1_max | 39.99 | 0.64 |
| | | rouge1_acc | 0.36 | 0.00 |
| | | rouge1_diff | -4.19 | 0.45 |
| | | rouge2_max | 24.52 | 0.68 |
| | | rouge2_acc | 0.29 | 0.00 |
| | | rouge2_diff | -4.90 | 0.55 |
| | | rougeL_max | 36.52 | 0.64 |
| | | rougeL_acc | 0.33 | 0.00 |
| | | rougeL_diff | -4.56 | 0.45 |
| | | acc | 0.33 | 0.05 |
| truthfulqa_gen | Yaml | bleu_max | 16.17 | 0.61 |
| | | bleu_acc | 0.36 | 0.02 |
| | | bleu_diff | -2.78 | 0.51 |
| | | rouge1_max | 39.99 | 0.80 |
| | | rouge1_acc | 0.36 | 0.02 |
| | | rouge1_diff | -4.19 | 0.67 |
| | | rouge2_max | 24.52 | 0.83 |
| | | rouge2_acc | 0.29 | 0.02 |
| | | rouge2_diff | -4.90 | 0.74 |
| | | rougeL_max | 36.52 | 0.80 |
| | | rougeL_acc | 0.33 | 0.02 |
| | | rougeL_diff | -4.56 | 0.67 |
| truthfulqa_mc1 | Yaml | acc | 0.28 | 0.02 |
| truthfulqa_mc2 | Yaml | acc | 0.43 | 0.01 |
Average: 43.04%
### Winogrande
| Task | Version | Metric | Value | Stderr |
|------------|---------|--------|------:|-------:|
| winogrande | Yaml | acc | 0.72 | 0.01 |
Average: 72.06%
### GSM8K
| Task | Version | Metric | Value | Stderr |
|-------|---------|---------------------------------|------:|-------:|
| gsm8k | Yaml | exact_match (get-answer filter) | 0.23 | 0.01 |
Average: 22.52%
Average score: not available (the MMLU result is missing)
Elapsed time: 03:56:55