---
datasets:
  - xzuyn/chatdoctor-200k-stripped
  - Technoculture/riddle_sense
  - axiong/pmc_llama_instructions
  - Open-Orca/SlimOrca-Dedup
language:
  - en
tags:
  - medical
---

# MT7Bi-sft

MT7Bi-sft is the Technoculture/MD7b-alpha adapter merged into its base model, Meditron-7B.
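The merge procedure itself is not documented in this card, but folding a LoRA-style adapter into its base model is commonly done with PEFT's `merge_and_unload()`. A minimal sketch, assuming the base weights are published as `epfl-llm/meditron-7b` (the card only names "Meditron 7B") and that the adapter is PEFT-compatible:

```python
# Sketch: merging a LoRA adapter into its base model with PEFT.
# The base-model ID and dtype are assumptions; only "Meditron 7B"
# and the Technoculture/MD7b-alpha adapter are named in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "epfl-llm/meditron-7b", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "Technoculture/MD7b-alpha")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained("epfl-llm/meditron-7b")
merged.save_pretrained("MT7Bi-sft")
tokenizer.save_pretrained("MT7Bi-sft")
```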


| Model | ARC   | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-------|-------|-----------|------|------------|------------|-------|
| MT7Bi | 50.94 | 73.24     | n/a* | 43.04      | 72.06      | 22.52 |

\* The MMLU results file was missing from this evaluation run, so no MMLU score is reported.
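The per-task tables below use the metric naming of EleutherAI's lm-evaluation-harness (e.g. `acc_norm,none`). A minimal sketch of how a comparable run could be launched, assuming the merged model is published as `Technoculture/MT7Bi-sft`; the few-shot counts, dtype, and batch size used for this card are not stated, so the values here are placeholders:

```python
# Sketch: reproducing these numbers with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Model ID, dtype, and
# batch size below are assumptions, not the documented settings.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Technoculture/MT7Bi-sft,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu",
           "truthfulqa", "winogrande", "gsm8k"],
    batch_size=8,
)

# Metric keys match the tables below, e.g. "acc_norm,none".
for task, metrics in results["results"].items():
    print(task, metrics)
```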

## ARC

| Task          | Version | Metric        | Value | Stderr |
|---------------|---------|---------------|-------|--------|
| arc_challenge | Yaml    | acc,none      | 0.48  | ±0.01  |
|               |         | acc_norm,none | 0.51  | ±0.01  |

Average: 50.94%

## HellaSwag

| Task      | Version | Metric        | Value | Stderr |
|-----------|---------|---------------|-------|--------|
| hellaswag | Yaml    | acc,none      | 0.54  | ±0.00  |
|           |         | acc_norm,none | 0.73  | ±0.00  |

Average: 73.24%

## MMLU

Average: not available (the MMLU results file was missing, so no score is reported)

## TruthfulQA

| Task           | Version | Metric           | Value | Stderr |
|----------------|---------|------------------|-------|--------|
| truthfulqa     | N/A     | bleu_max,none    | 16.17 | ±0.38  |
|                |         | bleu_acc,none    | 0.36  | ±0.00  |
|                |         | bleu_diff,none   | -2.78 | ±0.26  |
|                |         | rouge1_max,none  | 39.99 | ±0.64  |
|                |         | rouge1_acc,none  | 0.36  | ±0.00  |
|                |         | rouge1_diff,none | -4.19 | ±0.45  |
|                |         | rouge2_max,none  | 24.52 | ±0.68  |
|                |         | rouge2_acc,none  | 0.29  | ±0.00  |
|                |         | rouge2_diff,none | -4.90 | ±0.55  |
|                |         | rougeL_max,none  | 36.52 | ±0.64  |
|                |         | rougeL_acc,none  | 0.33  | ±0.00  |
|                |         | rougeL_diff,none | -4.56 | ±0.45  |
|                |         | acc,none         | 0.33  | ±0.05  |
| truthfulqa_gen | Yaml    | bleu_max,none    | 16.17 | ±0.61  |
|                |         | bleu_acc,none    | 0.36  | ±0.02  |
|                |         | bleu_diff,none   | -2.78 | ±0.51  |
|                |         | rouge1_max,none  | 39.99 | ±0.80  |
|                |         | rouge1_acc,none  | 0.36  | ±0.02  |
|                |         | rouge1_diff,none | -4.19 | ±0.67  |
|                |         | rouge2_max,none  | 24.52 | ±0.83  |
|                |         | rouge2_acc,none  | 0.29  | ±0.02  |
|                |         | rouge2_diff,none | -4.90 | ±0.74  |
|                |         | rougeL_max,none  | 36.52 | ±0.80  |
|                |         | rougeL_acc,none  | 0.33  | ±0.02  |
|                |         | rougeL_diff,none | -4.56 | ±0.67  |
| truthfulqa_mc1 | Yaml    | acc,none         | 0.28  | ±0.02  |
| truthfulqa_mc2 | Yaml    | acc,none         | 0.43  | ±0.01  |

Average: 43.04%

## Winogrande

| Task       | Version | Metric   | Value | Stderr |
|------------|---------|----------|-------|--------|
| winogrande | Yaml    | acc,none | 0.72  | ±0.01  |

Average: 72.06%

## GSM8K

| Task  | Version | Metric                 | Value | Stderr |
|-------|---------|------------------------|-------|--------|
| gsm8k | Yaml    | exact_match,get-answer | 0.23  | ±0.01  |

Average: 22.52%

Average score: not available (the MMLU run produced no results file, so an overall average could not be computed)
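For illustration only, a leaderboard-style overall score would be the plain mean of the six benchmark averages. A small sketch that skips the missing MMLU entry, with the scores copied from the tables above:

```python
# Hypothetical overall average: the mean of the six benchmark scores.
# MMLU is None because its results file was missing from this run,
# which is why the card reports no overall score.
scores = {
    "arc_challenge": 50.94,
    "hellaswag": 73.24,
    "mmlu": None,  # results file missing
    "truthfulqa": 43.04,
    "winogrande": 72.06,
    "gsm8k": 22.52,
}

available = [v for v in scores.values() if v is not None]
print(f"mean over available benchmarks: {sum(available) / len(available):.2f}")
```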

Elapsed time: 03:56:55