---
datasets:
  - xzuyn/chatdoctor-200k-stripped
  - Technoculture/riddle_sense
  - axiong/pmc_llama_instructions
  - Open-Orca/SlimOrca-Dedup
language:
  - en
tags:
  - medical
---

# MT7Bi-sft

MT7Bi-sft is the Technoculture/MD7b-alpha adapter merged with its base model, Meditron-7B.
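A minimal sketch of how such a merge can be reproduced with the PEFT library, assuming the adapter is a LoRA adapter and that `epfl-llm/meditron-7b` is the Meditron-7B checkpoint (both are assumptions, not stated in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumption: epfl-llm/meditron-7b is the base checkpoint the adapter was trained on.
base = AutoModelForCausalLM.from_pretrained("epfl-llm/meditron-7b")
tokenizer = AutoTokenizer.from_pretrained("epfl-llm/meditron-7b")

# Attach the LoRA adapter, then fold its weights into the base weights.
model = PeftModel.from_pretrained(base, "Technoculture/MD7b-alpha")
model = model.merge_and_unload()

# The merged model is a plain transformers checkpoint.
model.save_pretrained("MT7Bi-sft")
tokenizer.save_pretrained("MT7Bi-sft")
```

Once merged, the model loads like any standalone `transformers` checkpoint, with no PEFT dependency at inference time.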

## Evaluations

### Open LLM Leaderboard

| Model | ARC   | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| ----- | ----- | --------- | ---- | ---------- | ---------- | ----- |
| MT7Bi | 50.94 | 73.24     | N/A  | 43.04      | 72.06      | 22.52 |

(The MMLU result was not available at the time of writing. The headline percentages match the acc_norm values for ARC and HellaSwag, mc2 for TruthfulQA, acc for Winogrande, and exact_match for GSM8K in the detailed tables below.)
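The per-task tables below follow the output format of EleutherAI's lm-evaluation-harness. A minimal sketch of re-running the same tasks locally (the repo id, task names, and settings are assumptions, not taken from this card):

```python
# pip install lm-eval
import lm_eval

# Assumption: the merged model is published as Technoculture/MT7Bi-sft.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Technoculture/MT7Bi-sft",
    tasks=["arc_challenge", "hellaswag", "truthfulqa", "winogrande", "gsm8k"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. acc,none and acc_norm,none
```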

#### ARC: 50.94%

| Task          | Version | Metric        | Value | Stderr |
| ------------- | ------- | ------------- | ----- | ------ |
| arc_challenge | Yaml    | acc,none      | 0.48  | 0.01   |
|               |         | acc_norm,none | 0.51  | 0.01   |

#### HellaSwag: 73.24%

| Task      | Version | Metric        | Value | Stderr |
| --------- | ------- | ------------- | ----- | ------ |
| hellaswag | Yaml    | acc,none      | 0.54  | 0.00   |
|           |         | acc_norm,none | 0.73  | 0.00   |

#### TruthfulQA: 43.04%

| Task           | Version | Metric           | Value | Stderr |
| -------------- | ------- | ---------------- | ----- | ------ |
| truthfulqa     | N/A     | bleu_max,none    | 16.17 | 0.38   |
|                |         | bleu_acc,none    | 0.36  | 0.00   |
|                |         | bleu_diff,none   | -2.78 | 0.26   |
|                |         | rouge1_max,none  | 39.99 | 0.64   |
|                |         | rouge1_acc,none  | 0.36  | 0.00   |
|                |         | rouge1_diff,none | -4.19 | 0.45   |
|                |         | rouge2_max,none  | 24.52 | 0.68   |
|                |         | rouge2_acc,none  | 0.29  | 0.00   |
|                |         | rouge2_diff,none | -4.90 | 0.55   |
|                |         | rougeL_max,none  | 36.52 | 0.64   |
|                |         | rougeL_acc,none  | 0.33  | 0.00   |
|                |         | rougeL_diff,none | -4.56 | 0.45   |
|                |         | acc,none         | 0.33  | 0.05   |
| truthfulqa_gen | Yaml    | bleu_max,none    | 16.17 | 0.61   |
|                |         | bleu_acc,none    | 0.36  | 0.02   |
|                |         | bleu_diff,none   | -2.78 | 0.51   |
|                |         | rouge1_max,none  | 39.99 | 0.80   |
|                |         | rouge1_acc,none  | 0.36  | 0.02   |
|                |         | rouge1_diff,none | -4.19 | 0.67   |
|                |         | rouge2_max,none  | 24.52 | 0.83   |
|                |         | rouge2_acc,none  | 0.29  | 0.02   |
|                |         | rouge2_diff,none | -4.90 | 0.74   |
|                |         | rougeL_max,none  | 36.52 | 0.80   |
|                |         | rougeL_acc,none  | 0.33  | 0.02   |
|                |         | rougeL_diff,none | -4.56 | 0.67   |
| truthfulqa_mc1 | Yaml    | acc,none         | 0.28  | 0.02   |
| truthfulqa_mc2 | Yaml    | acc,none         | 0.43  | 0.01   |

#### Winogrande: 72.06%

| Task       | Version | Metric   | Value | Stderr |
| ---------- | ------- | -------- | ----- | ------ |
| winogrande | Yaml    | acc,none | 0.72  | 0.01   |

#### GSM8K: 22.52%

| Task  | Version | Metric                 | Value | Stderr |
| ----- | ------- | ---------------------- | ----- | ------ |
| gsm8k | Yaml    | exact_match,get-answer | 0.23  | 0.01   |

Elapsed evaluation time: 03:56:55