Technoculture/MT7Bi-sft

This model is the Technoculture/MT7Bi-alpha adapter merged with its base model, Meditron 7B.
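A merge of this kind can be sketched with the `peft` and `transformers` libraries. This is a minimal sketch and not necessarily the procedure used for this model: it assumes MT7Bi-alpha is a LoRA-style PEFT adapter and that the Meditron 7B base checkpoint is published as `epfl-llm/meditron-7b`. The function name and output directory are hypothetical.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold a PEFT adapter into its base model and save the merged weights."""
    # Imported lazily so that defining the function does not require the
    # heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    # merge_and_unload() returns the base model with the adapter weights
    # folded in (this assumes a mergeable adapter type such as LoRA).
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)


# Assumed repository IDs; the base ID in particular is an assumption.
BASE_ID = "epfl-llm/meditron-7b"
ADAPTER_ID = "Technoculture/MT7Bi-alpha"

# Uncomment to run; this downloads several gigabytes of weights.
# merge_adapter(BASE_ID, ADAPTER_ID, "mt7bi-merged")
```

Merging removes the runtime dependency on `peft`, so the saved directory can be loaded like any other causal language model checkpoint.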

Evaluations

Open LLM Leaderboard

Model	ARC	HellaSwag	TruthfulQA	Winogrande	GSM8K
MT7Bi-sft (epoch 4)	54.1	75.11	43.08	72.14	15.54
MT7Bi-sft (epoch 1)	50.94	73.24	43.04	72.06	22.52
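
The two checkpoints trade off differently across tasks: ARC and HellaSwag improve from epoch 1 to epoch 4, while GSM8K drops. A small sketch that computes the per-task change and the mean over the five reported tasks, using only the numbers in the table above. The mean here is illustrative and is not the leaderboard average, which also includes MMLU.

```python
# Scores (%) copied from the Open LLM Leaderboard table above.
epoch_4 = {"ARC": 54.1, "HellaSwag": 75.11, "TruthfulQA": 43.08,
           "Winogrande": 72.14, "GSM8K": 15.54}
epoch_1 = {"ARC": 50.94, "HellaSwag": 73.24, "TruthfulQA": 43.04,
           "Winogrande": 72.06, "GSM8K": 22.52}


def deltas(new, old):
    """Per-task change in percentage points (new minus old)."""
    return {task: round(new[task] - old[task], 2) for task in new}


def mean(scores):
    """Unweighted mean over the tasks present in the dict."""
    return round(sum(scores.values()) / len(scores), 2)


change = deltas(epoch_4, epoch_1)
for task, delta in change.items():
    print(f"{task:<11}{delta:+.2f}")
print(f"mean over 5 tasks: epoch 4 = {mean(epoch_4)}, epoch 1 = {mean(epoch_1)}")
```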

Model Evaluation Benchmark


Category	MT7Bi	meditron-70b	llama-2-70b	med42-70b*	meditron-7b	llama-2-7b	PMC-llama-7b
Health	-	81.8	69.1	83.6	27.3	16.4	3.6
Nutrition	-	77.9	68.8	62.5	31.1	12.5	6.3
Psychology	-	47.4	36.8	52.6	21.1	10.5	0.0
Science	-	77.8	44.4	33.3	33.3	11.1	0.0
Avg	-	71.2	54.8	58.0	28.3	12.6	2.5


Dataset	MT7Bi	meditron-70b	llama-2-70b	med42-70b*	clinical-camel-70b*
MMLU-Medical	46.9	77.6	77.9	74.5	65.7
PubMedQA	65.2	81.6	80.0	61.2	67.0
MedMCQA	42.7	66.0	62.6	59.2	46.7
MedQA	-	64.4	61.5	59.1	50.8
MedQA-4-Option	44.3	70.2	63.8	63.9	56.8
Avg	-	72.0	69.2	63.6	57.4


Dataset	meditron-7b	llama-2-7b	pmc-llama-7b	Zephyr-7B-beta*	Mistral-7B-instruct*	MT7Bi
MMLU-Medical	54.2	53.7	56.4	63.3	60.0	46.9
PubMedQA	74.4	61.8	59.2	46.0	17.8	65.2
MedMCQA	59.2	54.4	57.6	43.0	40.2	42.7
MedQA	47.9	44.0	42.4	42.8	32.4	-
MedQA-4-Option	52.0	49.6	49.2	48.5	41.1	44.3
Avg	57.5	52.7	53.0	48.7	38.3	-
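
The Avg row in the table above is consistent with an unweighted mean of the five dataset scores, rounded to one decimal. A short sketch that recomputes it from the table values; MT7Bi has no MedQA score in the source, so the sketch leaves its average undefined rather than averaging over four datasets.

```python
# Accuracy (%) per dataset, in table order: MMLU-Medical, PubMedQA,
# MedMCQA, MedQA, MedQA-4-Option. None marks the cell that is blank
# in the source (MT7Bi on MedQA).
scores = {
    "meditron-7b":          [54.2, 74.4, 59.2, 47.9, 52.0],
    "llama-2-7b":           [53.7, 61.8, 54.4, 44.0, 49.6],
    "pmc-llama-7b":         [56.4, 59.2, 57.6, 42.4, 49.2],
    "Zephyr-7B-beta*":      [63.3, 46.0, 43.0, 42.8, 48.5],
    "Mistral-7B-instruct*": [60.0, 17.8, 40.2, 32.4, 41.1],
    "MT7Bi":                [46.9, 65.2, 42.7, None, 44.3],
}


def average(values):
    """Mean rounded to one decimal, or None if any score is missing."""
    if any(v is None for v in values):
        return None
    return round(sum(values) / len(values), 1)


averages = {model: average(vals) for model, vals in scores.items()}
for model, avg in averages.items():
    print(f"{model:<22}{avg if avg is not None else '-'}")
```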

Model Name	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
Orca-2-7b	78.4	76.1	53.7	52.4	74.2	47.2
LLAMA-2-7b	43.2	77.1	44.4	38.7	69.5	16
MT7Bi-sft	54.1	75.11	-	43.08	72.14	15.54

ARC: 54.1%

Task	Version	Metric	Value		Stderr
arc_challenge	1	acc,none	0.51
		acc_stderr,none	0.01
		acc_norm,none	0.54
		acc_norm_stderr,none	0.01
		alias	arc_challenge
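
The stderr rows can be turned into an approximate confidence interval. A minimal sketch using the normal approximation (value ± 1.96 × stderr for a 95% interval); the helper name is hypothetical, and because the table rounds both numbers to two decimals the result is only approximate.

```python
def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval: value +/- z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


# acc_norm and its stderr for arc_challenge, as reported in the table above.
low, high = confidence_interval(0.54, 0.01)
print(f"arc_challenge acc_norm, approx. 95% CI: [{low:.4f}, {high:.4f}]")
```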

HellaSwag: 75.11%

Task	Version	Metric	Value		Stderr
hellaswag	1	acc,none	0.57
		acc_stderr,none	0
		acc_norm,none	0.75
		acc_norm_stderr,none	0
		alias	hellaswag

TruthfulQA: 43.08%

Task	Version	Metric	Value		Stderr
truthfulqa	N/A	bleu_max,none	18.31
		bleu_max_stderr,none	0.46
		bleu_acc,none	0.39
		bleu_acc_stderr,none	0
		bleu_diff,none	-1.63
		bleu_diff_stderr,none	0.39
		rouge1_max,none	41.99
		rouge1_max_stderr,none	0.71
		rouge1_acc,none	0.39
		rouge1_acc_stderr,none	0
		rouge1_diff,none	-2.88
		rouge1_diff_stderr,none	0.66
		rouge2_max,none	27.42
		rouge2_max_stderr,none	0.80
		rouge2_acc,none	0.32
		rouge2_acc_stderr,none	0
		rouge2_diff,none	-3.11
		rouge2_diff_stderr,none	0.78
		rougeL_max,none	38.81
		rougeL_max_stderr,none	0.71
		rougeL_acc,none	0.38
		rougeL_acc_stderr,none	0
		rougeL_diff,none	-3.01
		rougeL_diff_stderr,none	0.66
		acc,none	0.33
		acc_stderr,none	0.05
		alias	truthfulqa
truthfulqa_gen	3	bleu_max,none	18.31
		bleu_max_stderr,none	0.68
		bleu_acc,none	0.39
		bleu_acc_stderr,none	0.02
		bleu_diff,none	-1.63
		bleu_diff_stderr,none	0.62
		rouge1_max,none	41.99
		rouge1_max_stderr,none	0.84
		rouge1_acc,none	0.39
		rouge1_acc_stderr,none	0.02
		rouge1_diff,none	-2.88
		rouge1_diff_stderr,none	0.81
		rouge2_max,none	27.42
		rouge2_max_stderr,none	0.89
		rouge2_acc,none	0.32
		rouge2_acc_stderr,none	0.02
		rouge2_diff,none	-3.11
		rouge2_diff_stderr,none	0.88
		rougeL_max,none	38.81
		rougeL_max_stderr,none	0.84
		rougeL_acc,none	0.38
		rougeL_acc_stderr,none	0.02
		rougeL_diff,none	-3.01
		rougeL_diff_stderr,none	0.82
		alias	- truthfulqa_gen
truthfulqa_mc1	2	acc,none	0.28
		acc_stderr,none	0.02
		alias	- truthfulqa_mc1
truthfulqa_mc2	2	acc,none	0.43
		acc_stderr,none	0.01
		alias	- truthfulqa_mc2

Winogrande: 72.14%

Task	Version	Metric	Value		Stderr
winogrande	1	acc,none	0.72
		acc_stderr,none	0.01
		alias	winogrande

GSM8K: 15.54%

Task	Version	Metric	Value		Stderr
gsm8k	2	exact_match,get-answer	0.16
		exact_match_stderr,get-answer	0.01
		alias	gsm8k
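
Each headline percentage above appears to correspond to one specific metric row in its table: `acc_norm` for ARC and HellaSwag, `truthfulqa_mc2` accuracy for TruthfulQA, `acc` for Winogrande, and `exact_match` for GSM8K. This mapping is inferred from the numbers rather than stated in the card. The sketch below checks that each headline, rounded to two decimals, matches the table value.

```python
# (task, metric key, headline %, value shown in the table), all copied
# from the sections above. The task-to-metric mapping is an inference.
HEADLINES = [
    ("arc_challenge",  "acc_norm,none",          54.10, 0.54),
    ("hellaswag",      "acc_norm,none",          75.11, 0.75),
    ("truthfulqa_mc2", "acc,none",               43.08, 0.43),
    ("winogrande",     "acc,none",               72.14, 0.72),
    ("gsm8k",          "exact_match,get-answer", 15.54, 0.16),
]


def is_consistent(headline_pct, table_value):
    """True if the headline, as a fraction rounded to 2 decimals, equals the table value."""
    return abs(round(headline_pct / 100, 2) - table_value) < 1e-9


results = {task: is_consistent(pct, val) for task, _metric, pct, val in HEADLINES}
for task, ok in results.items():
    print(f"{task:<15}{'consistent' if ok else 'MISMATCH'}")
```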

Elapsed time: 04:06:36

