xellDart13/NebuIA-10.7B-DPO-v3

Text Generation


Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
NebuIA-10.7B-DPO-v3	48.8	74.87	74.66	45.81	61.03

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	27.95	±	2.82
		acc_norm	28.74	±	2.85
agieval_logiqa_en	0	acc	42.86	±	1.94
		acc_norm	43.47	±	1.94
agieval_lsat_ar	0	acc	24.78	±	2.85
		acc_norm	25.65	±	2.89
agieval_lsat_lr	0	acc	54.31	±	2.21
		acc_norm	54.51	±	2.21
agieval_lsat_rc	0	acc	69.52	±	2.81
		acc_norm	69.89	±	2.80
agieval_sat_en	0	acc	79.61	±	2.81
		acc_norm	79.13	±	2.84
agieval_sat_en_without_passage	0	acc	49.51	±	3.49
		acc_norm	49.03	±	3.49
agieval_sat_math	0	acc	44.55	±	3.36
		acc_norm	40.00	±	3.31

Average: 48.8%
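
The card does not say which metric the AGIEval average is built from. As a quick arithmetic check (a sketch, not the evaluation harness itself), the mean of the eight acc_norm values in the table above reproduces the reported figure. The variable names are illustrative only.

```python
# acc_norm values copied from the AGIEval table above, in table order.
agieval_acc_norm = {
    "agieval_aqua_rat": 28.74,
    "agieval_logiqa_en": 43.47,
    "agieval_lsat_ar": 25.65,
    "agieval_lsat_lr": 54.51,
    "agieval_lsat_rc": 69.89,
    "agieval_sat_en": 79.13,
    "agieval_sat_en_without_passage": 49.03,
    "agieval_sat_math": 40.00,
}

# Unweighted mean over tasks (assumption: every task counts equally).
agieval_average = sum(agieval_acc_norm.values()) / len(agieval_acc_norm)
print(f"AGIEval average: {agieval_average:.2f}")  # AGIEval average: 48.80
```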

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	60.49	±	1.43
		acc_norm	64.16	±	1.40
arc_easy	0	acc	82.70	±	0.78
		acc_norm	80.60	±	0.81
boolq	1	acc	88.20	±	0.56
hellaswag	0	acc	68.90	±	0.46
		acc_norm	86.58	±	0.34
openbookqa	0	acc	36.80	±	2.16
		acc_norm	48.40	±	2.24
piqa	0	acc	80.20	±	0.93
		acc_norm	80.41	±	0.93
winogrande	0	acc	75.77	±	1.20

Average: 74.87%
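
Two GPT4All tasks (boolq and winogrande) list only acc, so a plain acc_norm mean is not possible here. The reported average is consistent with taking acc_norm where the table provides it and falling back to acc otherwise. The sketch below checks that reading. The selection rule is inferred from the numbers rather than stated in the card, and the helper name `preferred` is hypothetical.

```python
# (acc, acc_norm) pairs copied from the GPT4All table above.
# None marks a metric that the table does not list for that task.
gpt4all_scores = {
    "arc_challenge": (60.49, 64.16),
    "arc_easy": (82.70, 80.60),
    "boolq": (88.20, None),
    "hellaswag": (68.90, 86.58),
    "openbookqa": (36.80, 48.40),
    "piqa": (80.20, 80.41),
    "winogrande": (75.77, None),
}


def preferred(acc, acc_norm):
    """Return acc_norm when it is reported, otherwise fall back to acc."""
    return acc_norm if acc_norm is not None else acc


selected = [preferred(acc, acc_norm) for acc, acc_norm in gpt4all_scores.values()]
gpt4all_average = sum(selected) / len(selected)
print(f"GPT4All average: {gpt4all_average:.2f}")  # GPT4All average: 74.87
```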

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	60.10	±	1.71
		mc2	74.66	±	1.46

Average: 74.66% (the mc2 score)

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	54.21	±	3.62
bigbench_date_understanding	0	multiple_choice_grade	62.60	±	2.52
bigbench_disambiguation_qa	0	multiple_choice_grade	37.60	±	3.02
bigbench_geometric_shapes	0	multiple_choice_grade	29.25	±	2.40
		exact_str_match	0.00	±	0.00
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	27.60	±	2.00
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	20.71	±	1.53
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	46.67	±	2.89
bigbench_movie_recommendation	0	multiple_choice_grade	46.40	±	2.23
bigbench_navigate	0	multiple_choice_grade	63.90	±	1.52
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	59.25	±	1.10
bigbench_ruin_names	0	multiple_choice_grade	42.63	±	2.34
bigbench_salient_translation_error_detection	0	multiple_choice_grade	40.28	±	1.55
bigbench_snarks	0	multiple_choice_grade	67.40	±	3.49
bigbench_sports_understanding	0	multiple_choice_grade	72.92	±	1.42
bigbench_temporal_sequences	0	multiple_choice_grade	64.40	±	1.51
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	24.40	±	1.22
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	17.66	±	0.91
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	46.67	±	2.89

Average: 45.81%
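
The Bigbench table mixes two metrics: every task reports multiple_choice_grade, and bigbench_geometric_shapes also reports exact_str_match. The reported average matches the mean of the 18 multiple_choice_grade values with the exact_str_match row left out. The sketch below verifies that arithmetic. The row layout is an illustrative assumption, not the harness output format.

```python
# (task, metric, value) rows copied from the Bigbench table above.
bigbench_rows = [
    ("bigbench_causal_judgement", "multiple_choice_grade", 54.21),
    ("bigbench_date_understanding", "multiple_choice_grade", 62.60),
    ("bigbench_disambiguation_qa", "multiple_choice_grade", 37.60),
    ("bigbench_geometric_shapes", "multiple_choice_grade", 29.25),
    ("bigbench_geometric_shapes", "exact_str_match", 0.00),
    ("bigbench_logical_deduction_five_objects", "multiple_choice_grade", 27.60),
    ("bigbench_logical_deduction_seven_objects", "multiple_choice_grade", 20.71),
    ("bigbench_logical_deduction_three_objects", "multiple_choice_grade", 46.67),
    ("bigbench_movie_recommendation", "multiple_choice_grade", 46.40),
    ("bigbench_navigate", "multiple_choice_grade", 63.90),
    ("bigbench_reasoning_about_colored_objects", "multiple_choice_grade", 59.25),
    ("bigbench_ruin_names", "multiple_choice_grade", 42.63),
    ("bigbench_salient_translation_error_detection", "multiple_choice_grade", 40.28),
    ("bigbench_snarks", "multiple_choice_grade", 67.40),
    ("bigbench_sports_understanding", "multiple_choice_grade", 72.92),
    ("bigbench_temporal_sequences", "multiple_choice_grade", 64.40),
    ("bigbench_tracking_shuffled_objects_five_objects", "multiple_choice_grade", 24.40),
    ("bigbench_tracking_shuffled_objects_seven_objects", "multiple_choice_grade", 17.66),
    ("bigbench_tracking_shuffled_objects_three_objects", "multiple_choice_grade", 46.67),
]

# Keep only the multiple_choice_grade rows; exact_str_match is excluded.
mc_grades = [value for _, metric, value in bigbench_rows
             if metric == "multiple_choice_grade"]
bigbench_average = sum(mc_grades) / len(mc_grades)
print(f"Bigbench average: {bigbench_average:.2f}")  # Bigbench average: 45.81
```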

Average score: 61.03%
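
The overall score is consistent with an unweighted mean of the four suite averages. Using the rounded values printed in this card, that mean comes to about 61.035, which agrees with the reported 61.03% to within the rounding of the inputs. The sketch below shows the calculation. Equal weighting of the suites is an assumption inferred from the numbers.

```python
# Suite averages as printed in this card (already rounded to two decimals).
suite_averages = {
    "AGIEval": 48.80,
    "GPT4All": 74.87,
    "TruthfulQA": 74.66,
    "Bigbench": 45.81,
}

# Unweighted mean across the four suites.
overall_average = sum(suite_averages.values()) / len(suite_averages)
print(f"Overall average from rounded inputs: {overall_average:.3f}")
```

Because the inputs are rounded, the last digit can differ from a value computed on the unrounded suite scores.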

Elapsed time: 03:03:27

Safetensors

Model size: 10.7B params

Tensor type: FP16