| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|----------------|--------:|--------:|-----------:|---------:|--------:|
| Lelantos-DPO-7B | 45.47 | 75.00 | 67.05 | 46.64 | **58.54** |
| Lelantos-7B | 46.01 | 75.00 | 64.93 | 46.21 | 58.04 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|--------------------------------|--------:|----------|------:|-------:|
| agieval_aqua_rat | 0 | acc | 25.20 | ± 2.73 |
| | | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en | 0 | acc | 40.71 | ± 1.93 |
| | | acc_norm | 40.25 | ± 1.92 |
| agieval_lsat_ar | 0 | acc | 24.35 | ± 2.84 |
| | | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr | 0 | acc | 55.69 | ± 2.20 |
| | | acc_norm | 55.49 | ± 2.20 |
| agieval_lsat_rc | 0 | acc | 65.06 | ± 2.91 |
| | | acc_norm | 65.43 | ± 2.91 |
| agieval_sat_en | 0 | acc | 76.70 | ± 2.95 |
| | | acc_norm | 76.70 | ± 2.95 |
| agieval_sat_en_without_passage | 0 | acc | 47.09 | ± 3.49 |
| | | acc_norm | 45.63 | ± 3.48 |
| agieval_sat_math | 0 | acc | 36.36 | ± 3.25 |
| | | acc_norm | 33.18 | ± 3.18 |

Average: 45.47%
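The suite average is the unweighted mean of the per-task acc_norm scores above, which can be checked with a short script (scores copied from the table):

```python
# AGIEval acc_norm scores, transcribed from the table above.
agieval_acc_norm = {
    "agieval_aqua_rat": 24.02,
    "agieval_logiqa_en": 40.25,
    "agieval_lsat_ar": 23.04,
    "agieval_lsat_lr": 55.49,
    "agieval_lsat_rc": 65.43,
    "agieval_sat_en": 76.70,
    "agieval_sat_en_without_passage": 45.63,
    "agieval_sat_math": 33.18,
}

# Unweighted mean across the eight tasks; matches the reported 45.47%.
average = sum(agieval_acc_norm.values()) / len(agieval_acc_norm)
print(f"AGIEval average: {average:.2f}")
```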

GPT4All

| Task | Version | Metric | Value | Stderr |
|---------------|--------:|----------|------:|-------:|
| arc_challenge | 0 | acc | 62.12 | ± 1.42 |
| | | acc_norm | 63.23 | ± 1.41 |
| arc_easy | 0 | acc | 85.40 | ± 0.72 |
| | | acc_norm | 81.02 | ± 0.80 |
| boolq | 1 | acc | 87.25 | ± 0.58 |
| hellaswag | 0 | acc | 67.97 | ± 0.47 |
| | | acc_norm | 85.48 | ± 0.35 |
| openbookqa | 0 | acc | 36.80 | ± 2.16 |
| | | acc_norm | 47.20 | ± 2.23 |
| piqa | 0 | acc | 81.88 | ± 0.90 |
| | | acc_norm | 83.57 | ± 0.86 |
| winogrande | 0 | acc | 77.27 | ± 1.18 |

Average: 75.0%
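This average appears to follow the usual convention for this suite: take acc_norm where it is reported and plain acc otherwise (boolq and winogrande only report acc). A sketch under that assumption:

```python
# Per-task scores: acc_norm where available, plain acc for boolq/winogrande
# (assumed convention; scores transcribed from the table above).
gpt4all_scores = {
    "arc_challenge": 63.23,  # acc_norm
    "arc_easy": 81.02,       # acc_norm
    "boolq": 87.25,          # acc (no acc_norm reported)
    "hellaswag": 85.48,      # acc_norm
    "openbookqa": 47.20,     # acc_norm
    "piqa": 83.57,           # acc_norm
    "winogrande": 77.27,     # acc (no acc_norm reported)
}

# Unweighted mean across the seven tasks; matches the reported 75.0%.
average = sum(gpt4all_scores.values()) / len(gpt4all_scores)
print(f"GPT4All average: {average:.2f}")
```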

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---------------|--------:|--------|------:|-------:|
| truthfulqa_mc | 1 | mc1 | 49.94 | ± 1.75 |
| | | mc2 | 67.05 | ± 1.53 |

Average: 67.05%

Bigbench

| Task | Version | Metric | Value | Stderr |
|--------------------------------------------------|--------:|-----------------------|------:|-------:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.95 | ± 3.58 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 64.23 | ± 2.50 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 36.43 | ± 3.00 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 23.68 | ± 2.25 |
| | | exact_str_match | 3.90 | ± 1.02 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 33.40 | ± 2.11 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 24.43 | ± 1.63 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 54.33 | ± 2.88 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 52.20 | ± 2.24 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.70 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.65 | ± 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 50.22 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 40.98 | ± 1.56 |
| bigbench_snarks | 0 | multiple_choice_grade | 72.38 | ± 3.33 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 73.23 | ± 1.41 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 39.90 | ± 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 20.88 | ± 1.15 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.60 | ± 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 54.33 | ± 2.88 |

Average: 46.64%

Average score: 58.54%
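The headline score is the unweighted mean of the four suite averages reported above:

```python
# Suite averages as reported in the sections above.
suite_averages = {
    "AGIEval": 45.47,
    "GPT4All": 75.00,
    "TruthfulQA": 67.05,
    "Bigbench": 46.64,
}

# Unweighted mean of the four suites; matches the reported 58.54%.
overall = sum(suite_averages.values()) / len(suite_averages)
print(f"Average score: {overall:.2f}")
```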
