csabakecskemeti committed
Commit: e058427
Parent(s): 3900a10

Update README.md


results presentation changes

Files changed (1)
  1. README.md +21 -18
README.md CHANGED
@@ -31,26 +31,29 @@ model-index:
 
 ### eval
 
+The fine-tuned model (DevQuasar/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit)
+has gained performance over the base model (unsloth/Llama-3.2-3B-Instruct-bnb-4bit)
+in the following tasks.
+
 | Test | Base Model | Fine-Tuned Model | Performance Gain |
 |---|---|---|---|
-| leaderboard_bbh_logical_deduction_seven_objects | 0.252 | 0.436 | 0.184 |
-| leaderboard_bbh_logical_deduction_five_objects | 0.356 | 0.456 | 0.10000000000000003 |
-| leaderboard_musr_team_allocation | 0.22 | 0.32 | 0.1 |
-| leaderboard_bbh_disambiguation_qa | 0.304 | 0.376 | 0.07200000000000001 |
-| leaderboard_gpqa_diamond | 0.2222222222222222 | 0.2727272727272727 | 0.0505050505050505 |
-| leaderboard_bbh_movie_recommendation | 0.596 | 0.636 | 0.040000000000000036 |
-| leaderboard_bbh_formal_fallacies | 0.508 | 0.54 | 0.03200000000000003 |
-| leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.316 | 0.344 | 0.02799999999999997 |
-| leaderboard_bbh_causal_judgement | 0.5454545454545454 | 0.5668449197860963 | 0.021390374331550888 |
-| leaderboard_bbh_web_of_lies | 0.496 | 0.516 | 0.020000000000000018 |
-| leaderboard_math_geometry_hard | 0.045454545454545456 | 0.06060606060606061 | 0.015151515151515152 |
-| leaderboard_math_num_theory_hard | 0.05194805194805195 | 0.06493506493506493 | 0.012987012987012977 |
-| leaderboard_musr_murder_mysteries | 0.528 | 0.54 | 0.01200000000000001 |
-| leaderboard_gpqa_extended | 0.27106227106227104 | 0.2802197802197802 | 0.00915750915750918 |
-| leaderboard_bbh_sports_understanding | 0.596 | 0.604 | 0.008000000000000007 |
-| leaderboard_math_intermediate_algebra_hard | 0.010714285714285714 | 0.014285714285714285 | 0.003571428571428571 |
-| leaderboard_bbh_navigate | 0.62 | 0.62 | 0.0 |
-
+| leaderboard_bbh_logical_deduction_seven_objects | 0.2520 | 0.4360 | 0.1840 |
+| leaderboard_bbh_logical_deduction_five_objects | 0.3560 | 0.4560 | 0.1000 |
+| leaderboard_musr_team_allocation | 0.2200 | 0.3200 | 0.1000 |
+| leaderboard_bbh_disambiguation_qa | 0.3040 | 0.3760 | 0.0720 |
+| leaderboard_gpqa_diamond | 0.2222 | 0.2727 | 0.0505 |
+| leaderboard_bbh_movie_recommendation | 0.5960 | 0.6360 | 0.0400 |
+| leaderboard_bbh_formal_fallacies | 0.5080 | 0.5400 | 0.0320 |
+| leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.3160 | 0.3440 | 0.0280 |
+| leaderboard_bbh_causal_judgement | 0.5455 | 0.5668 | 0.0214 |
+| leaderboard_bbh_web_of_lies | 0.4960 | 0.5160 | 0.0200 |
+| leaderboard_math_geometry_hard | 0.0455 | 0.0606 | 0.0152 |
+| leaderboard_math_num_theory_hard | 0.0519 | 0.0649 | 0.0130 |
+| leaderboard_musr_murder_mysteries | 0.5280 | 0.5400 | 0.0120 |
+| leaderboard_gpqa_extended | 0.2711 | 0.2802 | 0.0092 |
+| leaderboard_bbh_sports_understanding | 0.5960 | 0.6040 | 0.0080 |
+| leaderboard_math_intermediate_algebra_hard | 0.0107 | 0.0143 | 0.0036 |
+
 
 ### Framework versions
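The long decimals in the old table (e.g. `0.10000000000000003`) are ordinary IEEE-754 floating-point artifacts from subtracting the base score from the fine-tuned score. A minimal Python sketch, using two score pairs taken from the table above, shows how fixed four-decimal formatting produces the new table's presentation (the `scores` dict and the loop are illustrative, not part of the repository):

```python
# Recompute the performance gain (fine-tuned score minus base score) and
# format each row the way the updated table presents it: four decimal places.
scores = {
    "leaderboard_bbh_logical_deduction_five_objects": (0.356, 0.456),
    "leaderboard_gpqa_diamond": (0.2222222222222222, 0.2727272727272727),
}

for test, (base, tuned) in scores.items():
    gain = tuned - base  # raw float subtraction carries rounding noise
    # fixed-point f-string formatting rounds that noise away
    print(f"| {test} | {base:.4f} | {tuned:.4f} | {gain:.4f} |")
```

Sorting the rows by `gain` descending before printing reproduces the table's ordering.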