csabakecskemeti committed
Commit: e058427
Parent(s): 3900a10

Update README.md


results presentation changes

Files changed (1)
  1. README.md +21 -18
README.md CHANGED
@@ -31,26 +31,29 @@ model-index:
 
 ### eval
 
+The fine-tuned model (DevQuasar/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit)
+has gained performance over the base model (unsloth/Llama-3.2-3B-Instruct-bnb-4bit)
+in the following tasks.
+
 | Test | Base Model | Fine-Tuned Model | Performance Gain |
 |---|---|---|---|
-| leaderboard_bbh_logical_deduction_seven_objects | 0.252 | 0.436 | 0.184 |
-| leaderboard_bbh_logical_deduction_five_objects | 0.356 | 0.456 | 0.10000000000000003 |
-| leaderboard_musr_team_allocation | 0.22 | 0.32 | 0.1 |
-| leaderboard_bbh_disambiguation_qa | 0.304 | 0.376 | 0.07200000000000001 |
-| leaderboard_gpqa_diamond | 0.2222222222222222 | 0.2727272727272727 | 0.0505050505050505 |
-| leaderboard_bbh_movie_recommendation | 0.596 | 0.636 | 0.040000000000000036 |
-| leaderboard_bbh_formal_fallacies | 0.508 | 0.54 | 0.03200000000000003 |
-| leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.316 | 0.344 | 0.02799999999999997 |
-| leaderboard_bbh_causal_judgement | 0.5454545454545454 | 0.5668449197860963 | 0.021390374331550888 |
-| leaderboard_bbh_web_of_lies | 0.496 | 0.516 | 0.020000000000000018 |
-| leaderboard_math_geometry_hard | 0.045454545454545456 | 0.06060606060606061 | 0.015151515151515152 |
-| leaderboard_math_num_theory_hard | 0.05194805194805195 | 0.06493506493506493 | 0.012987012987012977 |
-| leaderboard_musr_murder_mysteries | 0.528 | 0.54 | 0.01200000000000001 |
-| leaderboard_gpqa_extended | 0.27106227106227104 | 0.2802197802197802 | 0.00915750915750918 |
-| leaderboard_bbh_sports_understanding | 0.596 | 0.604 | 0.008000000000000007 |
-| leaderboard_math_intermediate_algebra_hard | 0.010714285714285714 | 0.014285714285714285 | 0.003571428571428571 |
-| leaderboard_bbh_navigate | 0.62 | 0.62 | 0.0 |
-
+| leaderboard_bbh_logical_deduction_seven_objects | 0.2520 | 0.4360 | 0.1840 |
+| leaderboard_bbh_logical_deduction_five_objects | 0.3560 | 0.4560 | 0.1000 |
+| leaderboard_musr_team_allocation | 0.2200 | 0.3200 | 0.1000 |
+| leaderboard_bbh_disambiguation_qa | 0.3040 | 0.3760 | 0.0720 |
+| leaderboard_gpqa_diamond | 0.2222 | 0.2727 | 0.0505 |
+| leaderboard_bbh_movie_recommendation | 0.5960 | 0.6360 | 0.0400 |
+| leaderboard_bbh_formal_fallacies | 0.5080 | 0.5400 | 0.0320 |
+| leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.3160 | 0.3440 | 0.0280 |
+| leaderboard_bbh_causal_judgement | 0.5455 | 0.5668 | 0.0214 |
+| leaderboard_bbh_web_of_lies | 0.4960 | 0.5160 | 0.0200 |
+| leaderboard_math_geometry_hard | 0.0455 | 0.0606 | 0.0152 |
+| leaderboard_math_num_theory_hard | 0.0519 | 0.0649 | 0.0130 |
+| leaderboard_musr_murder_mysteries | 0.5280 | 0.5400 | 0.0120 |
+| leaderboard_gpqa_extended | 0.2711 | 0.2802 | 0.0092 |
+| leaderboard_bbh_sports_understanding | 0.5960 | 0.6040 | 0.0080 |
+| leaderboard_math_intermediate_algebra_hard | 0.0107 | 0.0143 | 0.0036 |
+
 
 ### Framework versions
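The long decimals in the old table (e.g. `0.10000000000000003`) are ordinary IEEE-754 floating-point artifacts from subtracting the base score from the fine-tuned score. A minimal Python sketch, using two score pairs taken from the table above, shows how fixed four-decimal formatting produces the new table's presentation (the `scores` dict and the loop are illustrative, not part of the repository):

```python
# Recompute the performance gain (fine-tuned score minus base score) and
# format each row the way the updated table presents it: four decimal places.
scores = {
    "leaderboard_bbh_logical_deduction_five_objects": (0.356, 0.456),
    "leaderboard_gpqa_diamond": (0.2222222222222222, 0.2727272727272727),
}

for test, (base, tuned) in scores.items():
    gain = tuned - base  # raw float subtraction carries rounding noise
    # fixed-point f-string formatting rounds that noise away
    print(f"| {test} | {base:.4f} | {tuned:.4f} | {gain:.4f} |")
```

Sorting the rows by `gain` descending before printing reproduces the table's ordering.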