---
library_name: transformers
license: apache-2.0
language:
- en
tags:
- smollm2
- smollm2-360m
- distillation
---

This is a distillation experiment with SmolLM2-1.7B as the teacher and SmolLM2-360M as the student. It slightly improves on the base model's performance on the following tasks (WIP):

| Tasks | **HuggingFaceTB/SmolLM2-360M** (acc_norm) | **aloobun/d-SmolLM2-360M** (acc_norm) |
|----------------------------------------------------------|-------------:|-------------:|
| - leaderboard_bbh_causal_judgement | 0.4545 | 0.4652 |
| - leaderboard_bbh_geometric_shapes | 0.1680 | 0.2040 |
| - leaderboard_bbh_movie_recommendation | 0.2120 | 0.2440 |
| - leaderboard_bbh_penguins_in_a_table | 0.2055 | 0.2123 |
| - leaderboard_bbh_reasoning_about_colored_objects | 0.1160 | 0.1320 |
| - leaderboard_bbh_ruin_names | 0.2360 | 0.2480 |
| - leaderboard_bbh_salient_translation_error_detection | 0.1480 | 0.2120 |
| - leaderboard_bbh_snarks | 0.5169 | 0.5281 |
| - leaderboard_bbh_temporal_sequences | 0.2720 | 0.2800 |
| - leaderboard_musr_murder_mysteries | 0.5040 | 0.5160 |

Well, it didn’t work as well as I hoped; I’ll try again.
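The exact training recipe isn't documented here, so the snippet below is only a minimal sketch of vanilla logit distillation (forward KL between the teacher's and student's next-token distributions, in the Hinton et al. style). The temperature, data, and the `distill_loss` helper are all illustrative assumptions, not the code used for this run. Both models come from the SmolLM2 family and share a tokenizer, so their vocabularies line up.

```python
# Minimal logit-distillation sketch (assumed recipe, not the exact one used here).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B").eval()
student = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")

def distill_loss(input_ids: torch.Tensor, temperature: float = 2.0) -> torch.Tensor:
    """Forward KL from the teacher's to the student's token distributions."""
    with torch.no_grad():
        t_logits = teacher(input_ids).logits  # [B, T, V]; no grad through the teacher
    s_logits = student(input_ids).logits      # [B, T, V]; same vocab as the teacher
    vocab = s_logits.size(-1)
    t_probs = F.softmax(t_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(s_logits / temperature, dim=-1)
    # Average the KL over all token positions; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures (Hinton et al., 2015).
    return F.kl_div(
        s_logprobs.view(-1, vocab),
        t_probs.view(-1, vocab),
        reduction="batchmean",
    ) * temperature**2

batch = tokenizer("Knowledge distillation in one step:", return_tensors="pt")
loss = distill_loss(batch["input_ids"])
loss.backward()  # gradients flow into the student only
```

Forward KL on logits is just the simplest baseline; reverse KL or hidden-state matching are common variants and may well differ from what was actually used.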
# Eval Results

aloobun/d-SmolLM2-360M (WIP)

## GPQA

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|----------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_gpqa | N/A| | | | | | | |
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm|↑ |0.2071|± |0.0289|
| - leaderboard_gpqa_extended| 1|none | 0|acc_norm|↑ |0.2308|± |0.0180|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm|↑ |0.2679|± |0.0209|

## MUSR

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_musr | N/A| | | | | | | |
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm|↑ |0.5160|± |0.0317|
| - leaderboard_musr_object_placements| 1|none | 0|acc_norm|↑ |0.2383|± |0.0267|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm|↑ |0.4400|± |0.0315|

## BBH

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_bbh | N/A| | | | | | | |
| - leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm|↑ |0.5480|± |0.0315|
| - leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm|↑ |0.4652|± |0.0366|
| - leaderboard_bbh_date_understanding | 1|none | 3|acc_norm|↑ |0.1560|± |0.0230|
| - leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm|↑ |0.3120|± |0.0294|
| - leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm|↑ |0.5240|± |0.0316|
| - leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm|↑ |0.2040|± |0.0255|
| - leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm|↑ |0.5000|± |0.0317|
| - leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm|↑ |0.2240|± |0.0264|
| - leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm|↑ |0.1440|± |0.0222|
| - leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm|↑ |0.3320|± |0.0298|
| - leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm|↑ |0.2440|± |0.0272|
| - leaderboard_bbh_navigate | 1|none | 3|acc_norm|↑ |0.5800|± |0.0313|
| - leaderboard_bbh_object_counting | 1|none | 3|acc_norm|↑ |0.2080|± |0.0257|
| - leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm|↑ |0.2123|± |0.0340|
| - leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm|↑ |0.1320|± |0.0215|
| - leaderboard_bbh_ruin_names | 1|none | 3|acc_norm|↑ |0.2480|± |0.0274|
| - leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm|↑ |0.2120|± |0.0259|
| - leaderboard_bbh_snarks | 1|none | 3|acc_norm|↑ |0.5281|± |0.0375|
| - leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm|↑ |0.4600|± |0.0316|
| - leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm|↑ |0.2800|± |0.0285|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm|↑ |0.1720|± |0.0239|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 1|none | 3|acc_norm|↑ |0.1440|± |0.0222|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 1|none | 3|acc_norm|↑ |0.3000|± |0.0290|
| - leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm|↑ |0.5480|± |0.0315|

## MMLU_PRO

| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------------|------:|------|-----:|------|---|-----:|---|-----:|
|leaderboard_mmlu_pro| 0.1|none | 5|acc |↑ |0.1173|± |0.0029|

## IFEVAL

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|------------------|------:|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard_ifeval| 3|none | 0|inst_level_loose_acc |↑ |0.2866|± | N/A|
| | |none | 0|inst_level_strict_acc |↑ |0.2770|± | N/A|
| | |none | 0|prompt_level_loose_acc |↑ |0.1497|± |0.0154|
| | |none | 0|prompt_level_strict_acc|↑ |0.1423|± |0.0150|

## MATH HARD

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------------------------------------------|-------|------|-----:|-----------|---|-----:|---|-----:|
|leaderboard_math_hard | N/A| | | | | | | |
| - leaderboard_math_algebra_hard | 2|none | 4|exact_match|↑ |0.0033|± |0.0033|
| - leaderboard_math_counting_and_prob_hard | 2|none | 4|exact_match|↑ |0.0081|± |0.0081|
| - leaderboard_math_geometry_hard | 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
| - leaderboard_math_intermediate_algebra_hard| 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
| - leaderboard_math_num_theory_hard | 2|none | 4|exact_match|↑ |0.0065|± |0.0065|
| - leaderboard_math_prealgebra_hard | 2|none | 4|exact_match|↑ |0.0104|± |0.0073|
| - leaderboard_math_precalculus_hard | 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
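The tables above look like lm-evaluation-harness output for the Open LLM Leaderboard task groups. Assuming that harness, a run can be approximated with its Python API as sketched below; the harness version, task selection, and batch size are assumptions, not the settings behind the numbers above.

```python
# Rough reproduction sketch using lm-evaluation-harness (pip install lm-eval);
# the exact harness version and settings used for the tables above are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=aloobun/d-SmolLM2-360M",
    tasks=["leaderboard_bbh", "leaderboard_gpqa", "leaderboard_musr"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```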