
This is a distillation experiment with SmolLM2-1.7B as the teacher and SmolLM2-360M as the student model.

It slightly improves on the base model's performance on the tasks below (work in progress). I think I can do much better than this and will try again.
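The card doesn't state the training objective, so as a reference point, here is a minimal sketch of the standard logit-distillation loss (KL divergence between temperature-softened teacher and student distributions, scaled by T²). The function names and the temperature value are illustrative, not taken from this experiment:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling; higher T flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients keep roughly the same magnitude as T varies."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In practice this term is usually mixed with the ordinary cross-entropy loss on the hard labels; the mixing weight here would be another hyperparameter this card does not report.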

| Task | HuggingFaceTB/SmolLM2-360M | aloobun/d-SmolLM2-360M |
|---|---|---|
| leaderboard_bbh_causal_judgement | 0.4545 | 0.4652 |
| leaderboard_bbh_geometric_shapes | 0.1680 | 0.2040 |
| leaderboard_bbh_movie_recommendation | 0.2120 | 0.2440 |
| leaderboard_bbh_penguins_in_a_table | 0.2055 | 0.2123 |
| leaderboard_bbh_reasoning_about_colored_objects | 0.1160 | 0.1320 |
| leaderboard_bbh_ruin_names | 0.2360 | 0.2480 |
| leaderboard_bbh_salient_translation_error_detection | 0.1480 | 0.2120 |
| leaderboard_bbh_snarks | 0.5169 | 0.5281 |
| leaderboard_bbh_temporal_sequences | 0.2720 | 0.2800 |
| leaderboard_musr_murder_mysteries | 0.5040 | 0.5160 |

## Eval Results: aloobun/d-SmolLM2-360M (WIP)

Todo:

- ifeval (0-shot, generative)
- Math-lvl-5 (4-shot, generative, Minerva version)

### GPQA

| Task | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm ↑ | 0.2071 | ± 0.0289 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm ↑ | 0.2308 | ± 0.0180 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm ↑ | 0.2679 | ± 0.0209 |
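These tables follow the lm-evaluation-harness leaderboard task format. A command along the following lines should reproduce the GPQA numbers; this is a sketch of the harness's standard invocation (`pip install lm-eval`), not necessarily the exact command used for this card:

```shell
# Sketch: evaluate the distilled model on the Open LLM Leaderboard GPQA tasks
# with EleutherAI's lm-evaluation-harness. dtype matches the BF16 weights.
lm_eval --model hf \
  --model_args pretrained=aloobun/d-SmolLM2-360M,dtype=bfloat16 \
  --tasks leaderboard_gpqa \
  --batch_size auto
```

Swapping `--tasks` for `leaderboard_musr`, `leaderboard_bbh`, `leaderboard_mmlu_pro`, or `leaderboard_ifeval` should cover the remaining sections.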

### MUSR

| Task | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm ↑ | 0.5160 | ± 0.0317 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm ↑ | 0.2383 | ± 0.0267 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm ↑ | 0.4400 | ± 0.0315 |

### BBH

| Task | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm ↑ | 0.5480 | ± 0.0315 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm ↑ | 0.4652 | ± 0.0366 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm ↑ | 0.1560 | ± 0.0230 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm ↑ | 0.3120 | ± 0.0294 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm ↑ | 0.5240 | ± 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm ↑ | 0.2040 | ± 0.0255 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm ↑ | 0.5000 | ± 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm ↑ | 0.2240 | ± 0.0264 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm ↑ | 0.1440 | ± 0.0222 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm ↑ | 0.3320 | ± 0.0298 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm ↑ | 0.2440 | ± 0.0272 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm ↑ | 0.5800 | ± 0.0313 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm ↑ | 0.2080 | ± 0.0257 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm ↑ | 0.2123 | ± 0.0340 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm ↑ | 0.1320 | ± 0.0215 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm ↑ | 0.2480 | ± 0.0274 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm ↑ | 0.2120 | ± 0.0259 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm ↑ | 0.5281 | ± 0.0375 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm ↑ | 0.4600 | ± 0.0316 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm ↑ | 0.2800 | ± 0.0285 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm ↑ | 0.1720 | ± 0.0239 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm ↑ | 0.1440 | ± 0.0222 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm ↑ | 0.3000 | ± 0.0290 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm ↑ | 0.5480 | ± 0.0315 |

### MMLU_PRO

| Task | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc ↑ | 0.1173 | ± 0.0029 |

### IFEVAL

| Task | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc ↑ | 0.2866 | N/A |
| | | none | 0 | inst_level_strict_acc ↑ | 0.2770 | N/A |
| | | none | 0 | prompt_level_loose_acc ↑ | 0.1497 | ± 0.0154 |
| | | none | 0 | prompt_level_strict_acc ↑ | 0.1423 | ± 0.0150 |
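The four IFEval rows score the same generations at two granularities: instruction-level accuracy counts each constraint separately, while prompt-level accuracy credits a prompt only when every one of its constraints passes, which is why the prompt-level numbers are lower. A minimal sketch of that aggregation, on made-up pass/fail flags:

```python
def ifeval_accuracies(results):
    """results: one list of booleans per prompt, one flag per instruction.
    Returns (instruction-level accuracy, prompt-level accuracy)."""
    inst_flags = [flag for prompt in results for flag in prompt]
    inst_level = sum(inst_flags) / len(inst_flags)          # fraction of instructions passed
    prompt_level = sum(all(p) for p in results) / len(results)  # all-or-nothing per prompt
    return inst_level, prompt_level

# Hypothetical run: only the first of three prompts satisfies every instruction.
demo = [[True, True], [True, False], [False, True, True]]
```

The strict/loose split is orthogonal: loose scoring re-checks each constraint after minor normalizations of the response (the harness's own rules), so loose accuracy is always at least the strict one.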
Model size: 362M params (Safetensors, BF16)
