| Benchmark | Measure | 160M MiniPile | 160M Reproduction | Percentage Difference of Means | 95% Confidence Interval | Interpretation |
|---|---|---|---|---|---|---|
| ARC-Challenge | acc | 0.2125 ± 0.0120 | 0.1894 ± 0.0115 | -10.8706 | (0.0095; -0.0577) | Difference not significant |
| MMLU | acc | 0.2699 ± 0.0037 | 0.2295 ± 0.0035 | -14.9685 | (-0.0304; -0.0504) | MiniPile better |
| HellaSwag | acc | 0.2560 ± 0.0044 | 0.2604 ± 0.0044 | 1.7188 | (0.0166; -0.0078) | Difference not significant |
| WinoGrande | acc | 0.4720 ± 0.0140 | 0.5122 ± 0.0140 | 8.5169 | (0.0790; 0.0014) | Reproduction better |
| Lambada (OpenAI) | acc | 0.0000 ± 0.0000 | 0.0000 ± 0.0000 | - | - | - |
| Lambada (OpenAI) | perplexity | 3033175.2693 ± 288926.5827 | 1854408.3999 ± 148101.5978 | -38.8625 | (-542407.4980; -1815126.2408) | Reproduction severely better |
| Lambada (Std) | acc | 0.0000 ± 0.0000 | 0.0000 ± 0.0000 | - | - | - |
| Lambada (Std) | perplexity | 27067951.3460 ± 2710040.1910 | 11927123.2514 ± 1063672.9280 | -55.9364 | (-9434663.1814; -20846993.0080) | Reproduction severely better |
| BLiMP | acc | 0.5194 ± 0.0018 | 0.5481 ± 0.0017 | 5.5256 | (0.0336; 0.0238) | Reproduction better |
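As a sanity check on the table, the "Percentage Difference of Means" and "95% Confidence Interval" columns can be reproduced from the reported mean ± standard-error pairs. The sketch below assumes a relative percent change from the MiniPile baseline and a normal-approximation interval for the difference of two independent means; the exact procedure used to build the table is not stated, so treat this as an assumption.

```python
import math

def pct_diff(base: float, repro: float) -> float:
    """Percent change from the MiniPile baseline to the reproduction
    (assumed definition of the 'Percentage Difference of Means' column)."""
    return (repro - base) / base * 100.0

def ci95(base: float, base_se: float, repro: float, repro_se: float):
    """Normal-approximation 95% CI for the difference of two means,
    returned as (upper; lower) to match the table's ordering."""
    diff = repro - base
    half = 1.96 * math.sqrt(base_se**2 + repro_se**2)
    return diff + half, diff - half

# WinoGrande row from the table: 0.4720 ± 0.0140 vs. 0.5122 ± 0.0140
print(round(pct_diff(0.4720, 0.5122), 4))                 # matches 8.5169
print(tuple(round(x, 4) for x in ci95(0.4720, 0.0140,
                                      0.5122, 0.0140)))   # matches (0.0790; 0.0014)
```

Under this reading, a row is "not significant" when the interval straddles zero, which is consistent with the ARC-Challenge and HellaSwag interpretations above.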
Model size: 162M params · Tensor type: F32 (Safetensors)
Model: Marcus2112/pythia-160m-minipile_reproduction