MiniCorpus
Collection
github.com/MK2112/minicorpus
Benchmark | Measure | Direction | 160M MiniPile | 160M Reproduction | Percentage Difference of Means | 95% Confidence Interval | Interpretation |
---|---|---|---|---|---|---|---|
ARC-Challenge | acc | ↑ | 0.2125 ± 0.0120 | 0.1894 ± 0.0115 | -10.8706 | (0.0095; -0.0577) | Difference not significant |
MMLU | acc | ↑ | 0.2699 ± 0.0037 | 0.2295 ± 0.0035 | -14.9685 | (-0.0304; -0.0504) | MiniPile better |
HellaSwag | acc | ↑ | 0.2560 ± 0.0044 | 0.2604 ± 0.0044 | 1.7188 | (0.0166; -0.0078) | Difference not significant |
WinoGrande | acc | ↑ | 0.4720 ± 0.0140 | 0.5122 ± 0.0140 | 8.5169 | (0.0790; 0.0014) | Reproduction better |
Lambada (OpenAI) | acc | ↑ | 0.0000 ± 0.0000 | 0.0000 ± 0.0000 | - | - | - |
Lambada (OpenAI) | perplexity | ↓ | 3033175.2693 ± 288926.5827 | 1854408.3999 ± 148101.5978 | -38.8625 | (-542407.4980; -1815126.2408) | Reproduction severely better |
Lambada (Std) | acc | ↑ | 0.0000 ± 0.0000 | 0.0000 ± 0.0000 | - | - | - |
Lambada (Std) | perplexity | ↓ | 27067951.3460 ± 2710040.191 | 11927123.2514 ± 1063672.928 | -55.9364 | (-9434663.1814; -20846993.0080) | Reproduction severely better |
BLiMP | acc | ↑ | 0.5194 ± 0.0018 | 0.5481 ± 0.0017 | 5.5256 | (0.0336; 0.0238) | Reproduction better |
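
The table does not state how the percentage difference and the confidence interval were derived. The sketch below is one plausible reading, not the author's confirmed method: it assumes the ± values are standard errors, that the two runs are independent, and that the interval uses a normal approximation with z = 1.96. The function names `compare` and `interpret` are hypothetical. Under those assumptions it matches the MMLU row to four decimals.

```python
import math

# Assumption: two-sided 95% normal quantile; the source does not name its method.
Z_95 = 1.96


def compare(mean_a, se_a, mean_b, se_b, z=Z_95):
    """Compare run B (reproduction) against run A (MiniPile).

    Returns the percentage difference of means relative to A and the
    (upper, lower) bounds of the interval for the difference B - A,
    in the same order the table prints them.
    """
    diff = mean_b - mean_a
    pct = 100.0 * diff / mean_a
    # Standard error of a difference of two independent estimates.
    half_width = z * math.sqrt(se_a ** 2 + se_b ** 2)
    return pct, (diff + half_width, diff - half_width)


def interpret(upper, lower, higher_is_better=True):
    """Label the result; an interval that contains zero is not significant."""
    if lower <= 0.0 <= upper:
        return "Difference not significant"
    reproduction_higher = lower > 0.0
    if reproduction_higher == higher_is_better:
        return "Reproduction better"
    return "MiniPile better"


# MMLU row: 0.2699 ± 0.0037 (MiniPile) vs. 0.2295 ± 0.0035 (Reproduction)
pct, (upper, lower) = compare(0.2699, 0.0037, 0.2295, 0.0035)
label = interpret(upper, lower, higher_is_better=True)
print(f"{pct:.4f} | ({upper:.4f}; {lower:.4f}) | {label}")
```

For perplexity rows, where lower is better, `higher_is_better=False` flips the label, so a negative interval counts in favor of the reproduction.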
Base model
EleutherAI/pythia-160m-deduped