Wandb run: https://wandb.ai/eleutherai/pythia-rlhf/runs/7f9c9lrm
Eval Results
| Tasks | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---:|---|---:|
| arc_challenge | Yaml | none | acc | 0.2201 | ± | 0.0121 |
| | | none | acc_norm | 0.2568 | ± | 0.0128 |
| arc_easy | Yaml | none | acc | 0.5253 | ± | 0.0102 |
| | | none | acc_norm | 0.4558 | ± | 0.0102 |
| lambada_openai | Yaml | none | perplexity | 11.3766 | ± | 0.3623 |
| | | none | acc | 0.4844 | ± | 0.0070 |
| logiqa | Yaml | none | acc | 0.2120 | ± | 0.0160 |
| | | none | acc_norm | 0.2780 | ± | 0.0176 |
| piqa | Yaml | none | acc | 0.6817 | ± | 0.0109 |
| | | none | acc_norm | 0.6828 | ± | 0.0109 |
| sciq | Yaml | none | acc | 0.8130 | ± | 0.0123 |
| | | none | acc_norm | 0.7090 | ± | 0.0144 |
| winogrande | Yaml | none | acc | 0.5375 | ± | 0.0140 |
| wsc | Yaml | none | acc | 0.3654 | ± | 0.0474 |