wandb: https://wandb.ai/eleutherai/pythia-rlhf/runs/8p0wfi7m?workspace=user-yongzx
Model Evals:
| Task | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | acc | 0.2654 | ± | 0.0129 |
| | | none | acc_norm | 0.2875 | ± | 0.0132 |
| arc_easy | Yaml | none | acc | 0.6149 | ± | 0.0100 |
| | | none | acc_norm | 0.5391 | ± | 0.0102 |
| lambada_openai | Yaml | none | perplexity | 5.6120 | ± | 0.1509 |
| | | none | acc | 0.6146 | ± | 0.0068 |
| logiqa | Yaml | none | acc | 0.1951 | ± | 0.0155 |
| | | none | acc_norm | 0.2796 | ± | 0.0176 |
| piqa | Yaml | none | acc | 0.7160 | ± | 0.0105 |
| | | none | acc_norm | 0.7182 | ± | 0.0105 |
| sciq | Yaml | none | acc | 0.8610 | ± | 0.0109 |
| | | none | acc_norm | 0.7930 | ± | 0.0128 |
| winogrande | Yaml | none | acc | 0.5754 | ± | 0.0139 |
| wsc | Yaml | none | acc | 0.3654 | ± | 0.0474 |