wandb: https://wandb.ai/eleutherai/pythia-rlhf/runs/6y83ekqy?workspace=user-yongzx
Model Evals
| Task | Version | Filter | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | acc | 0.2526 | ± | 0.0127 |
| | | none | acc_norm | 0.2773 | ± | 0.0131 |
| arc_easy | Yaml | none | acc | 0.5791 | ± | 0.0101 |
| | | none | acc_norm | 0.4912 | ± | 0.0103 |
| lambada_openai | Yaml | none | perplexity | 7.0516 | ± | 0.1979 |
| | | none | acc | 0.5684 | ± | 0.0069 |
| logiqa | Yaml | none | acc | 0.2166 | ± | 0.0162 |
| | | none | acc_norm | 0.2919 | ± | 0.0178 |
| piqa | Yaml | none | acc | 0.7176 | ± | 0.0105 |
| | | none | acc_norm | 0.6964 | ± | 0.0107 |
| sciq | Yaml | none | acc | 0.8460 | ± | 0.0114 |
| | | none | acc_norm | 0.7700 | ± | 0.0133 |
| winogrande | Yaml | none | acc | 0.5399 | ± | 0.0140 |
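
When comparing these scores against another checkpoint, it can help to turn each Value and Stderr pair into an approximate interval. The snippet below is a minimal sketch, not part of the original evaluation: it assumes a normal approximation with z = 1.96 for a roughly 95% interval, and the names `RESULTS` and `confidence_interval` are hypothetical helpers introduced for illustration. The numbers are copied from three rows of the table above.

```python
# Approximate 95% confidence intervals from the reported Value and Stderr.
# Assumption: a normal approximation (z = 1.96); the source does not state
# how the intervals should be interpreted.
Z_95 = 1.96

# (task, metric) -> (value, stderr), copied from the results table.
RESULTS = {
    ("arc_challenge", "acc"): (0.2526, 0.0127),
    ("piqa", "acc"): (0.7176, 0.0105),
    ("winogrande", "acc"): (0.5399, 0.0140),
}


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for value +/- z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


if __name__ == "__main__":
    for (task, metric), (value, stderr) in RESULTS.items():
        low, high = confidence_interval(value, stderr)
        print(f"{task} {metric}: {value:.4f} [{low:.4f}, {high:.4f}]")
```

Two runs whose intervals overlap heavily on a task are hard to tell apart on that task alone, so a difference smaller than about twice the Stderr is probably best read as noise.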