
wandb: https://wandb.ai/eleutherai/pythia-rlhf/runs/8p0wfi7m?workspace=user-yongzx

Model Evals:

| Task | Version | Filter | Metric | Value | Stderr |
|---|---|---|---|---:|---:|
| arc_challenge | Yaml | none | acc | 0.2654 | ± 0.0129 |
| | | none | acc_norm | 0.2875 | ± 0.0132 |
| arc_easy | Yaml | none | acc | 0.6149 | ± 0.0100 |
| | | none | acc_norm | 0.5391 | ± 0.0102 |
| lambada_openai | Yaml | none | perplexity | 5.6120 | ± 0.1509 |
| | | none | acc | 0.6146 | ± 0.0068 |
| logiqa | Yaml | none | acc | 0.1951 | ± 0.0155 |
| | | none | acc_norm | 0.2796 | ± 0.0176 |
| piqa | Yaml | none | acc | 0.7160 | ± 0.0105 |
| | | none | acc_norm | 0.7182 | ± 0.0105 |
| sciq | Yaml | none | acc | 0.8610 | ± 0.0109 |
| | | none | acc_norm | 0.7930 | ± 0.0128 |
| winogrande | Yaml | none | acc | 0.5754 | ± 0.0139 |
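
The table format matches output from EleutherAI's lm-evaluation-harness. As a rough guide only, the sketch below shows how a comparable zero-shot run could be launched from the harness's Python API (v0.4+); the repo id is an assumption taken from this card's title, and the exact harness version and settings behind the numbers above are not recorded here.

```python
# Hedged sketch: re-running the tasks above with lm-evaluation-harness (v0.4+).
# "pythia-1.4b-sft-hh" is an assumed Hub repo id based on this card's title;
# replace it with the actual namespace/path of the checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pythia-1.4b-sft-hh",  # assumed repo id
    tasks=[
        "arc_challenge",
        "arc_easy",
        "lambada_openai",
        "logiqa",
        "piqa",
        "sciq",
        "winogrande",
    ],
    batch_size=8,
)

# results["results"] maps each task name to its metrics (acc, acc_norm, perplexity, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```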