pythia-160m-sft-hh / README.md
wandb run: https://wandb.ai/eleutherai/pythia-rlhf/runs/e0drjcsz?workspace=user-yongzx

Model Evals:

|Task          |Version|Filter|Metric    |  Value|   |Stderr|
|--------------|-------|------|----------|------:|---|-----:|
|arc_challenge |Yaml   |none  |acc       | 0.1877|±  |0.0114|
|              |       |none  |acc_norm  | 0.2372|±  |0.0124|
|arc_easy      |Yaml   |none  |acc       | 0.4390|±  |0.0102|
|              |       |none  |acc_norm  | 0.4082|±  |0.0101|
|logiqa        |Yaml   |none  |acc       | 0.1889|±  |0.0154|
|              |       |none  |acc_norm  | 0.2473|±  |0.0169|
|piqa          |Yaml   |none  |acc       | 0.6213|±  |0.0113|
|              |       |none  |acc_norm  | 0.6279|±  |0.0113|
|sciq          |Yaml   |none  |acc       | 0.7230|±  |0.0142|
|              |       |none  |acc_norm  | 0.6840|±  |0.0147|
|winogrande    |Yaml   |none  |acc       | 0.5162|±  |0.0140|
|lambada_openai|Yaml   |none  |perplexity|58.9478|±  |2.7662|
|              |       |none  |acc       | 0.2602|±  |0.0061|
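
One way to read the Value and Stderr columns together is as an approximate 95% confidence interval, value ± 1.96 × stderr. The sketch below is illustrative and not part of the original evaluation run. It assumes a normal approximation, and the names `ROWS`, `Z_95`, and `confidence_interval` are hypothetical. The `acc` rows are copied from the table above.

```python
# Minimal sketch: turn "value ± stderr" into an approximate 95% interval.
# Assumption: a normal approximation is adequate for these accuracy estimates.

# (task, metric, value, stderr) rows copied from the table above.
ROWS = [
    ("arc_challenge", "acc", 0.1877, 0.0114),
    ("arc_easy", "acc", 0.4390, 0.0102),
    ("logiqa", "acc", 0.1889, 0.0154),
    ("piqa", "acc", 0.6213, 0.0113),
    ("sciq", "acc", 0.7230, 0.0142),
    ("winogrande", "acc", 0.5162, 0.0140),
    ("lambada_openai", "acc", 0.2602, 0.0061),
]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) bounds for value ± z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


for task, metric, value, stderr in ROWS:
    low, high = confidence_interval(value, stderr)
    print(f"{task:<15} {metric:<4} {value:.4f}  [{low:.4f}, {high:.4f}]")
```

Under these assumptions, the winogrande interval (roughly 0.489 to 0.544) contains 0.5, the chance level for a two-option task, so that score may not be distinguishable from random guessing.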