
Pythia-410m fine-tuned with DPO (Direct Preference Optimization) on the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch, using the original DPO code.
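For readers unfamiliar with DPO, the per-pair objective fits in a few lines. The sketch below is a minimal, framework-free restatement of the DPO loss, not the training code used for this model. It assumes you already have the summed log-probabilities of the chosen and rejected responses under both the policy being trained and the frozen reference model. The value `beta=0.1` is an illustrative assumption, because this card does not state the β that was used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each *_logp is the summed log-probability of the response tokens given
    the prompt, under the policy or the frozen reference model.
    beta=0.1 is an illustrative default, not necessarily this model's setting.
    """
    # How much more (or less) likely each response became relative to the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Push the chosen log-ratio above the rejected log-ratio
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Policy identical to the reference: zero margin, so the loss is log 2
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raised the chosen response's log-probability by 1 nat: lower loss
print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Only the difference of log-ratios matters, so the reference model anchors the policy without needing an explicit KL penalty term.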

Checkpoints are also uploaded.

Fully reproducible fine-tuning code is available on GitHub.

The training run is logged on wandb.

See Pythia-410m and the Pythia paper for model details.

hf (pretrained=lomahony/pythia-410m-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

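| Tasks          | Version | Filter | n-shot | Metric          |   Value | Stderr   |
|----------------|--------:|--------|-------:|-----------------|--------:|----------|
| arc_challenge  |       1 | none   |      0 | acc             |  0.2338 | ± 0.0124 |
|                |         | none   |      0 | acc_norm        |  0.2602 | ± 0.0128 |
| arc_easy       |       1 | none   |      0 | acc             |  0.5185 | ± 0.0103 |
|                |         | none   |      0 | acc_norm        |  0.4609 | ± 0.0102 |
| boolq          |       2 | none   |      0 | acc             |  0.6214 | ± 0.0085 |
| hellaswag      |       1 | none   |      0 | acc             |  0.3447 | ± 0.0047 |
|                |         | none   |      0 | acc_norm        |  0.4074 | ± 0.0049 |
| lambada_openai |       1 | none   |      0 | perplexity      | 19.0431 | ± 0.7027 |
|                |         | none   |      0 | acc             |  0.3978 | ± 0.0068 |
| openbookqa     |       1 | none   |      0 | acc             |  0.2000 | ± 0.0179 |
|                |         | none   |      0 | acc_norm        |  0.3100 | ± 0.0207 |
| piqa           |       1 | none   |      0 | acc             |  0.6779 | ± 0.0109 |
|                |         | none   |      0 | acc_norm        |  0.6757 | ± 0.0109 |
| sciq           |       1 | none   |      0 | acc             |  0.7760 | ± 0.0132 |
|                |         | none   |      0 | acc_norm        |  0.6690 | ± 0.0149 |
| wikitext       |       2 | none   |      0 | word_perplexity | 24.3807 | N/A      |
|                |         | none   |      0 | byte_perplexity |  1.8171 | N/A      |
|                |         | none   |      0 | bits_per_byte   |  0.8617 | N/A      |
| winogrande     |       1 | none   |      0 | acc             |  0.5343 | ± 0.0140 |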
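The three wikitext rows are not independent: in lm-evaluation-harness they are all derived from the same total negative log-likelihood, normalized by bytes or by words. That means `bits_per_byte` equals `log2(byte_perplexity)`, and the ratio of the log word perplexity to the log byte perplexity gives the implied average number of bytes per word. The sketch below checks this against the table; the inputs are the rounded four-decimal values, so agreement is only expected to about three decimals.

```python
import math


def bits_per_byte(byte_perplexity: float) -> float:
    """Byte perplexity is exp(mean NLL per byte, in nats); convert that NLL to bits."""
    return math.log2(byte_perplexity)


def bytes_per_word(word_perplexity: float, byte_perplexity: float) -> float:
    """Both perplexities share one total NLL, so their log ratio is bytes / words."""
    return math.log(word_perplexity) / math.log(byte_perplexity)


# Rounded wikitext values from the table above
print(f"bits_per_byte  ~ {bits_per_byte(1.8171):.4f}")
print(f"bytes per word ~ {bytes_per_word(24.3807, 1.8171):.2f}")
```

The first value matches the reported `bits_per_byte` of 0.8617 up to input rounding, which is a quick sanity check that the three rows describe one underlying measurement.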

hf (pretrained=lomahony/pythia-410m-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16

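| Tasks          | Version | Filter | n-shot | Metric          |   Value | Stderr   |
|----------------|--------:|--------|-------:|-----------------|--------:|----------|
| arc_challenge  |       1 | none   |      5 | acc             |  0.2346 | ± 0.0124 |
|                |         | none   |      5 | acc_norm        |  0.2747 | ± 0.0130 |
| arc_easy       |       1 | none   |      5 | acc             |  0.5509 | ± 0.0102 |
|                |         | none   |      5 | acc_norm        |  0.5198 | ± 0.0103 |
| boolq          |       2 | none   |      5 | acc             |  0.5982 | ± 0.0086 |
| hellaswag      |       1 | none   |      5 | acc             |  0.3437 | ± 0.0047 |
|                |         | none   |      5 | acc_norm        |  0.4059 | ± 0.0049 |
| lambada_openai |       1 | none   |      5 | perplexity      | 34.3002 | ± 1.3044 |
|                |         | none   |      5 | acc             |  0.3148 | ± 0.0065 |
| openbookqa     |       1 | none   |      5 | acc             |  0.1740 | ± 0.0170 |
|                |         | none   |      5 | acc_norm        |  0.2880 | ± 0.0203 |
| piqa           |       1 | none   |      5 | acc             |  0.6741 | ± 0.0109 |
|                |         | none   |      5 | acc_norm        |  0.6670 | ± 0.0110 |
| sciq           |       1 | none   |      5 | acc             |  0.8520 | ± 0.0112 |
|                |         | none   |      5 | acc_norm        |  0.8350 | ± 0.0117 |
| wikitext       |       2 | none   |      5 | word_perplexity | 24.3807 | N/A      |
|                |         | none   |      5 | byte_perplexity |  1.8171 | N/A      |
|                |         | none   |      5 | bits_per_byte   |  0.8617 | N/A      |
| winogrande     |       1 | none   |      5 | acc             |  0.5162 | ± 0.0140 |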
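Some metrics move noticeably between the 0-shot and 5-shot runs (for example sciq and lambada_openai accuracy), while others barely change. A rough way to judge whether a change exceeds the reported standard errors is a z-score on the difference. This sketch assumes the two estimates are independent, which is only an approximation here because both runs score the same test items; a paired comparison on per-sample outputs would give a tighter answer.

```python
import math


def diff_z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """z-score of (b - a), treating the two estimates as independent."""
    return (b - a) / math.sqrt(se_a ** 2 + se_b ** 2)


# (0-shot, 5-shot) as (value, stderr), copied from the two tables above
pairs = {
    "sciq acc": ((0.7760, 0.0132), (0.8520, 0.0112)),
    "lambada_openai acc": ((0.3978, 0.0068), (0.3148, 0.0065)),
    "hellaswag acc": ((0.3447, 0.0047), (0.3437, 0.0047)),
}

for name, ((a, se_a), (b, se_b)) in pairs.items():
    z = diff_z_score(a, se_a, b, se_b)
    print(f"{name}: 5-shot minus 0-shot = {b - a:+.4f}, z = {z:+.2f}")
```

Under this approximation, |z| well above 2 (sciq, lambada_openai) suggests a real shift between settings, while hellaswag's change is within noise.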