
Pythia-160m finetuned for 1 epoch on the helpful subset of the Anthropic hh-rlhf dataset, using the original DPO codebase.
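For reference, here is a minimal sketch of the DPO objective that this finetuning optimizes, written as a standalone PyTorch loss function. This is an illustration, not the original training code; the argument names and the `beta=0.1` default are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a 1-D tensor of summed log-probabilities of the chosen
    or rejected response under the policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximise the margin between the implicit chosen and rejected rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy tensors just to show the call shape.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
```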

Checkpoints are also uploaded.

Fully reproducible finetuning code is available on GitHub.

Wandb training logs are also available.

See Pythia-160m for model details (paper).
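A minimal usage sketch with the Hugging Face transformers library. The `Human:`/`Assistant:` prompt format follows the Anthropic hh-rlhf convention, and the sampling settings are illustrative rather than the author's exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lomahony/pythia-160m-helpful-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prompt formatted in the hh-rlhf Human/Assistant style.
prompt = "Human: How do I bake bread at home?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```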

Zero-shot benchmark results, evaluated with the EleutherAI lm-evaluation-harness: `hf (pretrained=lomahony/pythia-160m-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16`

| Tasks          | Version | Filter | n-shot | Metric          |    Value |    Stderr |
|----------------|--------:|--------|-------:|-----------------|---------:|----------:|
| arc_challenge  |       1 | none   |      0 | acc             |   0.2125 | ± 0.0120  |
|                |         | none   |      0 | acc_norm        |   0.2312 | ± 0.0123  |
| arc_easy       |       1 | none   |      0 | acc             |   0.3965 | ± 0.0100  |
|                |         | none   |      0 | acc_norm        |   0.3830 | ± 0.0100  |
| boolq          |       2 | none   |      0 | acc             |   0.5853 | ± 0.0086  |
| hellaswag      |       1 | none   |      0 | acc             |   0.2811 | ± 0.0045  |
|                |         | none   |      0 | acc_norm        |   0.2940 | ± 0.0045  |
| lambada_openai |       1 | none   |      0 | perplexity      | 444.4464 | ± 24.5439 |
|                |         | none   |      0 | acc             |   0.1034 | ± 0.0042  |
| openbookqa     |       1 | none   |      0 | acc             |   0.1500 | ± 0.0160  |
|                |         | none   |      0 | acc_norm        |   0.2480 | ± 0.0193  |
| piqa           |       1 | none   |      0 | acc             |   0.5947 | ± 0.0115  |
|                |         | none   |      0 | acc_norm        |   0.5876 | ± 0.0115  |
| sciq           |       1 | none   |      0 | acc             |   0.5880 | ± 0.0156  |
|                |         | none   |      0 | acc_norm        |   0.6180 | ± 0.0154  |
| wikitext       |       2 | none   |      0 | word_perplexity |  88.8633 | N/A       |
|                |         | none   |      0 | byte_perplexity |   2.3143 | N/A       |
|                |         | none   |      0 | bits_per_byte   |   1.2106 | N/A       |
| winogrande     |       1 | none   |      0 | acc             |   0.4980 | ± 0.0141  |

Five-shot benchmark results, evaluated with the same harness: `hf (pretrained=lomahony/pythia-160m-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16`

| Tasks          | Version | Filter | n-shot | Metric          |     Value |    Stderr |
|----------------|--------:|--------|-------:|-----------------|----------:|----------:|
| arc_challenge  |       1 | none   |      5 | acc             |    0.1928 | ± 0.0115  |
|                |         | none   |      5 | acc_norm        |    0.2398 | ± 0.0125  |
| arc_easy       |       1 | none   |      5 | acc             |    0.3678 | ± 0.0099  |
|                |         | none   |      5 | acc_norm        |    0.3657 | ± 0.0099  |
| boolq          |       2 | none   |      5 | acc             |    0.5841 | ± 0.0086  |
| hellaswag      |       1 | none   |      5 | acc             |    0.2807 | ± 0.0045  |
|                |         | none   |      5 | acc_norm        |    0.2876 | ± 0.0045  |
| lambada_openai |       1 | none   |      5 | perplexity      | 1607.2529 | ± 88.3065 |
|                |         | none   |      5 | acc             |    0.0574 | ± 0.0032  |
| openbookqa     |       1 | none   |      5 | acc             |    0.1580 | ± 0.0163  |
|                |         | none   |      5 | acc_norm        |    0.2400 | ± 0.0191  |
| piqa           |       1 | none   |      5 | acc             |    0.5958 | ± 0.0114  |
|                |         | none   |      5 | acc_norm        |    0.5773 | ± 0.0115  |
| sciq           |       1 | none   |      5 | acc             |    0.5110 | ± 0.0158  |
|                |         | none   |      5 | acc_norm        |    0.5740 | ± 0.0156  |
| wikitext       |       2 | none   |      5 | word_perplexity |   88.8633 | N/A       |
|                |         | none   |      5 | byte_perplexity |    2.3143 | N/A       |
|                |         | none   |      5 | bits_per_byte   |    1.2106 | N/A       |
| winogrande     |       1 | none   |      5 | acc             |    0.5162 | ± 0.0140  |
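The tables above are in lm-evaluation-harness output format. Below is a sketch of how comparable numbers could be reproduced through the harness's Python API; the task list is taken from the tables above, and the `simple_evaluate` call follows the lm-eval v0.4 API, so it may need adjusting for other versions.

```python
# Sketch: reproduce the zero-shot evaluation with EleutherAI's lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=lomahony/pythia-160m-helpful-dpo",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
           "openbookqa", "piqa", "sciq", "wikitext", "winogrande"],
    num_fewshot=0,   # set to 5 for the five-shot table
    batch_size=16,
)
print(results["results"])
```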