
Pythia-1.4b fine-tuned with DPO for 1 epoch on the helpful subset of the Anthropic hh-rlhf dataset, using the original DPO code.
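For reference, the per-example DPO objective can be sketched as below. This is a minimal sketch of the standard DPO loss, not the code used to train this model. The helper name `dpo_loss` and the default `beta=0.1` are assumptions for illustration. The inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Hypothetical helper; beta=0.1 is an assumed value, not read from this model's config.
    """
    # Log-ratio of policy to reference probability for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == softplus(-z); branch on the sign of z for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The policy favours the chosen response more than the reference does,
# so the loss falls below log(2), its value when policy == reference.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

Minimizing this loss raises the policy's log-ratio on chosen responses relative to rejected ones, with `beta` controlling how far the policy may drift from the reference.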

Checkpoints are also uploaded.

Fully reproducible fine-tuning code is available on GitHub.

wandb log

See Pythia-1.4b for model details (paper).

hf (pretrained=lomahony/pythia-1.4b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2816 | ± 0.0131 |
| | | none | 0 | acc_norm | 0.3123 | ± 0.0135 |
| arc_easy | 1 | none | 0 | acc | 0.6229 | ± 0.0099 |
| | | none | 0 | acc_norm | 0.5459 | ± 0.0102 |
| boolq | 2 | none | 0 | acc | 0.6229 | ± 0.0085 |
| hellaswag | 1 | none | 0 | acc | 0.4191 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.5383 | ± 0.0050 |
| lambada_openai | 1 | none | 0 | perplexity | 6.4790 | ± 0.1947 |
| | | none | 0 | acc | 0.5674 | ± 0.0069 |
| openbookqa | 1 | none | 0 | acc | 0.2280 | ± 0.0188 |
| | | none | 0 | acc_norm | 0.3360 | ± 0.0211 |
| piqa | 1 | none | 0 | acc | 0.7122 | ± 0.0106 |
| | | none | 0 | acc_norm | 0.7214 | ± 0.0105 |
| sciq | 1 | none | 0 | acc | 0.8480 | ± 0.0114 |
| | | none | 0 | acc_norm | 0.7840 | ± 0.0130 |
| wikitext | 2 | none | 0 | word_perplexity | 16.4022 | ± N/A |
| | | none | 0 | byte_perplexity | 1.6873 | ± N/A |
| | | none | 0 | bits_per_byte | 0.7547 | ± N/A |
| winogrande | 1 | none | 0 | acc | 0.5959 | ± 0.0138 |
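The Stderr values for accuracy metrics can be sanity-checked by hand. For a mean of n binary (0/1) scores with accuracy p, the standard error computed from the sample (n-1) variance is sqrt(p(1-p)/(n-1)). The sketch below assumes this estimator and assumes the ARC-Challenge test split has 1,172 questions; `accuracy_stderr` is a hypothetical helper name.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores with observed accuracy acc.

    The sample variance of 0/1 scores is n*acc*(1-acc)/(n-1); dividing by n under
    the square root gives stderr = sqrt(acc*(1-acc)/(n-1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# arc_challenge 0-shot acc from the table; n = 1172 is an assumed test-set size.
print(round(accuracy_stderr(0.2816, 1172), 4))  # 0.0131
```

With the reported standard errors of roughly 0.01 to 0.02 on most tasks, differences of a point or less between the 0-shot and 5-shot runs fall within about one standard error.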

hf (pretrained=lomahony/pythia-1.4b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.3089 | ± 0.0135 |
| | | none | 5 | acc_norm | 0.3353 | ± 0.0138 |
| arc_easy | 1 | none | 5 | acc | 0.6423 | ± 0.0098 |
| | | none | 5 | acc_norm | 0.6334 | ± 0.0099 |
| boolq | 2 | none | 5 | acc | 0.6291 | ± 0.0084 |
| hellaswag | 1 | none | 5 | acc | 0.4124 | ± 0.0049 |
| | | none | 5 | acc_norm | 0.5347 | ± 0.0050 |
| lambada_openai | 1 | none | 5 | perplexity | 9.7688 | ± 0.3083 |
| | | none | 5 | acc | 0.4904 | ± 0.0070 |
| openbookqa | 1 | none | 5 | acc | 0.2260 | ± 0.0187 |
| | | none | 5 | acc_norm | 0.3240 | ± 0.0210 |
| piqa | 1 | none | 5 | acc | 0.7095 | ± 0.0106 |
| | | none | 5 | acc_norm | 0.7165 | ± 0.0105 |
| sciq | 1 | none | 5 | acc | 0.9140 | ± 0.0089 |
| | | none | 5 | acc_norm | 0.9050 | ± 0.0093 |
| wikitext | 2 | none | 5 | word_perplexity | 16.4022 | ± N/A |
| | | none | 5 | byte_perplexity | 1.6873 | ± N/A |
| | | none | 5 | bits_per_byte | 0.7547 | ± N/A |
| winogrande | 1 | none | 5 | acc | 0.5612 | ± 0.0139 |
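The wikitext rows are not independent: as these metrics are defined in lm-evaluation-harness, `byte_perplexity` and `bits_per_byte` both derive from the same average per-byte negative log-likelihood, so `bits_per_byte` equals log2(`byte_perplexity`). A minimal check against the reported values, using a hypothetical helper `bits_per_byte`:

```python
import math


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte: log2(byte_perplexity)."""
    return math.log2(byte_perplexity)


# wikitext byte_perplexity as reported in both tables.
print(round(bits_per_byte(1.6873), 4))  # 0.7547
```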