Pythia-1b finetuned with DPO, using the original DPO code, on the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
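
As a usage sketch, the model can presumably be loaded with the Hugging Face `transformers` library under the repository id that appears in the evaluation headers of this card. The `Human:` / `Assistant:` prompt layout follows the dialogue format of the Anthropic-hh-rlhf data. The helper names `format_hh_prompt` and `generate_reply` and the sampling settings are illustrative assumptions, not part of the original card.

```python
MODEL_ID = "lomahony/pythia-1b-helpful-dpo"  # repo id as written in this card


def format_hh_prompt(user_message: str) -> str:
    """Wrap one user turn in the Human/Assistant layout of Anthropic-hh-rlhf."""
    return f"\n\nHuman: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 64) -> str:
    """Download the model (network access required) and sample one reply."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(format_hh_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,  # assumed sampling setting, not taken from the card
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, dropping the prompt tokens.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("How do I bake bread?")` downloads the weights on first use, so it needs network access and enough memory for a 1B-parameter model.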

Checkpoints are also uploaded.

Fully reproducible finetuning code is available on GitHub.

wandb log

See Pythia-1b for model details (paper).

hf (pretrained=lomahony/pythia-1b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2602 | ± 0.0128 |
| | | none | 0 | acc_norm | 0.2867 | ± 0.0132 |
| arc_easy | 1 | none | 0 | acc | 0.5859 | ± 0.0101 |
| | | none | 0 | acc_norm | 0.5008 | ± 0.0103 |
| boolq | 2 | none | 0 | acc | 0.6205 | ± 0.0085 |
| hellaswag | 1 | none | 0 | acc | 0.3895 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.4872 | ± 0.0050 |
| lambada_openai | 1 | none | 0 | perplexity | 6.9417 | ± 0.2019 |
| | | none | 0 | acc | 0.5550 | ± 0.0069 |
| openbookqa | 1 | none | 0 | acc | 0.2140 | ± 0.0184 |
| | | none | 0 | acc_norm | 0.3220 | ± 0.0209 |
| piqa | 1 | none | 0 | acc | 0.7193 | ± 0.0105 |
| | | none | 0 | acc_norm | 0.7008 | ± 0.0107 |
| sciq | 1 | none | 0 | acc | 0.8450 | ± 0.0115 |
| | | none | 0 | acc_norm | 0.7600 | ± 0.0135 |
| wikitext | 2 | none | 0 | word_perplexity | 17.2316 | ± N/A |
| | | none | 0 | byte_perplexity | 1.7029 | ± N/A |
| | | none | 0 | bits_per_byte | 0.7680 | ± N/A |
| winogrande | 1 | none | 0 | acc | 0.5367 | ± 0.0140 |
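
The header line before each results table records the evaluation settings. As a rough sketch, assuming the tables were produced with the `lm_eval` command-line tool from lm-evaluation-harness (the card does not state the exact invocation), those settings map onto a command along the following lines. The helper `build_lm_eval_command` is hypothetical and only assembles the argument list; it does not run the evaluation.

```python
import shlex

# Task names as they appear in the results tables of this card.
TASKS = [
    "arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
    "openbookqa", "piqa", "sciq", "wikitext", "winogrande",
]


def build_lm_eval_command(num_fewshot: int, batch_size: int = 16) -> list[str]:
    """Assemble an lm_eval argument list mirroring the card's header line."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", "pretrained=lomahony/pythia-1b-helpful-dpo",
        "--tasks", ",".join(TASKS),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]


zero_shot_cmd = build_lm_eval_command(num_fewshot=0)
five_shot_cmd = build_lm_eval_command(num_fewshot=5)

# Print shell-ready commands; pass the lists to subprocess.run to execute them.
print(shlex.join(zero_shot_cmd))
print(shlex.join(five_shot_cmd))
```

Exact scores may differ slightly across harness versions and hardware, so a rerun is best read as a sanity check, not as a bit-for-bit reproduction.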

hf (pretrained=lomahony/pythia-1b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.2662 | ± 0.0129 |
| | | none | 5 | acc_norm | 0.3003 | ± 0.0134 |
| arc_easy | 1 | none | 5 | acc | 0.6103 | ± 0.0100 |
| | | none | 5 | acc_norm | 0.5892 | ± 0.0101 |
| boolq | 2 | none | 5 | acc | 0.6284 | ± 0.0085 |
| hellaswag | 1 | none | 5 | acc | 0.3841 | ± 0.0049 |
| | | none | 5 | acc_norm | 0.4845 | ± 0.0050 |
| lambada_openai | 1 | none | 5 | perplexity | 9.6301 | ± 0.2809 |
| | | none | 5 | acc | 0.4865 | ± 0.0070 |
| openbookqa | 1 | none | 5 | acc | 0.2020 | ± 0.0180 |
| | | none | 5 | acc_norm | 0.3300 | ± 0.0210 |
| piqa | 1 | none | 5 | acc | 0.7122 | ± 0.0106 |
| | | none | 5 | acc_norm | 0.7046 | ± 0.0106 |
| sciq | 1 | none | 5 | acc | 0.9030 | ± 0.0094 |
| | | none | 5 | acc_norm | 0.8980 | ± 0.0096 |
| wikitext | 2 | none | 5 | word_perplexity | 17.2316 | ± N/A |
| | | none | 5 | byte_perplexity | 1.7029 | ± N/A |
| | | none | 5 | bits_per_byte | 0.7680 | ± N/A |
| winogrande | 1 | none | 5 | acc | 0.5296 | ± 0.0140 |
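
To judge whether a 0-shot versus 5-shot difference exceeds the reported noise, one rough approach is a z-score built from the two standard errors. The sketch below assumes the two runs can be treated as independent, which is a simplification because both use the same test items. The function name `z_score` and the 1.96 threshold (roughly 95%, two-sided) are illustrative choices; the values are copied from the two results tables in this card.

```python
import math


def z_score(value_a: float, stderr_a: float, value_b: float, stderr_b: float) -> float:
    """Difference (b - a) divided by the combined standard error."""
    return (value_b - value_a) / math.sqrt(stderr_a**2 + stderr_b**2)


# (0-shot value, 0-shot stderr, 5-shot value, 5-shot stderr)
results = {
    "sciq acc": (0.8450, 0.0115, 0.9030, 0.0094),
    "winogrande acc": (0.5367, 0.0140, 0.5296, 0.0140),
    "lambada_openai acc": (0.5550, 0.0069, 0.4865, 0.0070),
}

for name, row in results.items():
    z = z_score(*row)
    verdict = "exceeds threshold" if abs(z) > 1.96 else "within noise"
    print(f"{name}: z = {z:+.2f} ({verdict})")
```

Under these assumptions, the sciq gain and the lambada_openai drop clear the threshold, while the winogrande change does not.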