Pythia-2.8b finetuned with DPO, using the original DPO code, on the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
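
For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines of plain Python. This is only an illustration of the standard DPO loss, not the training code used for this model; the function name `dpo_loss` and the default `beta=0.1` are assumptions, since this card does not state the value used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta is an assumed illustrative value, not one taken from this card.
    """
    # How much more the policy prefers the chosen response than the rejected one.
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    # The same preference margin under the frozen reference model.
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# When policy and reference agree, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# When the policy widens the margin for the chosen response, the loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss depends only on how far the policy's preference margin has moved relative to the reference model, which is why a frozen copy of the base model is kept during training.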

Checkpoints are also uploaded.

Fully reproducible finetuning code is available on GitHub.

wandb log

See Pythia-2.8b for model details (paper).
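
A minimal sketch of loading the model with the Hugging Face `transformers` library is shown here. The helper names `build_prompt` and `generate_reply` are hypothetical, and the `\n\nHuman: ... \n\nAssistant:` prompt layout is an assumption based on how the Anthropic-hh-rlhf dialogues are formatted; this card does not specify a prompt template.

```python
MODEL_ID = "lomahony/pythia-2.8b-helpful-dpo"


def build_prompt(user_message: str) -> str:
    # Assumed layout, mirroring the Human/Assistant turns in Anthropic-hh-rlhf.
    return f"\n\nHuman: {user_message}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 64) -> str:
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    # Greedy decoding keeps the example deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_reply("How do I keep a sourdough starter alive?"))
```

Running the script downloads the full 2.8b-parameter checkpoint, so a GPU or a generous amount of RAM is advisable.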

hf (pretrained=lomahony/pythia-2.8b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.3157 | ± 0.0136 |
| | | none | 0 | acc_norm | 0.3447 | ± 0.0139 |
| arc_easy | 1 | none | 0 | acc | 0.6591 | ± 0.0097 |
| | | none | 0 | acc_norm | 0.6002 | ± 0.0101 |
| boolq | 2 | none | 0 | acc | 0.6239 | ± 0.0085 |
| hellaswag | 1 | none | 0 | acc | 0.4671 | ± 0.0050 |
| | | none | 0 | acc_norm | 0.6107 | ± 0.0049 |
| lambada_openai | 1 | none | 0 | perplexity | 4.8811 | ± 0.1354 |
| | | none | 0 | acc | 0.6264 | ± 0.0067 |
| openbookqa | 1 | none | 0 | acc | 0.2820 | ± 0.0201 |
| | | none | 0 | acc_norm | 0.4040 | ± 0.0220 |
| piqa | 1 | none | 0 | acc | 0.7568 | ± 0.0100 |
| | | none | 0 | acc_norm | 0.7557 | ± 0.0100 |
| sciq | 1 | none | 0 | acc | 0.8900 | ± 0.0099 |
| | | none | 0 | acc_norm | 0.8340 | ± 0.0118 |
| wikitext | 2 | none | 0 | word_perplexity | 13.9186 | ± N/A |
| | | none | 0 | byte_perplexity | 1.6363 | ± N/A |
| | | none | 0 | bits_per_byte | 0.7104 | ± N/A |
| winogrande | 1 | none | 0 | acc | 0.6046 | ± 0.0137 |
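
The `hf (pretrained=...)` header lines in this card record the settings passed to lm-evaluation-harness. As a rough sketch, an equivalent `lm_eval` command line could be assembled as follows; the helper name `build_lm_eval_command` is hypothetical, and the exact harness version used for these numbers is not stated here, so results may differ slightly.

```python
import shlex

MODEL_ID = "lomahony/pythia-2.8b-helpful-dpo"

# Task names as they appear in the results tables of this card.
TASKS = [
    "arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
    "openbookqa", "piqa", "sciq", "wikitext", "winogrande",
]


def build_lm_eval_command(num_fewshot: int, batch_size: int = 16) -> list[str]:
    """Return the argument list for one evaluation run (0-shot or 5-shot)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={MODEL_ID}",
        "--tasks", ",".join(TASKS),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]


# Print shell-ready commands for both settings reported in this card.
for shots in (0, 5):
    print(shlex.join(build_lm_eval_command(shots)))
```

The printed commands can be pasted into a shell where `lm-eval` is installed, or the list can be passed directly to `subprocess.run`.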

hf (pretrained=lomahony/pythia-2.8b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.3498 | ± 0.0139 |
| | | none | 5 | acc_norm | 0.3823 | ± 0.0142 |
| arc_easy | 1 | none | 5 | acc | 0.6940 | ± 0.0095 |
| | | none | 5 | acc_norm | 0.6940 | ± 0.0095 |
| boolq | 2 | none | 5 | acc | 0.6440 | ± 0.0084 |
| hellaswag | 1 | none | 5 | acc | 0.4596 | ± 0.0050 |
| | | none | 5 | acc_norm | 0.6096 | ± 0.0049 |
| lambada_openai | 1 | none | 5 | perplexity | 6.9027 | ± 0.2030 |
| | | none | 5 | acc | 0.5614 | ± 0.0069 |
| openbookqa | 1 | none | 5 | acc | 0.2920 | ± 0.0204 |
| | | none | 5 | acc_norm | 0.3820 | ± 0.0218 |
| piqa | 1 | none | 5 | acc | 0.7601 | ± 0.0100 |
| | | none | 5 | acc_norm | 0.7563 | ± 0.0100 |
| sciq | 1 | none | 5 | acc | 0.9380 | ± 0.0076 |
| | | none | 5 | acc_norm | 0.9290 | ± 0.0081 |
| wikitext | 2 | none | 5 | word_perplexity | 13.9186 | ± N/A |
| | | none | 5 | byte_perplexity | 1.6363 | ± N/A |
| | | none | 5 | bits_per_byte | 0.7104 | ± N/A |
| winogrande | 1 | none | 5 | acc | 0.6006 | ± 0.0138 |
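
The two byte-level wikitext metrics report the same quantity in different units: bits per byte is the base-2 logarithm of byte perplexity. The short check below, using only the values reported in this card, illustrates the conversion; the variable names are chosen for illustration.

```python
import math

# Values copied from the wikitext rows of the results tables.
byte_perplexity = 1.6363
reported_bits_per_byte = 0.7104

# bits_per_byte = log2(byte_perplexity)
computed_bits_per_byte = math.log2(byte_perplexity)

print(f"computed: {computed_bits_per_byte:.4f}")
print(f"reported: {reported_bits_per_byte:.4f}")
print(f"difference: {abs(computed_bits_per_byte - reported_bits_per_byte):.5f}")
```

Any small difference comes from the reported values being rounded to four decimal places.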