
Pythia-1b with supervised finetuning (SFT), trained using the TRLx library on the helpful subset of the Anthropic hh-rlhf dataset for 1 epoch.

Checkpoints are also uploaded.

Fully reproducible finetuning code is available on GitHub.

wandb log

See Pythia-1b for model details (paper).
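A minimal sketch of loading the model for generation with the `transformers` library. The model ID is taken from this card; the `\n\nHuman: ... \n\nAssistant:` prompt layout is an assumption based on the dialogue format of the hh-rlhf dataset, since the card does not state the exact template used in training. The helper names `build_prompt` and `generate_reply` are hypothetical.

```python
MODEL_ID = "lomahony/pythia-1b-helpful-sft"  # model name as given in this card


def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the hh-rlhf dialogue style (assumed template)."""
    return f"\n\nHuman: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a reply; downloads the weights on first use."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("How do I boil an egg?")` would return the model's continuation after `Assistant:`. Greedy decoding is chosen here only to keep the sketch deterministic.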

hf (pretrained=lomahony/pythia-1b-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2543 | ± 0.0127 |
| | | none | 0 | acc_norm | 0.2739 | ± 0.0130 |
| arc_easy | 1 | none | 0 | acc | 0.5724 | ± 0.0102 |
| | | none | 0 | acc_norm | 0.4941 | ± 0.0103 |
| boolq | 2 | none | 0 | acc | 0.6199 | ± 0.0085 |
| hellaswag | 1 | none | 0 | acc | 0.3819 | ± 0.0048 |
| | | none | 0 | acc_norm | 0.4736 | ± 0.0050 |
| lambada_openai | 1 | none | 0 | perplexity | 7.1374 | ± 0.2014 |
| | | none | 0 | acc | 0.5626 | ± 0.0069 |
| openbookqa | 1 | none | 0 | acc | 0.2040 | ± 0.0180 |
| | | none | 0 | acc_norm | 0.3140 | ± 0.0208 |
| piqa | 1 | none | 0 | acc | 0.7138 | ± 0.0105 |
| | | none | 0 | acc_norm | 0.6997 | ± 0.0107 |
| sciq | 1 | none | 0 | acc | 0.8400 | ± 0.0116 |
| | | none | 0 | acc_norm | 0.7620 | ± 0.0135 |
| wikitext | 2 | none | 0 | word_perplexity | 16.9719 | ± N/A |
| | | none | 0 | byte_perplexity | 1.6981 | ± N/A |
| | | none | 0 | bits_per_byte | 0.7639 | ± N/A |
| winogrande | 1 | none | 0 | acc | 0.5343 | ± 0.0140 |
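The settings line above the zero-shot results has the shape of lm-evaluation-harness output. A hedged sketch of how such a run might be reproduced through the harness's Python entry point `lm_eval.simple_evaluate` follows. The task list is read off the table rows, the helper names `eval_settings` and `run_eval` are hypothetical, and the harness version used for this card is not stated, so scores from a different version may not match exactly.

```python
TASKS = [
    "arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
    "openbookqa", "piqa", "sciq", "wikitext", "winogrande",
]


def eval_settings(num_fewshot: int) -> dict:
    """Keyword arguments mirroring the settings line reported in this card."""
    return {
        "model": "hf",
        "model_args": "pretrained=lomahony/pythia-1b-helpful-sft",
        "tasks": list(TASKS),
        "num_fewshot": num_fewshot,
        "batch_size": 16,
        "limit": None,  # evaluate on the full test sets
    }


def run_eval(num_fewshot: int) -> dict:
    """Run the harness and return its per-task results dictionary."""
    # Imported lazily: requires the lm-eval package and downloads the model.
    import lm_eval

    return lm_eval.simple_evaluate(**eval_settings(num_fewshot))["results"]
```

`run_eval(0)` corresponds to the zero-shot settings and `run_eval(5)` to the five-shot settings.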

hf (pretrained=lomahony/pythia-1b-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.2628 | ± 0.0129 |
| | | none | 5 | acc_norm | 0.2918 | ± 0.0133 |
| arc_easy | 1 | none | 5 | acc | 0.6040 | ± 0.0100 |
| | | none | 5 | acc_norm | 0.5816 | ± 0.0101 |
| boolq | 2 | none | 5 | acc | 0.5963 | ± 0.0086 |
| hellaswag | 1 | none | 5 | acc | 0.3780 | ± 0.0048 |
| | | none | 5 | acc_norm | 0.4719 | ± 0.0050 |
| lambada_openai | 1 | none | 5 | perplexity | 10.2584 | ± 0.2936 |
| | | none | 5 | acc | 0.4832 | ± 0.0070 |
| openbookqa | 1 | none | 5 | acc | 0.1980 | ± 0.0178 |
| | | none | 5 | acc_norm | 0.3220 | ± 0.0209 |
| piqa | 1 | none | 5 | acc | 0.7057 | ± 0.0106 |
| | | none | 5 | acc_norm | 0.7095 | ± 0.0106 |
| sciq | 1 | none | 5 | acc | 0.8980 | ± 0.0096 |
| | | none | 5 | acc_norm | 0.9000 | ± 0.0095 |
| wikitext | 2 | none | 5 | word_perplexity | 16.9719 | ± N/A |
| | | none | 5 | byte_perplexity | 1.6981 | ± N/A |
| | | none | 5 | bits_per_byte | 0.7639 | ± N/A |
| winogrande | 1 | none | 5 | acc | 0.5446 | ± 0.0140 |
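The two wikitext rows `byte_perplexity` and `bits_per_byte` carry the same information: bits per byte is the base-2 logarithm of byte perplexity. The short check below uses the values reported above to confirm they are consistent. The function name `bits_per_byte` is illustrative, not part of any library.

```python
import math

# Values copied from the wikitext rows reported in this card.
BYTE_PERPLEXITY = 1.6981
REPORTED_BITS_PER_BYTE = 0.7639


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte: log2(perplexity)."""
    return math.log2(byte_perplexity)


computed = bits_per_byte(BYTE_PERPLEXITY)
# The two reported figures should agree up to rounding in the fourth decimal.
print(f"computed={computed:.4f} reported={REPORTED_BITS_PER_BYTE:.4f}")
```

The wikitext figures are identical in the zero-shot and five-shot tables, so the same check applies to both.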
