---
language:
  - en
tags:
  - pytorch
  - causal-lm
  - pythia
license: apache-2.0
datasets:
  - Anthropic/hh-rlhf
---

Pythia-2.8b finetuned with DPO (Direct Preference Optimization) for 1 epoch on the helpful subset of the Anthropic/hh-rlhf dataset, using the original DPO code.

Checkpoints are also uploaded.

Fully reproducible finetuning code is available on GitHub.

wandb log

See Pythia-2.8b for model details (paper).
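A minimal sketch of loading the model and generating a reply with the `transformers` library. The repo id is taken from the evaluation header in this card. The `\n\nHuman: ... \n\nAssistant:` prompt layout follows the dialogue format of the Anthropic/hh-rlhf data; that the model responds best to this layout is an assumption, not something this card states. The helper names `build_hh_prompt` and `generate_reply` are hypothetical.

```python
MODEL_ID = "lomahony/pythia-2.8b-helpful-dpo"  # repo id from the evaluation header


def build_hh_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the Anthropic/hh-rlhf dialogue style (assumed)."""
    return f"\n\nHuman: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    """Load the model and greedily decode a reply to one user message."""
    # Imported lazily so the prompt helper works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Use half precision only on GPU; stay in float32 on CPU.
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    dtype = torch.float16 if use_cuda else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)
    model.to(device)
    model.eval()

    inputs = tokenizer(build_hh_prompt(user_message), return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for repeatable output
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_reply("How do I brew a good cup of tea?"))
```

Running the script downloads the full 2.8b-parameter checkpoint, so a GPU is advisable for anything beyond a quick test.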

hf (pretrained=lomahony/pythia-2.8b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

|Tasks         |Version|Filter|n-shot|Metric         |Value  |   |Stderr|
|--------------|------:|------|-----:|---------------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc            | 0.3157|±  |0.0136|
|              |       |none  |     0|acc_norm       | 0.3447|±  |0.0139|
|arc_easy      |      1|none  |     0|acc            | 0.6591|±  |0.0097|
|              |       |none  |     0|acc_norm       | 0.6002|±  |0.0101|
|boolq         |      2|none  |     0|acc            | 0.6239|±  |0.0085|
|hellaswag     |      1|none  |     0|acc            | 0.4671|±  |0.0050|
|              |       |none  |     0|acc_norm       | 0.6107|±  |0.0049|
|lambada_openai|      1|none  |     0|perplexity     | 4.8811|±  |0.1354|
|              |       |none  |     0|acc            | 0.6264|±  |0.0067|
|openbookqa    |      1|none  |     0|acc            | 0.2820|±  |0.0201|
|              |       |none  |     0|acc_norm       | 0.4040|±  |0.0220|
|piqa          |      1|none  |     0|acc            | 0.7568|±  |0.0100|
|              |       |none  |     0|acc_norm       | 0.7557|±  |0.0100|
|sciq          |      1|none  |     0|acc            | 0.8900|±  |0.0099|
|              |       |none  |     0|acc_norm       | 0.8340|±  |0.0118|
|wikitext      |      2|none  |     0|word_perplexity|13.9186|±  |N/A   |
|              |       |none  |     0|byte_perplexity| 1.6363|±  |N/A   |
|              |       |none  |     0|bits_per_byte  | 0.7104|±  |N/A   |
|winogrande    |      1|none  |     0|acc            | 0.6046|±  |0.0137|
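
As a reading aid for the table, the sketch below turns a reported value and standard error into an approximate 95% interval and checks that the two wikitext byte-level metrics agree with each other. The normal approximation (`z = 1.96`) is an assumption made here for illustration, and the function names are hypothetical; the relation bits per byte = log2(byte perplexity) is the standard definition.

```python
import math


def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval value ± z * stderr (normal approximation, assumed)."""
    return value - z * stderr, value + z * stderr


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte."""
    return math.log2(byte_perplexity)


# arc_challenge acc from the table: 0.3157 ± 0.0136
low, high = confidence_interval(0.3157, 0.0136)
print(f"arc_challenge acc, approx. 95% interval: [{low:.3f}, {high:.3f}]")

# wikitext: byte_perplexity 1.6363 should correspond to the reported bits_per_byte 0.7104
print(f"bits_per_byte from byte_perplexity: {bits_per_byte(1.6363):.4f}")
```

For arc_challenge this gives an interval of roughly 0.289 to 0.342, which is worth keeping in mind when comparing against other checkpoints whose scores differ by only a point or two.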