	---
	language:
	- en
	tags:
	- pytorch
	- causal-lm
	- pythia
	license: apache-2.0
	datasets:
	- Anthropic/hh-rlhf
	---

	[Pythia-2.8b](https://huggingface.co/EleutherAI/pythia-2.8b) fine-tuned with DPO, using the original DPO code, on the helpful subset of the [Anthropic-hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf) for 1 epoch.
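
For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines. This is only an illustration of the published DPO loss, not the training code used for this model: the function name `dpo_loss` and the default `beta=0.1` are assumptions chosen for the example, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    beta is a hypothetical default for illustration; it is not taken
    from this model's training configuration.
    """
    # How much more the policy prefers the chosen response than the reference does.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)

    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference assign the same log-probabilities, the loss equals log 2; it falls as the policy shifts probability toward the chosen response relative to the reference.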

	Checkpoints are also uploaded.

	Fully reproducible fine-tuning code is available on [GitHub](https://github.com/lomahony/direct-preference-optimization/tree/main).

	[wandb log](https://wandb.ai/lauraomahony999/pythia-dpo/runs/blurtl4v)

	See [Pythia-2.8b](https://huggingface.co/EleutherAI/pythia-2.8b) for model details [(paper)](https://arxiv.org/abs/2101.00027).
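
A minimal loading sketch with the `transformers` library is shown here for convenience. It assumes that this repository ships tokenizer files (if it does not, the tokenizer can be loaded from `EleutherAI/pythia-2.8b` instead) and that prompts follow the `Human:` / `Assistant:` turn format used in the hh-rlhf data. The helper names `build_hh_prompt` and `generate_reply` and the generation settings are illustrative choices, not part of the model card.

```python
def build_hh_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the hh-rlhf dialogue style."""
    return f"\n\nHuman: {user_message}\n\nAssistant:"


def generate_reply(user_message: str,
                   model_id: str = "lomahony/pythia-2.8b-helpful-dpo",
                   max_new_tokens: int = 128) -> str:
    """Download the model on first use and greedily decode a reply."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_hh_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("How do I boil an egg?")` downloads the weights on first use, so expect several gigabytes of traffic and a slow first call on CPU.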

	hf (pretrained=lomahony/pythia-2.8b-helpful-dpo), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16

	|    Tasks     |Version|Filter|n-shot|    Metric     | Value |   |Stderr|
	|--------------|------:|------|-----:|---------------|------:|---|------|
	|arc_challenge |      1|none  |     0|acc            | 0.3157|±  |0.0136|
	|              |       |none  |     0|acc_norm       | 0.3447|±  |0.0139|
	|arc_easy      |      1|none  |     0|acc            | 0.6591|±  |0.0097|
	|              |       |none  |     0|acc_norm       | 0.6002|±  |0.0101|
	|boolq         |      2|none  |     0|acc            | 0.6239|±  |0.0085|
	|hellaswag     |      1|none  |     0|acc            | 0.4671|±  |0.0050|
	|              |       |none  |     0|acc_norm       | 0.6107|±  |0.0049|
	|lambada_openai|      1|none  |     0|perplexity     | 4.8811|±  |0.1354|
	|              |       |none  |     0|acc            | 0.6264|±  |0.0067|
	|openbookqa    |      1|none  |     0|acc            | 0.2820|±  |0.0201|
	|              |       |none  |     0|acc_norm       | 0.4040|±  |0.0220|
	|piqa          |      1|none  |     0|acc            | 0.7568|±  |0.0100|
	|              |       |none  |     0|acc_norm       | 0.7557|±  |0.0100|
	|sciq          |      1|none  |     0|acc            | 0.8900|±  |0.0099|
	|              |       |none  |     0|acc_norm       | 0.8340|±  |0.0118|
	|wikitext      |      2|none  |     0|word_perplexity|13.9186|±  |N/A   |
	|              |       |none  |     0|byte_perplexity| 1.6363|±  |N/A   |
	|              |       |none  |     0|bits_per_byte  | 0.7104|±  |N/A   |
	|winogrande    |      1|none  |     0|acc            | 0.6046|±  |0.0137|
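
Two of the columns above can be sanity-checked with a little arithmetic. The sketch below assumes that bits per byte is the base-2 logarithm of byte perplexity, and that the reported standard errors can be turned into approximate 95% intervals with a normal approximation (`z = 1.96`). The helper names are illustrative.

```python
import math


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte."""
    return math.log2(byte_perplexity)


def confidence_interval(value: float, stderr: float, z: float = 1.96):
    """Approximate confidence interval under a normal approximation."""
    return (value - z * stderr, value + z * stderr)


# wikitext: byte_perplexity 1.6363 should match the reported bits_per_byte.
wikitext_bpb = bits_per_byte(1.6363)

# arc_challenge acc: 0.3157 with standard error 0.0136.
arc_low, arc_high = confidence_interval(0.3157, 0.0136)
```

Here `wikitext_bpb` rounds to the reported 0.7104, and the `arc_challenge` accuracy interval is roughly 0.289 to 0.342.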
