---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
license: apache-2.0
datasets:
- Anthropic/hh-rlhf
---


# Info

Pythia-1b fine-tuned on the Anthropic/hh-rlhf dataset in two stages: supervised fine-tuning (SFT) for 1 epoch, producing the sft-model, followed by DPO [(paper)](https://arxiv.org/abs/2305.18290) on the same dataset for 1 epoch.

[wandb log](https://wandb.ai/pythia_dpo/Pythia_DPO_new/runs/jk09pzqb)

See [Pythia-1b](https://huggingface.co/EleutherAI/pythia-1b) for model details [(paper)](https://arxiv.org/abs/2101.00027).
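
For readers unfamiliar with DPO, here is a minimal sketch of the per-example objective described in the DPO paper linked above. It is not the training code used for this model. The function and argument names are hypothetical, and `beta=0.1` is an assumed value, since the hyperparameters of this run are not stated here.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from this model card
) -> float:
    """Per-example DPO loss from summed log-probabilities of each response.

    "chosen" / "rejected" are the preferred and dispreferred responses of a
    preference pair; "ref" is the frozen reference model (here, the SFT model).
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Made-up log-probabilities, for illustration only.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy equals the reference, the margin is zero and the loss is `log(2)`; the loss drops below that as the policy shifts probability toward the chosen response relative to the reference.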


# Raw benchmark results

Results for the base model are taken from the [Pythia paper](https://arxiv.org/abs/2101.00027).

## Zero-shot

| Task             | 1B_base       | 1B_sft        | 1B_dpo          |
|------------------|---------------|---------------|-----------------|
| Lambada (OpenAI) | 0.562 ± 0.007 | 0.563 ± 0.007 | 0.5575 ± 0.0069 |
| PIQA             | 0.707 ± 0.011 | 0.711 ± 0.011 | 0.7122 ± 0.0106 |
| WinoGrande       | 0.537 ± 0.014 | 0.534 ± 0.014 | 0.5525 ± 0.0140 |
| WSC              | 0.365 ± 0.047 | 0.365 ± 0.047 | 0.3654 ± 0.0474 |
| ARC - Easy       | 0.569 ± 0.010 | 0.583 ± 0.010 | 0.5901 ± 0.0101 |
| ARC - Challenge  | 0.244 ± 0.013 | 0.248 ± 0.013 | 0.2611 ± 0.0128 |
| SciQ             | 0.840 ± 0.012 | 0.847 ± 0.011 | 0.8530 ± 0.0112 |
| LogiQA           | 0.223 ± 0.016 | N/A           | N/A             |


## Five-shot

| Task             | 1B_base       | 1B_sft          | 1B_dpo          |
|------------------|---------------|-----------------|-----------------|
| Lambada (OpenAI) | 0.507 ± 0.007 | 0.4722 ± 0.007  | 0.4669 ± 0.0070 |
| PIQA             | 0.705 ± 0.011 | 0.7165 ± 0.0105 | 0.7138 ± 0.0105 |
| WinoGrande       | 0.532 ± 0.014 | 0.5343 ± 0.014  | 0.5525 ± 0.0140 |
| WSC              | 0.365 ± 0.047 | 0.5000 ± 0.0493 | 0.5577 ± 0.0489 |
| ARC - Easy       | 0.594 ± 0.010 | 0.6010 ± 0.010  | 0.6170 ± 0.0100 |
| ARC - Challenge  | 0.259 ± 0.013 | 0.2679 ± 0.0129 | 0.2833 ± 0.0132 |
| SciQ             | 0.920 ± 0.009 | 0.9100 ± 0.0091 | 0.9020 ± 0.0094 |
| LogiQA           | 0.227 ± 0.016 | N/A             | N/A             |
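
To help read the ± values, the sketch below compares the base and DPO five-shot scores against their reported standard errors. It assumes the two scores can be treated as independent estimates, which is only a rough approximation because both models are scored on the same test items, and because the base numbers come from the Pythia paper, whose evaluation settings may differ. The helper names and the threshold of 2 are illustrative choices, not part of the original evaluation.

```python
import math

# (task, base_acc, base_stderr, dpo_acc, dpo_stderr), copied from the five-shot table.
FIVE_SHOT = [
    ("Lambada (OpenAI)", 0.507, 0.007, 0.4669, 0.0070),
    ("PIQA", 0.705, 0.011, 0.7138, 0.0105),
    ("WinoGrande", 0.532, 0.014, 0.5525, 0.0140),
    ("WSC", 0.365, 0.047, 0.5577, 0.0489),
    ("ARC - Easy", 0.594, 0.010, 0.6170, 0.0100),
    ("ARC - Challenge", 0.259, 0.013, 0.2833, 0.0132),
    ("SciQ", 0.920, 0.009, 0.9020, 0.0094),
]


def z_score(base_acc: float, base_se: float, dpo_acc: float, dpo_se: float) -> float:
    """Difference (DPO minus base) in units of the combined standard error."""
    combined_se = math.hypot(base_se, dpo_se)  # sqrt(base_se**2 + dpo_se**2)
    return (dpo_acc - base_acc) / combined_se


def compare(rows, threshold: float = 2.0) -> dict:
    """Map each task to (delta, z, exceeds_threshold)."""
    results = {}
    for task, base_acc, base_se, dpo_acc, dpo_se in rows:
        z = z_score(base_acc, base_se, dpo_acc, dpo_se)
        results[task] = (dpo_acc - base_acc, z, abs(z) > threshold)
    return results


if __name__ == "__main__":
    for task, (delta, z, flagged) in compare(FIVE_SHOT).items():
        marker = "*" if flagged else " "
        print(f"{marker} {task:<18} delta={delta:+.4f}  z={z:+.2f}")
```

Rows marked with `*` differ by more than two combined standard errors; unmarked rows are within that range under the stated assumptions.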
