Leogrin/eleuther-pythia1b-hh-sft

	---
	language:
	- en
	tags:
	- pytorch
	- causal-lm
	- pythia
	license: apache-2.0
	datasets:
	- Anthropic/hh-rlhf
	---

	# Info

	Pythia-1b after supervised fine-tuning (SFT) on the Anthropic/hh-rlhf dataset for 1 epoch.

	[wandb log](https://wandb.ai/pythia_dpo/Pythia_DPO_new/runs/xk2ub7ig?workspace=user-leogrin)

	See [Pythia-1b](https://huggingface.co/EleutherAI/pythia-1b) for details of the base model [(paper)](https://arxiv.org/abs/2304.01373).
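
The card does not include a usage snippet, so here is a minimal sketch of how the checkpoint could be loaded with the `transformers` library. It rests on two assumptions that the card does not confirm: that the repository ships tokenizer files alongside the weights, and that fine-tuning used the `Human:` / `Assistant:` turn markers found in the raw Anthropic/hh-rlhf transcripts. The helper names `format_hh_prompt` and `generate_reply` are hypothetical.

```python
MODEL_ID = "Leogrin/eleuther-pythia1b-hh-sft"  # repository name of this model card


def format_hh_prompt(user_message: str) -> str:
    """Wrap a message in the turn markers used by Anthropic/hh-rlhf transcripts.

    Assumption: the SFT run kept this format; the card does not say so.
    """
    return f"\n\nHuman: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode one assistant turn. Downloads the checkpoint on first call."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(format_hh_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
    )
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("How do I boil an egg?")` would return the model's continuation as a string. Because a 1B model tends to keep writing further `Human:` turns, truncating the reply at the next `\n\nHuman:` marker may be worthwhile.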


	# Raw benchmark results

	Results for the base model are taken from the [Pythia paper](https://arxiv.org/abs/2304.01373).

	## Zero shot


	| Task             | 1B_base       | 1B_sft        |
	|------------------|---------------|---------------|
	| Lambada (OpenAI) | 0.562 ± 0.007 | 0.563 ± 0.007 |
	| PIQA             | 0.707 ± 0.011 | 0.711 ± 0.011 |
	| WinoGrande       | 0.537 ± 0.014 | 0.534 ± 0.014 |
	| WSC              | 0.365 ± 0.047 | 0.365 ± 0.047 |
	| ARC - Easy       | 0.569 ± 0.010 | 0.583 ± 0.010 |
	| ARC - Challenge  | 0.244 ± 0.013 | 0.248 ± 0.013 |
	| SciQ             | 0.840 ± 0.012 | 0.847 ± 0.011 |
	| LogiQA           | 0.223 ± 0.016 | N/A           |

	## Five shot

	| Task             | 1B_base       | 1B_sft          |
	|------------------|---------------|-----------------|
	| Lambada (OpenAI) | 0.507 ± 0.007 | 0.4722 ± 0.007  |
	| PIQA             | 0.705 ± 0.011 | 0.7165 ± 0.0105 |
	| WinoGrande       | 0.532 ± 0.014 | 0.5343 ± 0.014  |
	| WSC              | 0.365 ± 0.047 | 0.5000 ± 0.0493 |
	| ARC - Easy       | 0.594 ± 0.010 | 0.6010 ± 0.010  |
	| ARC - Challenge  | 0.259 ± 0.013 | 0.2679 ± 0.0129 |
	| SciQ             | 0.920 ± 0.009 | 0.9100 ± 0.0091 |
	| LogiQA           | 0.227 ± 0.016 | N/A             |
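
To read the five-shot table, it can help to express each change relative to its uncertainty. The sketch below is a rough check, not part of the original evaluation. It assumes the ± values are standard errors (the card does not say) and treats the base and SFT runs as independent, which overstates the noise somewhat because both models were scored on the same test items. The function names and the threshold of 2 are illustrative choices.

```python
import math

# (task, base score, base ±, SFT score, SFT ±), copied from the five-shot table.
# LogiQA is omitted because no SFT result is reported.
FIVE_SHOT = [
    ("Lambada (OpenAI)", 0.507, 0.007, 0.4722, 0.007),
    ("PIQA", 0.705, 0.011, 0.7165, 0.0105),
    ("WinoGrande", 0.532, 0.014, 0.5343, 0.014),
    ("WSC", 0.365, 0.047, 0.5000, 0.0493),
    ("ARC - Easy", 0.594, 0.010, 0.6010, 0.010),
    ("ARC - Challenge", 0.259, 0.013, 0.2679, 0.0129),
    ("SciQ", 0.920, 0.009, 0.9100, 0.0091),
]


def compare(base, base_se, sft, sft_se):
    """Return (delta, z), where z is delta divided by the combined standard error.

    Assumption: the two ± values are independent standard errors.
    """
    delta = sft - base
    combined_se = math.sqrt(base_se ** 2 + sft_se ** 2)
    return delta, delta / combined_se


def flag_changes(rows, threshold=2.0):
    """List the tasks whose |z| reaches the (illustrative) threshold."""
    return [
        task
        for task, base, base_se, sft, sft_se in rows
        if abs(compare(base, base_se, sft, sft_se)[1]) >= threshold
    ]


if __name__ == "__main__":
    for task, base, base_se, sft, sft_se in FIVE_SHOT:
        delta, z = compare(base, base_se, sft, sft_se)
        print(f"{task:<18} delta={delta:+.4f}  z={z:+.2f}")
    print("flagged:", flag_changes(FIVE_SHOT))
```

Under these assumptions, only the Lambada (OpenAI) drop exceeds two combined standard errors. The same functions can be applied to the zero-shot table by swapping in its rows.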