---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
license: apache-2.0
datasets:
- Anthropic/hh-rlhf
---

[Pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) supervised fine-tuned for 1 epoch with the TRLx library on the helpful subset of the [Anthropic-hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
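The HH-RLHF dataset stores each dialogue as alternating `\n\nHuman:` and `\n\nAssistant:` turns. The sketch that follows builds a prompt in that layout. The helper `format_hh_prompt` is hypothetical, and it is an assumption (not stated in this card) that the fine-tuned model expects exactly this format; check the training code on GitHub before relying on it.

```python
# Hypothetical helper: formats a conversation in the Anthropic HH-RLHF
# "\n\nHuman: ...\n\nAssistant: ..." layout. That the SFT model was trained on
# exactly this layout is an assumption; confirm against the training code.


def format_hh_prompt(turns: list[tuple[str, str]]) -> str:
    """Build an HH-style prompt that ends with an open Assistant turn.

    `turns` is a list of (speaker, text) pairs, where speaker is
    "Human" or "Assistant".
    """
    parts = []
    for speaker, text in turns:
        if speaker not in ("Human", "Assistant"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        parts.append(f"\n\n{speaker}: {text}")
    # Leave the final Assistant turn open so the model writes the reply.
    parts.append("\n\nAssistant:")
    return "".join(parts)


prompt = format_hh_prompt([("Human", "How do I boil an egg?")])
print(repr(prompt))
```

The returned string can then be tokenized and passed to a causal-LM generation call; the model's continuation is read as the assistant's reply.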

Checkpoints are also uploaded.

Fully reproducible fine-tuning code is available on [GitHub](https://github.com/lauraaisling/trlx-pythia/tree/main).

Training logs: [wandb log](https://wandb.ai/lauraomahony999/pythia-sft/runs/3w7e3zmd)

See [Pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) for details of the base model, which was pretrained on the Pile ([Pile paper](https://arxiv.org/abs/2101.00027)).

Further details of these models are given in the paper [Attributing Mode Collapse in the Fine-Tuning of Large Language Models](https://openreview.net/pdf?id=3pDMYjpOxk).

If you find these models helpful, please cite them as follows:

<pre>
@inproceedings{o2024attributing,
  title={Attributing Mode Collapse in the Fine-Tuning of Large Language Models},
  author={O’Mahony, Laura and Grinsztajn, Leo and Schoelkopf, Hailey and Biderman, Stella},
  booktitle={ICLR 2024, Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) workshop},
  year={2024}
}
</pre>

Zero-shot evaluation (`hf (pretrained=lomahony/pythia-70m-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16`):

| Tasks          | Version | Filter | n-shot | Metric          |     Value |   | Stderr   |
|----------------|--------:|--------|-------:|-----------------|----------:|---|----------|
| arc_challenge  |       1 | none   |      0 | acc             |    0.1715 | ± | 0.0110   |
|                |         | none   |      0 | acc_norm        |    0.2082 | ± | 0.0119   |
| arc_easy       |       1 | none   |      0 | acc             |    0.3384 | ± | 0.0097   |
|                |         | none   |      0 | acc_norm        |    0.3262 | ± | 0.0096   |
| boolq          |       2 | none   |      0 | acc             |    0.4239 | ± | 0.0086   |
| hellaswag      |       1 | none   |      0 | acc             |    0.2629 | ± | 0.0044   |
|                |         | none   |      0 | acc_norm        |    0.2691 | ± | 0.0044   |
| lambada_openai |       1 | none   |      0 | perplexity      | 5937.7964 | ± | 424.7555 |
|                |         | none   |      0 | acc             |    0.0328 | ± | 0.0025   |
| openbookqa     |       1 | none   |      0 | acc             |    0.1580 | ± | 0.0163   |
|                |         | none   |      0 | acc_norm        |    0.2520 | ± | 0.0194   |
| piqa           |       1 | none   |      0 | acc             |    0.5593 | ± | 0.0116   |
|                |         | none   |      0 | acc_norm        |    0.5392 | ± | 0.0116   |
| sciq           |       1 | none   |      0 | acc             |    0.3710 | ± | 0.0153   |
|                |         | none   |      0 | acc_norm        |    0.4990 | ± | 0.0158   |
| wikitext       |       2 | none   |      0 | word_perplexity |  550.5954 | ± | N/A      |
|                |         | none   |      0 | byte_perplexity |    3.2550 | ± | N/A      |
|                |         | none   |      0 | bits_per_byte   |    1.7027 | ± | N/A      |
| winogrande     |       1 | none   |      0 | acc             |    0.4878 | ± | 0.0140   |
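The wikitext rows report the same quantity in two forms. If byte perplexity is `exp(NLL / n_bytes)`, then bits per byte is `NLL / (n_bytes * ln 2)`, i.e. the base-2 logarithm of byte perplexity. A quick consistency check against the reported numbers, assuming the evaluation harness uses these definitions (they agree with the table, but the card does not state them):

```python
import math

# Values copied from the wikitext rows of the zero-shot table.
byte_perplexity = 3.2550
reported_bits_per_byte = 1.7027

# Assumed definitions: ppl = exp(NLL / n_bytes), bpb = NLL / (n_bytes * ln 2),
# so bpb = ln(ppl) / ln(2) = log2(ppl).
bits_per_byte = math.log2(byte_perplexity)
print(f"{bits_per_byte:.4f}")  # 1.7027
```

Because the table values are rounded to four decimals, the two figures should be expected to agree only to about that precision.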

Five-shot evaluation (`hf (pretrained=lomahony/pythia-70m-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16`):

| Tasks          | Version | Filter | n-shot | Metric          |      Value |   | Stderr    |
|----------------|--------:|--------|-------:|-----------------|-----------:|---|-----------|
| arc_challenge  |       1 | none   |      5 | acc             |     0.1869 | ± | 0.0114    |
|                |         | none   |      5 | acc_norm        |     0.2210 | ± | 0.0121    |
| arc_easy       |       1 | none   |      5 | acc             |     0.3207 | ± | 0.0096    |
|                |         | none   |      5 | acc_norm        |     0.3245 | ± | 0.0096    |
| boolq          |       2 | none   |      5 | acc             |     0.4159 | ± | 0.0086    |
| hellaswag      |       1 | none   |      5 | acc             |     0.2633 | ± | 0.0044    |
|                |         | none   |      5 | acc_norm        |     0.2596 | ± | 0.0044    |
| lambada_openai |       1 | none   |      5 | perplexity      | 19968.0749 | ± | 1423.3001 |
|                |         | none   |      5 | acc             |     0.0202 | ± | 0.0020    |
| openbookqa     |       1 | none   |      5 | acc             |     0.1440 | ± | 0.0157    |
|                |         | none   |      5 | acc_norm        |     0.2420 | ± | 0.0192    |
| piqa           |       1 | none   |      5 | acc             |     0.5359 | ± | 0.0116    |
|                |         | none   |      5 | acc_norm        |     0.5229 | ± | 0.0117    |
| sciq           |       1 | none   |      5 | acc             |     0.3240 | ± | 0.0148    |
|                |         | none   |      5 | acc_norm        |     0.4310 | ± | 0.0157    |
| wikitext       |       2 | none   |      5 | word_perplexity |   550.5954 | ± | N/A       |
|                |         | none   |      5 | byte_perplexity |     3.2550 | ± | N/A       |
|                |         | none   |      5 | bits_per_byte   |     1.7027 | ± | N/A       |
| winogrande     |       1 | none   |      5 | acc             |     0.5154 | ± | 0.0140    |
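The Stderr column gives a rough way to judge whether a 0-shot vs 5-shot difference is more than noise. A minimal sketch using a two-sample z statistic on the WinoGrande accuracies; note that it treats the two runs as independent, which is an approximation because both runs score the same test items (a paired test on per-item results would be more precise). The helper `z_score` is hypothetical.

```python
import math


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """z statistic for the difference b - a, assuming independent estimates."""
    return (b - a) / math.sqrt(se_a**2 + se_b**2)


# WinoGrande accuracy: 0-shot vs 5-shot, values copied from the two tables.
z = z_score(0.4878, 0.0140, 0.5154, 0.0140)
print(f"z = {z:.2f}")  # z = 1.39
# |z| > 1.96 corresponds to roughly a 95% two-sided threshold.
print("significant at ~95%" if abs(z) > 1.96 else "within noise at ~95%")
```

Under this approximation the 5-shot gain on WinoGrande does not clear the usual 95% threshold.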