---
license: bigscience-openrail-m
language:
- en
---
|
GPT-R [Ronin] |
|
|
|
GPT-R is an experimental model containing a parameter-wise 60/40 blend (weighted average) of the weights of ppo_hh_gpt-j and GPT-JT-6B-v1. |
|
|
|
-Intended Merge Value- |
|
|
|
As with fine-tuning, merging weights does not add information but transforms it; it is therefore important to consider the trade-offs. GPT-Ronin combines ppo_hh_gpt-j and GPT-JT; both technical achievements are blended with the intent of elevating the strengths of each. The datasets of both are linked below to assist in exploring which datasets, in what quantity and configuration, have the largest impact on a model's usefulness without the expense of fine-tuning. The blend was performed in FP32 and the output saved in FP16.
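The parameter-wise weighted average described above can be sketched as follows. This is a minimal illustration, not the actual merge script (credited to Concedo below): plain floats stand in for tensors, and the function name `merge_state_dicts` is hypothetical.

```python
# Hypothetical sketch of a parameter-wise 60/40 weighted merge of two
# checkpoints with matching parameter names. In a real merge, each value
# is a tensor upcast to FP32 before averaging, with the result saved in
# FP16, as noted above.

def merge_state_dicts(sd_a, sd_b, weight_a=0.6):
    """Blend two state dicts key by key: weight_a * a + (1 - weight_a) * b."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    weight_b = 1.0 - weight_a
    return {k: weight_a * sd_a[k] + weight_b * sd_b[k] for k in sd_a}

# Toy example with scalar "parameters" in place of tensors:
a = {"wte.weight": 1.0, "ln_f.bias": 0.0}
b = {"wte.weight": 0.0, "ln_f.bias": 1.0}
merged = merge_state_dicts(a, b, weight_a=0.6)
# merged["wte.weight"] is 0.6 and merged["ln_f.bias"] is 0.4 (up to float rounding)
```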
|
|
|
-Intended Use- |
|
|
|
Research purposes only, intended for responsible use. |
|
Express a task in natural language, and GPT-R will do the thing. Try telling it "Write an article about X, but put a Y spin on it.", "Write a five-step numbered guide on how to do X.", or any other basic instructions. It does its best.
|
|
|
It can also be used as a base to merge with conversational, story-writing, or adventure-themed models of the same class (GPT-J & GPT-NeoX 6B) and parameter size (6B), to experiment with the morphology of model weights based on the value added by instruction tuning.
|
|
|
The merge was tested using KoboldAI with Nucleus Sampling Top-P set to 0.7, Temperature at 0.5, and Repetition Penalty at 1.14; extra samplers disabled.
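For users outside KoboldAI, the tested sampler settings translate to roughly the following Hugging Face `generate()` keyword arguments. The mapping from KoboldAI's sampler names onto these parameters is an assumption, not something stated by the KoboldAI project.

```python
# Approximate transformers generate() equivalents of the KoboldAI
# settings tested above (assumed mapping; adjust as needed).
generation_kwargs = {
    "do_sample": True,            # sampling rather than greedy decoding
    "top_p": 0.7,                 # Nucleus Sampling Top-P
    "temperature": 0.5,
    "repetition_penalty": 1.14,
}

# Usage sketch (model/tokenizer loading omitted):
# output = model.generate(**inputs, **generation_kwargs, max_new_tokens=200)
```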
|
|
|
-Credits To- |
|
|
|
Core Model: |
|
https://huggingface.co/EleutherAI/gpt-j-6B |
|
Author: |
|
https://www.eleuther.ai/ |
|
|
|
Model 1; 60% ppo_hh_gpt-j:
|
https://huggingface.co/reciprocate/ppo_hh_gpt-j |
|
|
|
Author Repo: |
|
https://huggingface.co/reciprocate |
|
|
|
Related; CarperAI: |
|
https://huggingface.co/CarperAI |
|
|
|
The dataset is a variant of the Helpful and Harmless (HH) assistant-themed dataset, trained via Proximal Policy Optimization; the specific datasets used are unknown. Listed repo datasets include:
|
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql |
|
https://huggingface.co/datasets/reciprocate/hh_eval_ilql |
|
|
|
PPO explained: |
|
https://paperswithcode.com/method/ppo |
|
Potential HH-type datasets utilized: |
|
https://huggingface.co/HuggingFaceH4 |
|
https://huggingface.co/datasets/Anthropic/hh-rlhf |
|
|
|
Model 2; 40% GPT-JT-6B-v1:
|
https://huggingface.co/togethercomputer/GPT-JT-6B-v1 |
|
|
|
Author Repo: |
|
https://huggingface.co/togethercomputer |
|
|
|
Related; BigScience: |
|
https://huggingface.co/bigscience |
|
|
|
Datasets: |
|
https://huggingface.co/datasets/the_pile |
|
https://huggingface.co/datasets/bigscience/P3 |
|
https://github.com/allenai/natural-instructions |
|
https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html |
|
|
|
Weight-merge script credit to Concedo:
|
https://huggingface.co/concedo |