nabeelshan/rlhf-gpt2-pipeline

Text Generation

reinforcement-learning

instruction-tuning

rlhf-gpt2-pipeline/ppo_aligned_final/generation_config.json

Nabeel Shan

Added SFT, Reward Model, and PPO-Aligned Model

46724ea 2 months ago

119 Bytes

{
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.43.4"
}
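
The config above sets only token ids and a version stamp, so a quick check of what it does and does not specify may help. The following is a minimal sketch using only the Python standard library. The helper name `summarize_generation_config` and the list of decoding keys it looks for are assumptions made for illustration, not part of this repository.

```python
import json

# Contents of generation_config.json, copied from the listing above.
RAW_CONFIG = """
{
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.43.4"
}
"""

# Decoding options one might expect in a generation config (illustrative list,
# chosen for this sketch; it is not exhaustive).
DECODING_KEYS = {"do_sample", "temperature", "top_k", "top_p", "max_new_tokens"}


def summarize_generation_config(raw: str) -> dict:
    """Hypothetical helper: parse the JSON and report a few derived facts."""
    config = json.loads(raw)
    return {
        "bos_token_id": config["bos_token_id"],
        "eos_token_id": config["eos_token_id"],
        # True when the same id marks both the start and the end of a sequence.
        "bos_equals_eos": config["bos_token_id"] == config["eos_token_id"],
        # Decoding options set explicitly in the file (none are, in this case).
        "explicit_decoding_keys": sorted(DECODING_KEYS & config.keys()),
    }


summary = summarize_generation_config(RAW_CONFIG)
print(summary)
```

Both ids are 50256, which is the `<|endoftext|>` token in the standard GPT-2 vocabulary. Since the file sets no sampling or length options, generation presumably falls back to library defaults unless the caller passes them. With the `transformers` library installed, the same file could likely be loaded with `GenerationConfig.from_pretrained("nabeelshan/rlhf-gpt2-pipeline", subfolder="ppo_aligned_final")`. That call is untested here and assumes the repository is publicly accessible.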