annaovesnaatatt/gpt2-post-ppo

Tags: Feature Extraction, Transformers, PyTorch, gpt2, text-generation-inference, Inference Endpoints

Last commit: c60dc45 ("Create README.md") by annaovesnaatatt, 11 months ago

This is a test model created for RLHF (reinforcement learning from human feedback) experiments. The reward model used during training is martin-arguments, which was trained on 1,000 examples from the Anthropic/hh-rlhf dataset.
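The card does not document how martin-arguments was wired into training. As a hedged illustration only: a common setup for PPO-based RLHF (similar to what the TRL library's PPO trainer does) turns the reward model's scalar score into per-token rewards. Each generated token is penalized by a KL term against a frozen reference model, and the score is added at the last token. The sketch below assumes that setup. The function name `kl_penalized_rewards` and the `beta` value are hypothetical and are not taken from this model's training code.

```python
from typing import List


def kl_penalized_rewards(
    score: float,
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.2,  # assumed KL coefficient; not stated on this card
) -> List[float]:
    """Per-token rewards for one generated response (illustrative sketch).

    score: scalar the reward model assigns to the whole response.
    policy_logprobs / ref_logprobs: log-probabilities that the PPO policy and
    the frozen reference model assign to each generated token.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        raise ValueError("response must contain at least one token")

    # Per-token KL estimate log pi(a|s) - log pi_ref(a|s), scaled by -beta,
    # discourages the policy from drifting far from the reference model.
    rewards = [-beta * (lp - ref) for lp, ref in zip(policy_logprobs, ref_logprobs)]

    # The reward-model score is credited to the final token only.
    rewards[-1] += score
    return rewards


if __name__ == "__main__":
    print(kl_penalized_rewards(1.0, [-1.0, -2.0], [-1.5, -2.0], beta=0.2))
```

With a score of 1.0, policy log-probs `[-1.0, -2.0]`, reference log-probs `[-1.5, -2.0]` and `beta=0.2`, this returns approximately `[-0.1, 1.0]`. The first token is penalized for diverging from the reference model, and the final token carries the reward-model score. PPO then optimizes the policy against these rewards.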