nabeelshan/rlhf-gpt2-pipeline
Text Generation
Transformers
Safetensors
Dahoas/synthetic-instruct-gptj-pairwise
English
gpt2
rlhf
reinforcement-learning
ppo
reward-model
instruction-tuning
Eval Results
License: apache-2.0
rlhf-gpt2-pipeline/ppo_aligned_final/merges.txt (branch: main)
Nabeel Shan: Add tokenizer files (b461de7, 2 months ago)
Safe
456 kB
File too large to display; check the raw version instead.