honggen/hard_dpo
Text Generation
Anthropic/hh-rlhf
English
License: apache-2.0
This is the reference model, obtained by supervised fine-tuning on the chosen responses.
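The card does not include training code, so the following is only a minimal sketch of how a supervised fine-tuning pair might be built from a "chosen" response. It assumes the Anthropic/hh-rlhf record layout, where each record has `chosen` and `rejected` transcripts made of `\n\nHuman:` and `\n\nAssistant:` turns. The helper name `split_chosen` and the sample record are hypothetical and are not taken from this repository or from the dataset.

```python
def split_chosen(transcript: str) -> tuple[str, str]:
    """Split a transcript into (prompt, final assistant response).

    Assumption: the text to fine-tune on is the last Assistant turn,
    and everything before it (including the marker) is the prompt.
    """
    marker = "\n\nAssistant:"
    idx = transcript.rfind(marker)
    if idx == -1:
        raise ValueError("no Assistant turn found in transcript")
    cut = idx + len(marker)
    return transcript[:cut], transcript[cut:].strip()


# Illustrative record in the hh-rlhf layout (made up for this sketch).
example = {
    "chosen": "\n\nHuman: What is 2 + 2?\n\nAssistant: 2 + 2 equals 4.",
    "rejected": "\n\nHuman: What is 2 + 2?\n\nAssistant: I don't know.",
}

# Only the "chosen" transcript is used for supervised fine-tuning;
# the "rejected" transcript is ignored at this stage.
prompt, response = split_chosen(example["chosen"])
```

A fine-tuning loop would then typically compute the loss on `response` tokens conditioned on `prompt`; whether this model was trained exactly that way is not stated in the card.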
Dataset used to train honggen/hard_dpo: Anthropic/hh-rlhf