RLHFlow's Collections
Online RLHF
Updated 15 days ago
Datasets, code, and models for online RLHF (i.e., iterative DPO)
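The method the collection is built around, iterative DPO, optimizes a policy directly on preference pairs against a frozen reference model. As a minimal sketch of the standard per-pair DPO objective (not the RLHFlow code itself; function and variable names here are illustrative):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (chosen y_w, rejected y_l).

    Inputs are the summed token log-probabilities of each response
    under the current policy (pi_*) and the frozen reference model
    (ref_*); beta scales the implicit KL penalty.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    # -log(sigmoid(beta * margin)), written with log1p for stability.
    return math.log1p(math.exp(-beta * margin))

# At zero margin the loss is log(2); as the policy learns to prefer
# the chosen response, the margin grows and the loss falls.
loss_zero = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # margin = 0
loss_good = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # margin = 4
```

In the online (iterative) variant, each round samples fresh responses from the current policy, labels pairs with a preference or reward model, and minimizes this loss on the new pairs before repeating.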
RLHFlow/prompt-collection-v0.1 • Viewer • Updated 25 days ago • 50 • 4
RLHFlow/pair-preference-model-LLaMA3-8B • Text Generation • Updated 9 days ago • 4.5k • 21
sfairXC/FsfairX-LLaMA3-RM-v0.1 • Text Classification • Updated Apr 24 • 6.27k • 20
RLHFlow/SFT-OpenHermes-2.5-Standard • Viewer • Updated Apr 24 • 166 • 1
RLHFlow/iterative-prompt-v1-iter2-20K • Viewer • Updated 30 days ago • 222 • 2
RLHFlow/iterative-prompt-v1-iter3-20K • Viewer • Updated 30 days ago • 132 • 2
RLHFlow/iterative-prompt-v1-iter1-20K • Viewer • Updated 30 days ago • 439 • 2
RLHF Workflow: From Reward Modeling to Online RLHF • Paper • 2405.07863 • Published 19 days ago • 57
RLHFlow/LLaMA3-SFT • Text Generation • Updated 9 days ago • 3.59k • 3
RLHFlow/LLaMA3-iterative-DPO-final • Text Generation • Updated 8 days ago • 673 • 32