We collect the open-source datasets and process them into the standard format.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models
9
RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation
•
Updated
•
984
•
35
RLHFlow/LLaMA3-SFT
Text Generation
•
Updated
•
9.84k
•
7
RLHFlow/LLaMA3-iterative-DPO-final
Text Generation
•
Updated
•
5.84k
•
41
RLHFlow/LLaMA3-SFT-v2
Text Generation
•
Updated
•
634
RLHFlow/LLaMA3.2-3B-SFT
Text Generation
•
Updated
•
44
RLHFlow/LLaMA3.2-1B-SFT
Text Generation
•
Updated
•
18
RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification
•
Updated
•
15.1k
•
140
RLHFlow/DPA-v1-Mistral-7B
Text Generation
•
Updated
•
9
•
1
RLHFlow/RewardModel-Mistral-7B-for-DPA-v1
Text Classification
•
Updated
•
119
•
1
datasets
43
RLHFlow/Llama3-SFT-RAFT-Ultrafeedback-iter1
Viewer
•
Updated
•
20k
•
10
RLHFlow/ultrafeedback_iter3
Viewer
•
Updated
•
19.6k
•
56
RLHFlow/ultrafeedback_iter2
Viewer
•
Updated
•
20k
•
56
RLHFlow/ultrafeedback_iter1
Viewer
•
Updated
•
20k
•
185
RLHFlow/pair-preference-Skywork-80K-v0.1
Viewer
•
Updated
•
82k
RLHFlow/ArmoRM-Multi-Objective-Data-v0.2
Viewer
•
Updated
•
555k
RLHFlow/ArmoRM-Multi-Objective-Data-v0.1
Viewer
•
Updated
•
569k
•
62
•
1
RLHFlow/pair_data_v2_80K_wsafety_short
Viewer
•
Updated
•
790k
RLHFlow/pair_data_v2_78_wo_safety
Viewer
•
Updated
•
777k
RLHFlow/pair_data_v2_80K_wsafety
Viewer
•
Updated
•
803k
•
38
•
1