We collect open-source datasets and process them into a standard format.
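The page does not say what the standard format is. As an illustration only, the sketch below assumes a hypothetical schema in which each pairwise preference example stores `chosen` and `rejected` as lists of `{"role": ..., "content": ...}` messages. It converts an HH-RLHF style record, whose transcripts delimit turns with `\n\nHuman: ` and `\n\nAssistant: `, into that shape. The helper names `parse_transcript` and `to_standard` are hypothetical and are not taken from any RLHFlow code.

```python
import re
from typing import Dict, List

# Hypothetical target schema (an assumption, not confirmed by the source):
# {"chosen": [{"role": ..., "content": ...}, ...],
#  "rejected": [{"role": ..., "content": ...}, ...]}

# Capturing group keeps the speaker name in the re.split output.
_TURN_RE = re.compile(r"\n\n(Human|Assistant): ")


def parse_transcript(text: str) -> List[Dict[str, str]]:
    """Split an HH-RLHF style transcript into role/content messages."""
    # re.split returns [prefix, speaker1, content1, speaker2, content2, ...]
    parts = _TURN_RE.split(text)
    messages = []
    for speaker, content in zip(parts[1::2], parts[2::2]):
        role = "user" if speaker == "Human" else "assistant"
        messages.append({"role": role, "content": content.strip()})
    return messages


def to_standard(record: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Convert one raw pairwise record into the assumed standard schema."""
    return {
        "chosen": parse_transcript(record["chosen"]),
        "rejected": parse_transcript(record["rejected"]),
    }


if __name__ == "__main__":
    raw = {
        "chosen": "\n\nHuman: Hi\n\nAssistant: Hello!",
        "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
    }
    print(to_standard(raw))
```

Storing both responses as full message lists keeps the shared prompt explicit in each column, which makes the records directly usable with chat-template tokenizers; a real pipeline would likely also validate that `chosen` and `rejected` share the same prompt turns.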
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Models (5)
RLHFlow/DPA-v1-Mistral-7B (Text Generation)
RLHFlow/RewardModel-Mistral-7B-for-DPA-v1 (Text Classification)
RLHFlow/pair-preference-model-LLaMA3-8B (Text Generation)
RLHFlow/LLaMA3-iterative-DPO-final (Text Generation)
RLHFlow/LLaMA3-SFT (Text Generation)
Datasets (20)
RLHFlow/test_generation_2k
RLHFlow/SHP-standard
RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard
RLHFlow/prompt-collection-v0.1
RLHFlow/pair-preference-dataset-mix1
RLHFlow/Prometheus2-preference-standard
RLHFlow/iterative-prompt-v1-iter3-20K
RLHFlow/iterative-prompt-v1-iter2-20K
RLHFlow/iterative-prompt-v1-iter1-20K
RLHFlow/Argilla-Math-DPO-standard