A collection of datasets and models for process reward modeling.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Models (13)
RLHFlow/LLaMA3-SFT-v2 • Text Generation • 1.45k
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • 1
RLHFlow/Llama3-SFT-v2.0-epoch3 • Text Generation
RLHFlow/LLaMA3-SFT • Text Generation • 8.44k • 7
RLHFlow/Llama3.1-8B-PRM • Text Generation • 8
RLHFlow/pair-preference-model-LLaMA3-8B • Text Generation • 1.79k • 36
RLHFlow/LLaMA3-iterative-DPO-final • Text Generation • 6.53k • 41
RLHFlow/LLaMA3.2-3B-SFT • Text Generation • 10
RLHFlow/LLaMA3.2-1B-SFT • Text Generation • 13
Datasets (53)
RLHFlow/Deepseek-ORM-Data-Pairwise • Viewer • 54.5k
RLHFlow/Mistral-ORM-Data-Pairwise • Viewer • 37.9k
RLHFlow/Deepseek-GSM8K-Test • Viewer • 1.32k
RLHFlow/Deepseek-ORM-Data • Viewer • 15k
RLHFlow/Mistral-ORM-Data • Viewer • 15k
RLHFlow/Deepseek-MATH500-Test • Viewer • 500 • 2
RLHFlow/RLHFlow-SFT-Dataset-ver2 • Viewer • 2.32M • 6 • 1
RLHFlow/Mistral-GSM8K-Test • Viewer • 1.32k • 6
RLHFlow/Mistral-MATH500-Test • Viewer • 500 • 5
RLHFlow/ultrafeedback_all • Viewer • 59.6k • 7