COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 15 days ago • 44
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • Updated 13 days ago • 737k • • 811
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 25 days ago • 43
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 27