The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • arXiv:2501.07301 • Published Jan 13, 2025
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models • Paper • arXiv:2501.03262 • Published Jan 4, 2025
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction • Paper • arXiv:2501.01957 • Published Jan 3, 2025
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs • Paper • arXiv:2412.18925 • Published Dec 25, 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • arXiv:2412.06559 • Published Dec 9, 2024
POINTS: Improving Your Vision-language Model with Affordable Strategies • Paper • arXiv:2409.04828 • Published Sep 7, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling • Paper • arXiv:2408.16532 • Published Aug 29, 2024
To Code, or Not To Code? Exploring Impact of Code in Pre-training • Paper • arXiv:2408.10914 • Published Aug 20, 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • Paper • arXiv:2406.06525 • Published Jun 10, 2024
PERL: Parameter Efficient Reinforcement Learning from Human Feedback • Paper • arXiv:2403.10704 • Published Mar 15, 2024
Awesome SFT datasets • Collection • A curated list of interesting datasets for fine-tuning language models • 43 items • Updated Apr 12, 2024