SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published 6 days ago • 14
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 10 days ago • 230
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published 10 days ago • 22
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published 10 days ago • 83
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 22 days ago • 79
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published about 1 month ago • 19
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 74
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 78
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting Paper • 2411.17223 • Published Nov 26, 2024 • 5
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 25
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6, 2024 • 44
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12, 2024 • 37
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 118
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9, 2024 • 39
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31, 2024 • 22