Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 9 days ago • 54
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published 22 days ago • 76
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting Paper • 2411.17223 • Published 22 days ago • 5
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting Paper • 2411.17223 • Published 22 days ago • 5 • 3
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published 27 days ago • 11
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13 • 25
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 55
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 135
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6 • 43
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 37