OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis • Paper • 2412.19723 • Published 6 days ago • 50
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • Paper • 2412.18597 • Published 9 days ago • 18
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression • Paper • 2412.17483 • Published 10 days ago • 29
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization • Paper • 2412.17739 • Published 10 days ago • 35
The Superposition of Diffusion Models Using the Itô Density Estimator • Paper • 2412.17762 • Published 10 days ago • 12
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey • Paper • 2412.18619 • Published 18 days ago • 43
Large Concept Models: Language Modeling in a Sentence Representation Space • Paper • 2412.08821 • Published 22 days ago • 11
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching • Paper • 2412.17153 • Published 11 days ago • 32
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners • Paper • 2412.17256 • Published 11 days ago • 39
Progressive Multimodal Reasoning via Active Retrieval • Paper • 2412.14835 • Published 14 days ago • 69
Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published 13 days ago • 35
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation • Paper • 2412.13649 • Published 15 days ago • 19