view article Article Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC 3 days ago • 14
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published 10 days ago • 78
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation Paper • 2503.16430 • Published 22 days ago • 34
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 116
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 73
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 190
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound Paper • 2502.05139 • Published Feb 7 • 1
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18 • 41
Learning Getting-Up Policies for Real-World Humanoid Robots Paper • 2502.12152 • Published Feb 17 • 41
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 151