Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 4 days ago • 90
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published 13 days ago • 68
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 19 days ago • 177
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound Paper • 2502.05139 • Published Feb 7 • 1
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published 20 days ago • 37
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 21 days ago • 76
Learning Getting-Up Policies for Real-World Humanoid Robots Paper • 2502.12152 • Published 21 days ago • 37
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 23 days ago • 141
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 26 days ago • 143
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model Paper • 2406.04904 • Published Jun 7, 2024 • 9
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System Paper • 2502.05512 • Published about 1 month ago • 2
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published Feb 6 • 24
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub 27 days ago • 49
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks Paper • 2502.04465 • Published Feb 6 • 3