MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 11 days ago • 107
WavTokenizer-Medium-Large Collection https://arxiv.org/abs/2408.16532 • 4 items • Updated Feb 25 • 11
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 51
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6 • 69
PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published Dec 30, 2024 • 19
Article Transformers.js v3: WebGPU support, new models & tasks, and more… Oct 22, 2024 • 72
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated Mar 3 • 113
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 47
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61