VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation • Paper • 2504.04060 • Published 18 days ago • 2
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction • Paper • 2501.06282 • Published Jan 10 • 51