Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset Paper • 2412.02595 • Published Dec 3, 2024 • 1
Mind the Gap! Static and Interactive Evaluations of Large Audio Models Paper • 2502.15919 • Published 22 days ago • 3
Article Optimizing Pretraining Data Mixes with LLM-Estimated Utility By WillHeld • Jan 22 • 3
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise Paper • 2410.03017 • Published Oct 3, 2024 • 27
Distilling an End-to-End Voice Assistant Without Instruction Training Data Paper • 2410.02678 • Published Oct 3, 2024 • 23