Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 2025 • 97 • 4
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11, 2025 • 36 • 4
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks Paper • 2210.14712 • Published Oct 26, 2022
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 113
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance Paper • 2406.19680 • Published Jun 28, 2024 • 1 • 1
AescF/hubert-base-ls960-finetuned-common_language Audio Classification • Updated Sep 26, 2023 • 66 • 1
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Paper • 2408.07547 • Published Aug 14, 2024 • 8 • 3
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP Paper • 2408.04303 • Published Aug 8, 2024 • 21
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation Paper • 2407.17952 • Published Jul 25, 2024 • 33 • 7