VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Paper • 2411.14794 • Published 11 days ago • 10
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23 • 22
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 20
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
3D-GPT: Procedural 3D Modeling with Large Language Models Paper • 2310.12945 • Published Oct 19, 2023 • 57
Brain2Music: Reconstructing Music from Human Brain Activity Paper • 2307.11078 • Published Jul 20, 2023 • 40
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution Paper • 2306.15794 • Published Jun 27, 2023 • 17
Language-Guided Music Recommendation for Video via Prompt Analogies Paper • 2306.09327 • Published Jun 15, 2023 • 8