DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 58
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published May 14 • 12
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces Paper • 2407.11895 • Published Jul 16 • 7
Improving 2D Feature Representations by 3D-Aware Fine-Tuning Paper • 2407.20229 • Published Jul 29 • 7
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning Paper • 2408.00690 • Published Aug 1 • 23
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 58
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published Aug 22 • 50
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models Paper • 2408.11318 • Published Aug 21 • 55