The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 4 days ago • 65
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 11 days ago • 46
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 88
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published Dec 11, 2024 • 52
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published Dec 11, 2024 • 45
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 126
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7, 2024 • 37
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation Paper • 2411.17945 • Published Nov 26, 2024 • 23
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published Oct 21, 2024 • 17
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
LLaVA-Video Collection Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5, 2024 • 57
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated 5 days ago • 546
NVLM 1.0 Collection A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks. • 2 items • Updated 3 days ago • 51