LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models • Paper • 2407.07895 • Published 5 days ago • 33
SEED-Story: Multimodal Long Story Generation with Large Language Model • Paper • 2407.08683 • Published 4 days ago • 14
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation • Paper • 2407.06135 • Published 7 days ago • 19
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output • Paper • 2407.03320 • Published 12 days ago • 85