CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts Paper • 2406.09162 • Published Jun 13 • 13