Save it, later read
- CompCap: Improving Multimodal Large Language Models with Composite Captions
  Paper • 2412.05243 • Published • 18
- GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
  Paper • 2412.06089 • Published • 4
- SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
  Paper • 2412.05818 • Published
- FLAIR: VLM with Fine-grained Language-informed Image Representations
  Paper • 2412.03561 • Published • 1