DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 173
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14 • 10
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 82
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1 • 29
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 28
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation Paper • 2404.03673 • Published Mar 25 • 14
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published 20 days ago • 27
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published 18 days ago • 25