LMD+ Model Card

LMD and LMD+ greatly improve the prompt-following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. They improve spatial reasoning, understanding of negation, attribute binding, generative numeracy, and more in a unified manner, without targeting any of these abilities explicitly. LMD is completely training-free (i.e., it uses the SD model off-the-shelf). LMD+ adds adapters for better control. This is a reproduction of the LMD+ model used in our work. Our full codebase is available here.
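The following is a minimal sketch of the "layout planner" step. It parses an LLM response into phrases, boxes, a background prompt, and a negative prompt. The response format is an assumption modeled on the LMD prompt template and should be checked against the codebase: a Python-literal list of `(phrase, [x, y, width, height])` tuples, followed by `Background prompt:` and `Negative prompt:` lines. The function `parse_layout_response` and the example layout are hypothetical.

```python
import ast
import re
from typing import List, Tuple


def parse_layout_response(text: str) -> Tuple[List[str], List[List[int]], str, str]:
    """Split a (hypothetical) LLM layout response into its parts.

    Assumed format:
      [('phrase', [x, y, w, h]), ...]
      Background prompt: ...
      Negative prompt: ...
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # First line: a Python-literal list of (phrase, box) tuples.
    # ast.literal_eval only accepts literals, so no arbitrary code is executed.
    pairs = ast.literal_eval(lines[0])
    phrases = [phrase for phrase, _ in pairs]
    boxes = [list(box) for _, box in pairs]

    background, negative = "", ""
    for line in lines[1:]:
        match = re.match(r"(Background|Negative) prompt:\s*(.*)", line)
        if match and match.group(1) == "Background":
            background = match.group(2)
        elif match:
            negative = match.group(2)
    return phrases, boxes, background, negative


# Illustrative response (made up for this example).
response = """
[('a gray cat', [60, 200, 180, 220]), ('a red ball', [320, 300, 120, 120])]
Background prompt: A living room
Negative prompt:
"""
phrases, boxes, bg, neg = parse_layout_response(response)
print(phrases, boxes, bg, repr(neg))
```

`ast.literal_eval` is used instead of `eval` because the LLM output is untrusted text.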

This LMD+ model is based on Stable Diffusion v1.4 and integrates adapters trained with GLIGEN. It can be used directly with our LLMGroundedDiffusionPipeline, a simplified version of the LMD+ pipeline that omits per-box generation.
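The pipeline conditions the GLIGEN adapters on phrases paired with boxes. This sketch converts LLM boxes into a normalized form and rests on two assumptions that should be verified against the pipeline's docstring. First, the LLM writes boxes as `[x, y, width, height]` in pixels on a 512x512 canvas. Second, the pipeline expects `[x_min, y_min, x_max, y_max]` normalized to `[0, 1]`, as diffusers' `StableDiffusionGLIGENPipeline` does for its `gligen_boxes` argument. `CANVAS_SIZE` and `to_normalized_xyxy` are hypothetical names.

```python
from typing import List, Sequence

# Assumed layout canvas size in pixels (SD v1.4 generates 512x512 images by default).
CANVAS_SIZE = 512


def to_normalized_xyxy(box: Sequence[float], canvas: int = CANVAS_SIZE) -> List[float]:
    """Convert an [x, y, width, height] pixel box to [x_min, y_min, x_max, y_max] in [0, 1]."""
    x, y, w, h = box
    corners = [x, y, x + w, y + h]
    # Clamp so a box that spills past the canvas edge stays inside the image.
    return [min(max(c / canvas, 0.0), 1.0) for c in corners]


boxes_px = [[64, 128, 256, 256], [384, 256, 192, 128]]
boxes = [to_normalized_xyxy(b) for b in boxes_px]
print(boxes)
```

The second example box extends past the right edge (384 + 192 > 512), so its `x_max` is clamped to 1.0. Clamping is a design choice made here for robustness. It is not necessarily what the LMD+ codebase does.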

See the original SD Model Card here.

Cite our work

@article{lian2023llmgrounded,
    title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models}, 
    author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
    journal={arXiv preprint arXiv:2305.13655},
    year={2023}
}