---
license: creativeml-openrail-m
tags:
- lmd
- llm-grounded-diffusion
- lmd-plus
- layout-to-image
- text-to-layout
- text-to-layout-to-image
- llm
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
widget:
- text: "In an indoor scene, a blue cube directly above a red cube with a vase on the left of them"
  output:
    url: examples/example1.png
extra_gated_prompt: |-
  This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
  The CreativeML OpenRAIL License specifies:

  1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content
  2. The authors claim no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license
  3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully)

  Please read the full license carefully here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
extra_gated_heading: Please read the LICENSE to access this model
---

# LMD+ Model Card

[Paper](https://arxiv.org/pdf/2305.13655.pdf) | [Project Page](https://llm-grounded-diffusion.github.io/) | [**5-minute Blog Post**](https://bair.berkeley.edu/blog/2023/05/23/lmd/) | [**Demo**](https://huggingface.co/spaces/longlian/llm-grounded-diffusion) | [Code](https://github.com/TonyLianLong/LLM-groundedDiffusion) | [Citation](https://github.com/TonyLianLong/LLM-groundedDiffusion#citation) | [Related work: LLM-grounded Video Diffusion Models](https://llm-grounded-video-diffusion.github.io/)

LMD and LMD+ greatly improve the prompt-following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. They improve spatial reasoning, the understanding of negation, attribute binding, generative numeracy, etc. in a unified manner without explicitly targeting each. LMD is completely training-free (i.e., it uses the SD model off-the-shelf). LMD+ additionally incorporates trained adapters for better control. This is a reproduction of the LMD+ model used in our work. Our full codebase is available [here](https://github.com/TonyLianLong/LLM-groundedDiffusion).

This LMD+ model is based on [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and integrates the adapters trained with [GLIGEN](https://huggingface.co/gligen/diffusers-inpainting-text-box). The model can be used directly with our `LLMGroundedDiffusionPipeline`, which is a simplified pipeline of LMD+ without per-box generation (a usage sketch is included at the end of this card).

See the original SD Model Card [here](https://huggingface.co/CompVis/stable-diffusion-v1-4).

## Cite our work

```
@article{lian2023llmgrounded,
    title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
    author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
    journal={arXiv preprint arXiv:2305.13655},
    year={2023}
}
```
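
## Example usage (sketch)

Below is a minimal sketch of loading this checkpoint through the `diffusers` community pipeline. The repository id (`longlian/lmd_plus`), the `custom_pipeline` name (`llm_grounded_diffusion`), and the call arguments (`phrases`, `boxes`, `gligen_scheduled_sampling_beta`) are assumptions based on the `diffusers` community pipelines, and the layout below is a hand-written example rather than LLM output; see our codebase linked above for the full LMD+ pipeline, including the LLM-based layout planning stage.

```python
# Minimal sketch (assumed repo id, custom_pipeline name, and call arguments;
# the LMD codebase linked above is the authoritative interface).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus",                       # assumed repo id for this model
    custom_pipeline="llm_grounded_diffusion",  # assumed diffusers community pipeline
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Layout normally produced by the LLM planner:
# one normalized [x0, y0, x1, y1] box per grounded phrase (hypothetical values here).
prompt = "In an indoor scene, a blue cube directly above a red cube with a vase on the left of them"
phrases = ["a blue cube", "a red cube", "a vase"]
boxes = [
    [0.45, 0.20, 0.70, 0.45],
    [0.45, 0.50, 0.70, 0.75],
    [0.10, 0.35, 0.35, 0.75],
]

image = pipe(
    prompt=prompt,
    phrases=phrases,
    boxes=boxes,
    gligen_scheduled_sampling_beta=0.4,  # fraction of steps with box conditioning
    num_inference_steps=50,
).images[0]
image.save("lmd_plus_example.png")
```

For layout generation from text with an LLM and for the full LMD+ pipeline with per-box generation, please use the [codebase](https://github.com/TonyLianLong/LLM-groundedDiffusion) or try the [demo](https://huggingface.co/spaces/longlian/llm-grounded-diffusion).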