
Diffusion XL

TL;DR: Enter a prompt or roll the 🎲 and press Generate.

Prompting

Positive and negative prompts are embedded by Compel for weighting. See syntax features to learn more.

Use + or - to increase or decrease the weight of a token. The weight compounds when chained. For example, blue+ means 1.1x the attention is given to blue, while blue++ means 1.1^2 (1.21x), and so on. The same applies to -, where each - multiplies the weight by 0.9.

For groups of tokens, wrap them in parentheses and multiply by a float between 0 and 2. For example, a (birthday cake)1.3 on a table will increase the weight of both birthday and cake by 1.3x. This also means the entire scene will be more birthday-like, not just the cake. To counteract this, you can use - inside the parentheses on specific tokens, e.g., a (birthday-- cake)1.3, to reduce the birthday aspect.
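The weighting arithmetic described above can be sketched in plain Python. This is only an illustration and not how Compel parses prompts: the helper names `chained_weight` and `group_weights` are hypothetical, and the sketch assumes each - multiplies by 0.9 and that a token's own weight multiplies with the multiplier of its group.

```python
def chained_weight(token: str) -> tuple[str, float]:
    """Split trailing +/- markers off a token and return (word, weight)."""
    word = token.rstrip("+-")
    suffix = token[len(word):]
    # Assumption: each + multiplies by 1.1 and each - multiplies by 0.9.
    weight = 1.1 ** suffix.count("+") * 0.9 ** suffix.count("-")
    return word, weight


def group_weights(group: str, multiplier: float) -> dict[str, float]:
    """Effective weight per token for a group like (birthday-- cake)1.3."""
    weights = {}
    for token in group.split():
        word, weight = chained_weight(token)
        # Assumption: the group multiplier applies on top of the token weight.
        weights[word] = round(weight * multiplier, 4)
    return weights


print(chained_weight("blue++"))
print(group_weights("birthday-- cake", 1.3))
```

Under these assumptions, birthday ends up close to neutral (about 1.05) while cake keeps the full 1.3x weight.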

This is the same syntax used in InvokeAI and it differs from AUTOMATIC1111:

| Compel | AUTOMATIC1111 |
| --- | --- |
| blue++ | ((blue)) |
| blue-- | [[blue]] |
| (blue)1.2 | (blue:1.2) |
| (blue)0.8 | (blue:0.8) |

Arrays

Arrays allow you to generate multiple different images from a single prompt. For example, an adult [[blonde,brunette]] [[man,woman]] will expand into 4 different prompts. This implementation was inspired by Fooocus.
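To make the expansion concrete, here is a minimal sketch of how array groups could be turned into a list of prompts with a Cartesian product. It is not the app's actual implementation; the function name `expand_arrays` and the regular expression are assumptions made for illustration.

```python
import itertools
import re

# Matches one [[a,b,c]] group and captures the comma-separated options.
ARRAY = re.compile(r"\[\[([^\[\]]+)\]\]")


def expand_arrays(prompt: str) -> list[str]:
    """Expand every [[...]] group into all combinations of its options."""
    # With one capturing group, split() alternates literal text and captures.
    parts = ARRAY.split(prompt)
    literals = parts[0::2]
    choices = [[opt.strip() for opt in group.split(",")] for group in parts[1::2]]

    prompts = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces += [choice, literal]
        prompts.append("".join(pieces))
    return prompts


for p in expand_arrays("an adult [[blonde,brunette]] [[man,woman]]"):
    print(p)
```

Two groups of two options each give 2 x 2 = 4 prompts. A prompt without any array is returned unchanged as a single-item list.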

NB: Make sure to set Images to the number of images you want to generate. Otherwise, only the first prompt will be used.

Models

Each model checkpoint has a different aesthetic.

Styles

Styles are prompt templates that wrap your positive and negative prompts. They were originally derived from the twri/sdxl_prompt_styler Comfy node, but have since been entirely rewritten.

Start by framing a simple subject like portrait of a young adult woman or landscape of a mountain range and experiment.
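As a rough sketch of what "wrapping" means, a style can be thought of as a pair of templates with a placeholder for your text. The `{prompt}` placeholder follows the convention of the original sdxl_prompt_styler node; the style contents and the `apply_style` helper below are hypothetical and are not taken from this app's rewritten styles.

```python
# Hypothetical style definition, for illustration only.
CINEMATIC = {
    "positive": "cinematic still of {prompt}, shallow depth of field",
    "negative": "{prompt}, cartoon, drawing",
}


def apply_style(style: dict[str, str], positive: str, negative: str = "") -> tuple[str, str]:
    """Insert the user's prompts into the style's templates."""
    styled_positive = style["positive"].replace("{prompt}", positive)
    styled_negative = style["negative"].replace("{prompt}", negative)
    # Trim separators left dangling when a user prompt is empty.
    return styled_positive.strip(", "), styled_negative.strip(", ")


print(apply_style(CINEMATIC, "portrait of a young adult woman", "blurry"))
```

Because the template supplies the surrounding descriptors, a short subject is usually enough as the positive prompt.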

Scale

Upscale by up to 4x using Real-ESRGAN with weights from ai-forever. This is necessary for high-resolution images.

Advanced

DeepCache

DeepCache caches the output of the lower (deeper) UNet layers and reuses it on subsequent steps, recomputing it only every Interval steps. Higher intervals trade quality for speed:

  • 1: no caching (default)
  • 2: more quality
  • 3: balanced
  • 4: more speed
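To see what the interval means in practice, the sketch below lists which denoising steps would run the full UNet and which would reuse the cache. It assumes a uniform schedule where the cache is refreshed whenever the step index is divisible by the interval; the function name `full_pass_steps` is hypothetical and this is not the DeepCache library's code.

```python
def full_pass_steps(num_steps: int, interval: int) -> list[int]:
    """Step indices that recompute the full UNet (and refresh the cache)."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    # Assumption: uniform caching, refreshing on every interval-th step.
    return [step for step in range(num_steps) if step % interval == 0]


for interval in (1, 2, 3, 4):
    steps = full_pass_steps(12, interval)
    print(f"interval={interval}: {len(steps)} full passes at steps {steps}")
```

With an interval of 1 every step is a full pass, which is why that setting is equivalent to no caching.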

Refiner

Use the ensemble of expert denoisers technique, where the first 80% of timesteps are denoised by the base model and the remaining 20% by the refiner. Not available with image-to-image pipelines.
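As a rough illustration of the split, the sketch below divides a step budget between the two models. In the Diffusers library the handoff point is expressed with the `denoising_end` argument on the base pipeline and `denoising_start` on the refiner pipeline; the helper `split_steps` is hypothetical, and simple rounding only approximates how a pipeline picks the exact cutoff.

```python
def split_steps(num_steps: int, base_fraction: float = 0.8) -> tuple[int, int]:
    """Approximate number of steps run by the base model and by the refiner."""
    if not 0.0 < base_fraction < 1.0:
        raise ValueError("base_fraction must be between 0 and 1")
    base_steps = round(num_steps * base_fraction)
    # The refiner picks up where the base model stopped.
    return base_steps, num_steps - base_steps


base, refiner = split_steps(40)
print(f"base: {base} steps, refiner: {refiner} steps")
```

The base model hands over partially denoised latents rather than a finished image, so the refiner only handles the final low-noise steps.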