license: openrail++
tags:
  - text-to-image
  - stable-diffusion



Overview

SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to make it simpler to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned using a learning rate of 1e-6 over 7000 steps with a batch size of 64 on a curated dataset of multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.

Note: It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.
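The training setup described above can be pictured with a small sketch. This is only one possible reading of "alternating low and high resolution batches (per aspect ratio)", not the actual training code: the function name `alternating_batches`, the tier labels, and the example aspect ratios are all hypothetical. Only the step count and batch size come from this card.

```python
from itertools import cycle, islice, product

STEPS = 7000       # fine-tuning steps stated in this card
BATCH_SIZE = 64    # batch size stated in this card

def alternating_batches(aspect_ratios, steps):
    """Return a list of (aspect_ratio, tier) pairs, one per training step.

    Assumption: each aspect ratio gets a low-resolution batch followed by a
    high-resolution batch, and the ratios are visited round-robin.
    """
    pairs = product(aspect_ratios, ("low", "high"))
    return list(islice(cycle(pairs), steps))

# Hypothetical aspect-ratio buckets, for illustration only
schedule = alternating_batches(["1:1", "16:9"], 6)
for step, (ratio, tier) in enumerate(schedule):
    print(step, ratio, tier)

# Upper bound on samples seen during fine-tuning (steps x batch size)
print(STEPS * BATCH_SIZE)
```

Under these assumptions the model sees at most 7000 x 64 = 448,000 samples, split evenly between the low- and high-resolution tiers.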


Model Description

  • Developed by: Natural Synthetics Inc.
  • Model type: Diffusion-based text-to-image generative model
  • License: CreativeML Open RAIL++-M License
  • Model Description: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
  • Resources for more information: Check out our GitHub Repository.
  • Finetuned from model: Stable Diffusion XL 1.0

🧨 Diffusers

Make sure to upgrade diffusers to >= 0.18.2:

pip install diffusers --upgrade

In addition, make sure to install transformers, safetensors, and accelerate, as well as the invisible watermark package:

pip install invisible_watermark transformers accelerate safetensors
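To confirm that the installed diffusers release meets the 0.18.2 minimum mentioned above, a quick check along these lines may help. The helpers `release_tuple` and `meets_minimum` are illustrative names, not part of diffusers. The comparison only looks at the leading numeric part of the version string, so it is a rough check rather than a full version parser.

```python
import re
from importlib import metadata

def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.19.0.dev0' -> (0, 19, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def meets_minimum(installed, minimum="0.18.2"):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

if __name__ == "__main__":
    try:
        installed = metadata.version("diffusers")
        print("diffusers", installed, "ok" if meets_minimum(installed) else "too old")
    except metadata.PackageNotFoundError:
        print("diffusers is not installed")
```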

Running the pipeline (if you don't swap the scheduler, it will run with the default EulerDiscreteScheduler; in this example we swap it for EulerAncestralDiscreteScheduler):

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Load the fp16 variant of the checkpoint from the Hugging Face Hub
model = "hotshotco/SDXL-512"
pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
# Replace the default EulerDiscreteScheduler, reusing its configuration
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "a woman laughing"
negative_prompt = ""
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    guidance_scale=12,
    # SDXL size-conditioning inputs
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]
image.save("woman_laughing.png")
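Since the model targets resolutions "at and around" 512x512, it can be handy to derive a non-square size with roughly the same pixel count. The helper below is a minimal sketch, not something shipped with this model: `dims_near_512` is a hypothetical name, and the aspect ratios are only examples. It snaps both sides to multiples of 8, which the diffusers pipeline requires for `width` and `height`.

```python
import math

def dims_near_512(aspect_ratio, target_area=512 * 512, multiple=8):
    """Return (width, height) with about `target_area` pixels.

    `aspect_ratio` is width / height. Both sides are rounded to the
    nearest multiple of `multiple`.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value):
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)

for ratio in (1.0, 16 / 9, 2 / 3):
    print(ratio, dims_near_512(ratio))
```

The resulting pair can be passed as `width` and `height` in the pipeline call above. Output quality at any particular size is not something this sketch guarantees.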

Limitations and Bias

Limitations

  • The model does not achieve perfect photorealism
  • The model cannot render legible text
  • The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”
  • Faces and people in general may not be generated properly

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.