Overview
SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to make it simpler to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps at a batch size of 64 on a curated dataset of multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.
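As a rough illustration of the alternating schedule described above, the sketch below cycles through aspect-ratio buckets and emits a low-resolution batch followed by a high-resolution batch for each one. The bucket names, the image sizes, and the `batch_schedule` helper are all hypothetical; the actual training code and bucket sizes are not given here, so this shows only the general idea.

```python
from itertools import cycle, islice

# Hypothetical aspect-ratio buckets: (low-res size, high-res size) as (width, height).
# These sizes are illustrative assumptions, not the values used in training.
BUCKETS = {
    "1:1": ((512, 512), (1024, 1024)),
    "3:2": ((624, 416), (1248, 832)),
    "2:3": ((416, 624), (832, 1248)),
}

# Figures stated in the overview.
NUM_STEPS = 7000
BATCH_SIZE = 64


def batch_schedule(buckets, num_steps):
    """Yield (step, aspect_ratio, (width, height)) for each training step.

    Within each aspect ratio, a low-resolution batch is followed by a
    high-resolution batch; the buckets are then repeated in order.
    """
    def one_pass():
        for ratio, (low, high) in buckets.items():
            yield ratio, low
            yield ratio, high

    steps = islice(cycle(one_pass()), num_steps)
    for step, (ratio, size) in enumerate(steps):
        yield step, ratio, size


# Total images seen, assuming every step consumes one full batch.
samples_seen = NUM_STEPS * BATCH_SIZE

for step, ratio, size in batch_schedule(BUCKETS, 4):
    print(step, ratio, size)
```

Under the stated hyperparameters, and assuming each step consumes one full batch, the run would process 7000 × 64 = 448,000 images in total.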
Note: It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.
- Use it with Hotshot-XL (recommended)
Model Description
- Developed by: Natural Synthetics Inc.
- Model type: Diffusion-based text-to-image generative model
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
- Resources for more information: Check out our GitHub Repository.
- Finetuned from model: Stable Diffusion XL 1.0
🧨 Diffusers
Make sure to upgrade diffusers to >= 0.18.2:
pip install diffusers --upgrade
In addition, make sure to install transformers, safetensors, and accelerate, as well as the invisible watermark:
pip install invisible_watermark transformers accelerate safetensors
Running the pipeline (if you don't swap the scheduler, it will run with the default EulerDiscreteScheduler; in this example we swap it to EulerAncestralDiscreteScheduler):
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Load the SDXL-512 checkpoint and move it to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "hotshotco/SDXL-512",
    use_safetensors=True,
).to("cuda")

# Swap the default scheduler for EulerAncestralDiscreteScheduler
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

prompt = "a woman laughing"
negative_prompt = ""

# Generate at 512x512; target_size and original_size are SDXL size-conditioning inputs
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("woman_laughing.png")
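Because the model targets resolutions "at and around" 512x512, it can help to derive a width and height for other aspect ratios while keeping the pixel count close to 512x512. The helper below is a minimal sketch of one way to do that. The name `dims_near_512` and the rounding strategy are assumptions made for illustration, not part of the model's tooling. It snaps both sides to multiples of 8 because the diffusers SDXL pipeline rejects sizes that are not divisible by 8.

```python
import math


def dims_near_512(aspect_ratio, target_area=512 * 512, multiple=8):
    """Return (width, height) close to `target_area` pixels for a given aspect ratio.

    `aspect_ratio` is width / height. Both sides are rounded to the nearest
    multiple of `multiple` (hypothetical helper, for illustration only).
    """
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(dims_near_512(1.0))     # (512, 512)
print(dims_near_512(16 / 9))  # (680, 384)
```

The resulting values can be passed as `width` and `height` in the pipeline call above in place of 512 and 512.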
Limitations and Bias
Limitations
- The model does not achieve perfect photorealism
- The model cannot render legible text
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere"
- Faces and people in general may not be generated properly.
Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.