---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/FAHjxgN2tk6uXmQAUeFI5.jpeg)

<hr>

# Overview

SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to make it simpler to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps with a batch size of 64 on a curated dataset of multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.

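
The alternating schedule described above can be pictured with a short sketch. It is an illustration of one possible reading, not the training code used for this checkpoint: the aspect-ratio buckets, the `"low"`/`"high"` labels, and the `alternating_schedule` helper are all hypothetical. Only the step count (7000) and the batch size (64) come from this card.

```python
from itertools import cycle, islice

TOTAL_STEPS = 7000  # from the model card
BATCH_SIZE = 64     # from the model card

# Hypothetical aspect-ratio buckets; the real buckets are not documented here.
ASPECT_RATIOS = ["1:1", "4:3", "16:9"]


def alternating_schedule(aspect_ratios, total_steps):
    """Yield (step, aspect_ratio, resolution), alternating low/high per aspect ratio."""
    # For each aspect ratio, a low-resolution batch is followed by a high-resolution one.
    pairs = [(ratio, res) for ratio in aspect_ratios for res in ("low", "high")]
    for step, (ratio, res) in enumerate(islice(cycle(pairs), total_steps)):
        yield step, ratio, res


schedule = list(alternating_schedule(ASPECT_RATIOS, TOTAL_STEPS))

# Total number of samples processed over the run (steps x batch size).
samples_seen = TOTAL_STEPS * BATCH_SIZE

print(schedule[:4])
print(samples_seen)
```

Under these assumptions the run processes 7000 × 64 = 448,000 samples in total, split evenly between the low- and high-resolution batches.
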

*Note:* It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.

- **Use it with [Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL) (recommended)**

<hr>

# Model Description
- **Developed by**: Natural Synthetics Inc.
- **Model type**: Diffusion-based text-to-image generative model
- **License**: CreativeML Open RAIL++-M License
- **Model Description**: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/Hotshot-XL).
- **Finetuned from model**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

<hr>

# 🧨 Diffusers

Make sure to upgrade diffusers to >= 0.18.2:
```
pip install diffusers --upgrade
```

In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark package:
```
pip install invisible_watermark transformers accelerate safetensors
```

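
To confirm that the installed diffusers version meets the 0.18.2 minimum before loading the model, a small check along these lines may help. It is a minimal sketch using only the standard library: `parse_version`, `meets_minimum`, and `installed_diffusers_ok` are hypothetical helper names, and the parser deliberately handles only simple dotted versions (it ignores suffixes such as `.dev0`).

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.18.2' into (0, 18, 2); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="0.18.2"):
    """Return True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_diffusers_ok():
    """Check the diffusers version in the current environment, if it is installed."""
    try:
        return meets_minimum(metadata.version("diffusers"))
    except metadata.PackageNotFoundError:
        return False


print("diffusers >= 0.18.2:", installed_diffusers_ok())
```

For anything beyond simple dotted versions (release candidates, local builds), a dedicated version parser would be a safer choice than this hand-rolled comparison.
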
Running the pipeline: if you don't swap the scheduler, it runs with the default **EulerDiscreteScheduler**. In this example we swap it for **EulerAncestralDiscreteScheduler**:
```py
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Load the SDXL-512 checkpoint and move the pipeline to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "hotshotco/SDXL-512",
    use_safetensors=True,
).to('cuda')

# Replace the default scheduler, reusing its configuration
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

prompt = "a woman laughing"
negative_prompt = ""

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    # SDXL size-conditioning inputs, separate from the output width/height
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50
).images[0]

image.save("woman_laughing.png")
```
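
Because the checkpoint targets resolutions "at and around" 512x512 across multiple aspect ratios, it can be handy to derive a `width`/`height` pair for a non-square output. The helper below is a hedged sketch, not part of diffusers or this repository: `size_for_aspect_ratio` is a hypothetical name, the 512x512 pixel budget is an assumption drawn from the description above, and rounding to a multiple of 8 reflects the pipeline's expectation that both dimensions be divisible by 8. The card does not state which aspect ratios were used in training.

```python
import math


def size_for_aspect_ratio(aspect_ratio, target_pixels=512 * 512, multiple=8):
    """Return (width, height) near target_pixels for width / height = aspect_ratio.

    Both values are rounded to the nearest multiple of `multiple`.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    # Solve width * height = target_pixels with width = aspect_ratio * height.
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


square = size_for_aspect_ratio(1.0)
widescreen = size_for_aspect_ratio(16 / 9)
print(square, widescreen)
```

The resulting values could then be passed as `width=` and `height=` in the pipeline call shown above.
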

<hr>

# Limitations and Bias
## Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.

## Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.