---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/ux_sZKB9snVPsKRT1TzfG.gif)
# Overview

SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps with a batch size of 64 on a curated dataset spanning multiple aspect ratios. Training alternated low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.

- **Use it with [Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL) (recommended)**
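As a rough illustration of the alternating schedule described above, the sketch below shows one possible reading of "alternating low and high resolution batches (per aspect ratio)". It is not the actual training code, which this card does not describe. The function name `alternating_schedule`, the tier labels, and the example aspect ratios are all hypothetical.

```python
from itertools import cycle


def alternating_schedule(aspect_ratios, steps):
    """Return a list of (aspect_ratio, tier) pairs, one per training step.

    Hypothetical reading of the card: for each aspect ratio, a low-resolution
    batch is followed by a high-resolution batch, then the next ratio is used.
    """
    tiers = ("low", "high")
    plan = cycle([(ar, tier) for ar in aspect_ratios for tier in tiers])
    return [next(plan) for _ in range(steps)]


# Example with made-up aspect ratios; the real buckets are not documented.
schedule = alternating_schedule(["1:1", "16:9"], steps=5)

# Samples seen during fine-tuning, from the figures stated in the card.
samples_seen = 7000 * 64
```

With the stated hyperparameters, 7000 steps at a batch size of 64 correspond to 448,000 training samples in total.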
# Model Description

- **Developed by**: Natural Synthetics Inc.
- **Model type**: Diffusion-based text-to-image generative model
- **License**: CreativeML Open RAIL++-M License
- **Model Description**: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/hotshot-xl).
- **Finetuned from model**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
# 🧨 Diffusers

Make sure to upgrade diffusers to >= 0.18.2:

```
pip install diffusers --upgrade
```

In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark:

```
pip install invisible_watermark transformers accelerate safetensors
```

Running the pipeline: if you don't swap the scheduler, it will run with the default **EulerDiscreteScheduler**. In this example we swap it for **EulerAncestralDiscreteScheduler**:

```py
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

model = "hotshotco/SDXL-512"

# Load the fp16 weights in safetensors format.
pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
)

# Replace the default scheduler, reusing its configuration.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

prompt = "a woman laughing"
negative_prompt = ""

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    guidance_scale=12,
    # SDXL size-conditioning inputs
    target_size=(1024,1024),
    original_size=(4096,4096),
    num_inference_steps=50
).images[0]

image.save("woman_laughing.png")
```
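The pipeline example above uses a square 512x512 output. Since the model targets resolutions "at and around" 512x512 across multiple aspect ratios, the helper below sketches one way to pick a non-square `width` and `height`. It rests on two assumptions that do not come from this card: that "around 512x512" means roughly the same pixel count, and that rounding to multiples of 8 is appropriate (the diffusers SDXL pipeline requires both dimensions to be divisible by 8). The name `dims_near_512` is hypothetical.

```python
import math


def dims_near_512(aspect_ratio, target_area=512 * 512, multiple=8):
    """Return (width, height) close to `aspect_ratio` with about `target_area` pixels.

    Both values are rounded to the nearest multiple of `multiple`.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    # Solve width * height = target_area with width / height = aspect_ratio.
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


square = dims_near_512(1.0)
landscape = dims_near_512(16 / 9)
portrait = dims_near_512(9 / 16)
```

The resulting values can be passed as `width` and `height` to the `pipe(...)` call. Which aspect ratios work best is not stated in this card, so treat the output as a starting point rather than a recommended setting.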
# Limitations and Bias

## Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.

## Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.