	---
	license: openrail++
	tags:
	- text-to-image
	- stable-diffusion
	---

	![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/FAHjxgN2tk6uXmQAUeFI5.jpeg)

	<hr>

	# Overview
	SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to make it simpler to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps with a batch size of 64 on a curated dataset of multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.
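The alternating-batch idea described above can be sketched roughly as follows. This is only an illustration, not the actual training code: the bucket names, the bucket sizes, and the `batch_schedule` helper are assumptions made for this example. Only the learning rate, step count, and batch size come from this model card.

```python
from itertools import cycle

# Hyperparameters stated in the model card.
LEARNING_RATE = 1e-6
TRAIN_STEPS = 7000
BATCH_SIZE = 64

# Hypothetical aspect-ratio buckets (assumption, not the real training buckets):
# each maps to a (low-res, high-res) pair of (width, height) sizes.
BUCKETS = {
    "1:1": ((512, 512), (1024, 1024)),
    "3:2": ((624, 416), (1248, 832)),
    "2:3": ((416, 624), (832, 1248)),
}


def batch_schedule(num_steps, buckets=BUCKETS):
    """Yield (step, aspect, size), alternating low/high resolution per aspect ratio."""
    aspects = cycle(buckets)
    next_is_low = {aspect: True for aspect in buckets}
    for step in range(num_steps):
        aspect = next(aspects)
        low, high = buckets[aspect]
        size = low if next_is_low[aspect] else high
        yield step, aspect, size
        # Flip so the next batch for this aspect ratio uses the other resolution.
        next_is_low[aspect] = not next_is_low[aspect]


# Upper bound on samples seen, assuming every step consumes a full batch.
samples_seen = TRAIN_STEPS * BATCH_SIZE

schedule = list(batch_schedule(6))
for step, aspect, size in schedule:
    print(step, aspect, size)
```

With three buckets, each aspect ratio sees a low-resolution batch first and a high-resolution batch on its next turn, which is one plausible reading of "alternating low- and high-resolution batches (per aspect ratio)".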

	Note: It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.

	- Use it with [Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL) (recommended)

	<hr>

	# Model Description
	- Developed by: Natural Synthetics Inc.
	- Model type: Diffusion-based text-to-image generative model
	- License: CreativeML Open RAIL++-M License
	- Model Description: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
	- Resources for more information: Check out our [GitHub Repository](https://github.com/hotshotco/Hotshot-XL).
	- Finetuned from model: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

	<hr>

	# 🧨 Diffusers

	Make sure to upgrade diffusers to >= 0.18.2:
	```
	pip install diffusers --upgrade
	```
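To confirm that the installed version meets the minimum, a small check like the one below may help. It is a simplified sketch: the `release_tuple` and `meets_minimum` helpers are hypothetical names introduced here, and they compare only the leading numeric parts of the version string rather than implementing full version-ordering rules.

```python
from importlib import metadata

MIN_DIFFUSERS = "0.18.2"  # minimum version stated in this model card


def release_tuple(version):
    """Turn '0.18.2' or '0.21.0.dev0' into a tuple of its leading numeric parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '2rc1'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DIFFUSERS):
    """Return True if the installed version is at least the minimum (simplified)."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = metadata.version("diffusers")
    print(f"diffusers {installed}: meets minimum = {meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("diffusers is not installed")
```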

	In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as `invisible_watermark`:
	```
	pip install invisible_watermark transformers accelerate safetensors
	```

	Running the pipeline (if you don't swap the scheduler, it will run with the default EulerDiscreteScheduler; in this example we swap it to EulerAncestralDiscreteScheduler):
	```py
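	from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

	# Load the SDXL-512 checkpoint and move it to the GPU
	pipe = StableDiffusionXLPipeline.from_pretrained(
	    "hotshotco/SDXL-512",
	    use_safetensors=True,
	).to('cuda')

	# Swap the default EulerDiscreteScheduler for EulerAncestralDiscreteScheduler
	pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

	prompt = "a woman laughing"
	negative_prompt = ""

	# Generate a 512x512 image; target_size and original_size are
	# SDXL size-conditioning inputs, set to the values used in this example
	image = pipe(
	    prompt,
	    negative_prompt=negative_prompt,
	    width=512,
	    height=512,
	    target_size=(1024, 1024),
	    original_size=(4096, 4096),
	    num_inference_steps=50,
	).images[0]

	image.save("woman_laughing.png")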
	```
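Since the model targets resolutions "at and around" 512x512, it can be handy to pick a non-square size with roughly the same pixel count. The helper below is a hypothetical sketch (the name `dims_near_512` and the rounding strategy are assumptions, not part of this model or of diffusers). It snaps both sides to a multiple of 8, which the SDXL pipeline requires for `width` and `height`.

```python
import math


def dims_near_512(aspect_ratio, target_area=512 * 512, multiple=8):
    """Return (width, height) close to target_area for a given width/height ratio.

    Both sides are rounded to the nearest multiple of `multiple`.
    """
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


for name, ratio in [("1:1", 1.0), ("16:9", 16 / 9), ("2:3", 2 / 3)]:
    print(name, dims_near_512(ratio))
```

The resulting pair can be passed as `width=` and `height=` in the pipeline call. Whether a given size works well depends on the aspect ratios seen during fine-tuning, which this card does not list.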

	<hr>

	# Limitations and Bias
	## Limitations
	- The model does not achieve perfect photorealism
	- The model cannot render legible text
	- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”
	- Faces and people in general may not be generated properly

	## Bias
	While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.