Modifying and controlling the size and aspect ratio of an image

#169
by PoltorProgrammer

Hello community,

I'm trying to generate images with aspect ratios other than 1:1, such as 16:9, 4:3, or 3:5. By default the output is always a square 1:1 image, and despite adjusting the parameters I haven't managed to change that. Below are the parameters I've considered. How can I configure them correctly?


#️⃣ Numeric parameters
strength = 0.3
num_inference_steps = 50
denoising_start = 0.0
denoising_end = 1.0
guidance_scale = 7.5
num_images_per_prompt = 1
eta = 0.0
guidance_rescale = 0.0
aesthetic_score = 6.0
negative_aesthetic_score = 2.5
clip_skip = 0

#️⃣ String parameters
prompt = "Something" # A valid prompt must be provided
prompt_2 = prompt # Uses the value of prompt if not defined
negative_prompt = "Something" # None unless using negative guidance
negative_prompt_2 = negative_prompt # Uses the value of negative_prompt if not defined
output_type = "pil"
target_size = (1280, 720)
negative_target_size = (3840, 2160)

#️⃣ Parameters set to match target size as default
original_size = (1024, 768) # We don't need it as we're not cropping from an existing image.
crops_coords_top_left = (0, 0) # We don't need it as we're not cropping from an existing image.
negative_original_size = (1024, 1024) # We don't need it as we're not cropping from an existing image.
negative_crops_coords_top_left = (0, 0) # We don't need it as we're not cropping from an existing image.

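For reference, a minimal sketch of what I believe is missing from the list above: as far as I can tell from the diffusers documentation, the text-to-image SDXL pipeline also accepts height and width keyword arguments that set the actual output resolution in pixels, while target_size only acts as a conditioning hint. Both values should be multiples of 8. This is only a sketch under that assumption:

#️⃣ Resolution parameters (a sketch; assumes height/width set the output size directly)
height = 720     # multiples of 8; 1280x720 is exactly 16:9
width = 1280     # would be passed to the call below as height=height, width=width

If those are passed in the generation cell below, the output should no longer be forced to 1:1; my understanding is that resolutions close to the model's roughly one-megapixel training area (e.g. 1344x768) tend to give better results, but that part is an assumption on my side.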

Thank you very much for your guidance. For context, these parameters are defined in one notebook cell that I run before executing a second cell, so they are easy to modify. Below is that second cell:


#️⃣ Running the model
import torch
from diffusers import StableDiffusionXLPipeline

torch_dtype = torch.float16  # half precision on the GPU; use torch.float32 on CPU

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch_dtype)
pipe = pipe.to("cuda")
pipe.safety_checker = None
generated_image = pipe(
    prompt=prompt,
    prompt_2=prompt_2,
    strength=strength,
    num_inference_steps=num_inference_steps,
    denoising_start=denoising_start,
    denoising_end=denoising_end,
    guidance_scale=guidance_scale,
    negative_prompt=negative_prompt,
    negative_prompt_2=negative_prompt_2,
    num_images_per_prompt=num_images_per_prompt,
    eta=eta,
    output_type=output_type,
    guidance_rescale=guidance_rescale,
    original_size=original_size,
    crops_coords_top_left=crops_coords_top_left,
    target_size=target_size,
    negative_original_size=negative_original_size,
    negative_crops_coords_top_left=negative_crops_coords_top_left,
    negative_target_size=negative_target_size,
    aesthetic_score=aesthetic_score,
    negative_aesthetic_score=negative_aesthetic_score,
    clip_skip=clip_skip,
).images[0]

#️⃣ Display the image
display(generated_image)
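As a quick sanity check after generation, here is a sketch using only standard PIL calls on the returned image: it prints the resolution actually produced and, as a crude fallback in case the output is still square, centre-crops it to 16:9 and resizes to 1280x720. Cropping after generation obviously loses detail compared with generating at the desired resolution in the first place.

#️⃣ Sanity check: what resolution did we actually get?
print(generated_image.size)                # PIL reports (width, height); a 1:1 output shows e.g. (1024, 1024)

#️⃣ Crude fallback: centre-crop the square output to 16:9, then resize to 1280x720
w, h = generated_image.size
crop_h = w * 9 // 16                       # height of a 16:9 region spanning the full width
top = (h - crop_h) // 2                    # centre the crop vertically
widescreen = generated_image.crop((0, top, w, top + crop_h)).resize((1280, 720))
display(widescreen)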

