Modifying and controlling the size and aspect ratio of an image

#169
by PoltorProgrammer

Hello community,

I'm trying to generate images with aspect ratios other than 1:1, such as 16:9, 4:3, or 3:5. By default the output is always a square 1:1 image, and despite adjusting the parameters I haven't managed to change that. Below are the parameters I've considered. How can I configure them correctly?


#️⃣ Numeric parameters
strength = 0.3
num_inference_steps = 50
denoising_start = 0.0
denoising_end = 1.0
guidance_scale = 7.5
num_images_per_prompt = 1
eta = 0.0
guidance_rescale = 0.0
aesthetic_score = 6.0
negative_aesthetic_score = 2.5
clip_skip = 0

#️⃣ String parameters
prompt = "Something" # A valid prompt must be provided
prompt_2 = prompt # Uses the value of prompt if not defined
negative_prompt = "Something" # None unless using negative guidance
negative_prompt_2 = negative_prompt # Uses the value of negative_prompt if not defined
output_type = "pil"
target_size = (1280, 720)
negative_target_size = (3840, 2160)

#️⃣ Parameters set to match target size as default
original_size = (1024, 768) # We don't need it as we're not cropping from an existing image.
crops_coords_top_left = (0, 0) # We don't need it as we're not cropping from an existing image.
negative_original_size = (1024, 1024) # We don't need it as we're not cropping from an existing image.
negative_crops_coords_top_left = (0, 0) # We don't need it as we're not cropping from an existing image.

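For reference, a minimal sketch of what I believe is missing from the list above: as far as I can tell from the diffusers documentation, the text-to-image SDXL pipeline also accepts height and width keyword arguments that set the actual output resolution in pixels, while target_size only acts as a conditioning hint. Both values should be multiples of 8. This is only a sketch under that assumption:

#️⃣ Resolution parameters (a sketch; assumes height/width set the output size directly)
height = 720     # multiples of 8; 1280x720 is exactly 16:9
width = 1280     # would be passed to the call below as height=height, width=width

If those are passed in the generation cell below, the output should no longer be forced to 1:1; my understanding is that resolutions close to the model's roughly one-megapixel training area (e.g. 1344x768) tend to give better results, but that part is an assumption on my side.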

Thank you very much for your guidance. For context, these parameters are defined in one notebook cell that I run before executing a second cell, so they are easy to modify. Below is that second cell:


#️⃣ Running the model
import torch
from diffusers import StableDiffusionXLPipeline

torch_dtype = torch.float16  # half precision on the GPU; use torch.float32 on CPU

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch_dtype)
pipe = pipe.to("cuda")
pipe.safety_checker = None
generated_image = pipe(
    prompt=prompt,
    prompt_2=prompt_2,
    strength=strength,
    num_inference_steps=num_inference_steps,
    denoising_start=denoising_start,
    denoising_end=denoising_end,
    guidance_scale=guidance_scale,
    negative_prompt=negative_prompt,
    negative_prompt_2=negative_prompt_2,
    num_images_per_prompt=num_images_per_prompt,
    eta=eta,
    output_type=output_type,
    guidance_rescale=guidance_rescale,
    original_size=original_size,
    crops_coords_top_left=crops_coords_top_left,
    target_size=target_size,
    negative_original_size=negative_original_size,
    negative_crops_coords_top_left=negative_crops_coords_top_left,
    negative_target_size=negative_target_size,
    aesthetic_score=aesthetic_score,
    negative_aesthetic_score=negative_aesthetic_score,
    clip_skip=clip_skip,
).images[0]

#️⃣ Display the image
display(generated_image)
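As a quick sanity check after generation, here is a sketch using only standard PIL calls on the returned image: it prints the resolution actually produced and, as a crude fallback in case the output is still square, centre-crops it to 16:9 and resizes to 1280x720. Cropping after generation obviously loses detail compared with generating at the desired resolution in the first place.

#️⃣ Sanity check: what resolution did we actually get?
print(generated_image.size)                # PIL reports (width, height); a 1:1 output shows e.g. (1024, 1024)

#️⃣ Crude fallback: centre-crop the square output to 16:9, then resize to 1280x720
w, h = generated_image.size
crop_h = w * 9 // 16                       # height of a 16:9 region spanning the full width
top = (h - crop_h) // 2                    # centre the crop vertically
widescreen = generated_image.crop((0, top, w, top + crop_h)).resize((1280, 720))
display(widescreen)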

