image to image version

#3
opened by Guytron

I'd like to help work on an image-to-image version but I'm not even sure that is possible with this type of model. Anyone got pointers where I could start helping or if this is even possible?

img2img is always possible in comfy, you just have to give it another image instead of an empty latent image and lower the denoise
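
In diffusers terms, that "lower denoise" trick is the same idea every img2img pipeline uses under the hood: encode the init image into latents, blend in noise according to strength, then run only the remaining denoising steps. A minimal sketch of that latent preparation, assuming a rectified-flow forward process like AuraFlow's (the function name and wrapper are illustrative, not a real diffusers API; only the AutoencoderKL calls are standard):

import torch

def prepare_img2img_latents(vae, image_tensor, strength, generator=None):
    # Encode the init image into VAE latent space (standard AutoencoderKL API)
    latents = vae.encode(image_tensor).latent_dist.sample(generator)
    latents = latents * vae.config.scaling_factor
    # Rectified-flow forward process: x_t = (1 - t) * x0 + t * noise,
    # with strength playing the role of t (0 = keep the image, 1 = pure noise)
    noise = torch.randn_like(latents)
    return (1.0 - strength) * latents + strength * noise

The sampler then runs only the last strength fraction of the schedule, which is exactly what lowering denoise in Comfy does.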

I am also interested in img2img for AuraFlow. Not using Comfy though; using the diffusers code.
In previous diffusers scripts (for example Playground), this is the syntax used when loading the model for img2img (I changed the model to fal/AuraFlow):

pipe = AutoPipelineForImage2Image.from_pretrained(
    "fal/AuraFlow",
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,
    variant="fp16"
)
pipe.to("cuda")

Once the script gets to that call, I get these errors:

File "D:\Tests\AuraFlow\voc_auraflow\lib\site-packages\huggingface_hub\utils_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "D:\Tests\AuraFlow\voc_auraflow\lib\site-packages\diffusers\pipelines\auto_pipeline.py", line 653, in from_pretrained
image_2_image_cls = _get_task_class(AUTO_IMAGE2IMAGE_PIPELINES_MAPPING, orig_class_name)
File "D:\Tests\AuraFlow\voc_auraflow\lib\site-packages\diffusers\pipelines\auto_pipeline.py", line 189, in _get_task_class
raise ValueError(f"AutoPipeline can't find a pipeline linked to {pipeline_class_name} for {model_name}")
ValueError: AutoPipeline can't find a pipeline linked to AuraFlowPipeline for None
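
Digging into it, the ValueError just means AutoPipeline has no img2img entry for AuraFlow: AuraFlowPipeline is not registered in AUTO_IMAGE2IMAGE_PIPELINES_MAPPING. You can confirm that against your installed diffusers; this is the same mapping the traceback references:

from diffusers.pipelines.auto_pipeline import AUTO_IMAGE2IMAGE_PIPELINES_MAPPING

# List the model families AutoPipelineForImage2Image knows about;
# AuraFlow is absent, hence the ValueError above
print(sorted(AUTO_IMAGE2IMAGE_PIPELINES_MAPPING.keys()))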

If someone can help get init/seed images working with AuraFlow or knows the correct syntax please let me know. I want to take an init image and modify it with a prompt.
For example, using the image of scarlett to seed the prompt for a clown...
(attached images: scarlett.png and the resulting scarlett_v2_0.5_output.png)
That is again using Playground as an example, but I'm hoping AuraFlow can do the same thing?

Img2Img generation/editing is not the same as Img2Img inversion. Standard Img2Img methods, like DiffEdit or InstructPix2Pix, primarily rely on a forward pass, no inversion! However, many advanced Img2Img editing techniques, such as LEDITS++, depend on first inverting the original image into noise by reversing the scheduler. This inversion process appears to be non-trivial for rectified flow models. If anyone has suggestions, lmk!
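
For reference, "reversing the scheduler" for a rectified flow model means integrating the flow ODE dx/dt = v(x, t) in the opposite direction, from the clean image at t=0 toward noise at t=1. A toy Euler version (velocity_model here is a hypothetical stand-in for the conditioned transformer; real pipelines add text conditioning and shifted sigma schedules):

import torch

@torch.no_grad()
def invert(latents, velocity_model, num_steps=50):
    # Euler integration of dx/dt = v(x, t) from t=0 (image) to t=1 (noise),
    # i.e. the sampling loop run backwards
    ts = torch.linspace(0.0, 1.0, num_steps + 1)
    x = latents
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        v = velocity_model(x, t)  # predicted velocity, roughly (noise - x0)
        x = x + (t_next - t) * v  # Euler step toward noise
    return x

Sampling the result back along the same trajectory reproduces the image; the difficulty is that edits perturb the velocity estimates away from that trajectory.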

Is that speculation, or do you know for a fact that there is no way to edit images like my examples above with AuraFlow?
If it is not possible I can ignore it and move on. If it is possible I just need the right syntax to support it.
Thanks.

:D did you read what I wrote? I never said there is no way. I just said copy and paste didn't work for me due to the nature of rectified flow. Inversion works perfectly. But editing will move the noise estimates calculated by inversion away from the optimal trajectory and hence result in garbage. I'm pretty sure there is a way around it, it's just not straightforward.

OK, hopefully someone else knows or can get it working.

img2img is always possible in comfy, you just have to give it another image instead of an empty latent image and lower the denoise

I don't use comfy; I write my own code directly from repos, or use diffusers if I'm being lazy. I'm restricted to renting GPU power, so I have to do that as efficiently and cheaply as I can. Also, the scripts are for exploring latent space in animation form, which text-to-image isn't nearly as good at as feed-forward image-to-image.

Yes, I want to use this code outside Comfy. I want to add support for AuraFlow to Visions of Chaos, and for that purpose diffusers is ideal. I have used diffusers for SD, Playground and others in the past just fine.
To get AuraFlow working, the main showstopper is this diffusers issue
https://huggingface.co/fal/AuraFlow/discussions/9
but getting init/seed images working would allow me to use AuraFlow for recursive movies like this one
https://www.instagram.com/p/C8NeAODsFA-/
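
The recursive part is simple once img2img itself works: each generated frame seeds the next call. Something like this, assuming a working img2img pipe for AuraFlow existed (frame count and strength are placeholder values):

from diffusers.utils import load_image

frame = load_image("scarlett.png")
for i in range(120):  # arbitrary frame count
    frame = pipe(  # pipe = the hypothetical AuraFlow img2img pipeline
        prompt="a portrait of a clown",
        image=frame,
        strength=0.35,  # low strength keeps frame-to-frame coherence
        num_inference_steps=50
    ).images[0]
    frame.save(f"frame_{i:04d}.png")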

Unfortunately, Fal stopped work on the diffusers codebase, so if something isn't supported right now, it might be a while before it comes.

Please consider getting the Diffusers code working with the next version (assuming you are training the v2 model) and supporting image-to-image.

Just in case, here are the pip commands I am using to create the virtual environment:

python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts wheel==0.43.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts git+https://github.com/huggingface/diffusers.git@3f1411767bc0f1837adb6f289713807f18599db3
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts transformers==4.42.4
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts accelerate==0.32.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts protobuf==5.27.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts sentencepiece==0.2.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts opencv_python==4.10.0.84
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts scipy==1.14.0
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.3.0+cu118 torchvision --index-url https://download.pytorch.org/whl/cu118
pip uninstall -y charset-normalizer
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts charset-normalizer==3.3.0

and here is the test script:

import sys
import os
import datetime
from diffusers import DiffusionPipeline
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
import torch
import argparse
import numpy as np
import cv2
import PIL
from PIL import Image, ImageEnhance
from scipy.ndimage import median_filter

sys.stdout.write("Parsing arguments ...\n")
sys.stdout.flush()

sys.stdout.write("Setting up init image pipeline ...\n")
sys.stdout.flush()

pipe = AutoPipelineForImage2Image.from_pretrained(
    "fal/AuraFlow",
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,
    variant="fp16"
)
pipe.to("cuda")

init_image = load_image("scarlett.png")

sys.stdout.write("Generating image ...\n")
sys.stdout.flush()

image = pipe(
prompt="a portrait of a clown",
negative_prompt="",
guidance_scale=3.0,
width=1024,
height=1024,
safety_checker=False,
image=init_image,
strength=0.3,
num_inference_steps=50
).images[0]

sys.stdout.write("Saving image ...\n")
sys.stdout.flush()

image.save("scarlett_v2_0.3_output.png")

image = pipe(
prompt="a portrait of a clown",
negative_prompt="",
guidance_scale=3.0,
width=1024,
height=1024,
safety_checker=False,
image=init_image,
strength=0.5,
num_inference_steps=50
).images[0]

sys.stdout.write("Saving image ...\n")
sys.stdout.flush()

image.save("scarlett_v2_0.5_output.png")

image = pipe(
prompt="a portrait of a clown",
negative_prompt="",
guidance_scale=3.0,
width=1024,
height=1024,
safety_checker=False,
image=init_image,
strength=0.7,
num_inference_steps=50
).images[0]

sys.stdout.write("Saving image ...\n")
sys.stdout.flush()

image.save("scarlett_v2_0.7_output.png")

sys.stdout.write("Done\n")
sys.stdout.flush()
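
As a sanity check, plain text-to-image through the dedicated AuraFlowPipeline class is the documented model-card usage, so it should isolate whether the environment itself is fine and only the AutoPipeline img2img mapping is missing (whether it runs cleanly also depends on the separate diffusers issue linked above):

import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",
    torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a portrait of a clown",
    guidance_scale=3.0,
    width=1024,
    height=1024,
    num_inference_steps=50
).images[0]
image.save("text2img_check.png")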
