---
license: apache-2.0
datasets:
- lmms-lab/COCO-Caption2017
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
pipeline_tag: text-to-image
---
Color Diffusion (Evaluating Model Perception of Color Illusions in Photorealistic Scenes)
Authors: Lingjun Mao, Zineng Tang, Alane Suhr
Model Overview
The Color Diffusion model, used in the paper "Evaluating Model Perception of Color Illusions in Photorealistic Scenes", generates images for the RCID dataset from color sketches. Given a rough colored draft and a text prompt, it produces a realistic image that follows both the shapes and the color layout of the draft. The model is built on ControlNet and was trained for 20 epochs on the MS COCO 2017 dataset.
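This card does not describe how the colored drafts used as conditioning images were produced. The following NumPy sketch is purely illustrative. It assumes a "color sketch" is simply a coarse, block-averaged version of a photo, which keeps the rough shapes and color regions while discarding detail. The function `make_color_sketch` and its `block` parameter are hypothetical and are not the authors' preprocessing.

```python
import numpy as np


def make_color_sketch(img: np.ndarray, block: int = 32) -> np.ndarray:
    """Hypothetical color-sketch builder: replace each block x block tile of an
    RGB uint8 image with its mean color. The image is cropped to a multiple of
    `block` in both dimensions before averaging."""
    h, w, c = img.shape
    h2, w2 = h - h % block, w - w % block
    crop = img[:h2, :w2].astype(np.float32)
    # Split into tiles and average over the pixels inside each tile.
    means = crop.reshape(h2 // block, block, w2 // block, block, c).mean(axis=(1, 3))
    # Expand each tile mean back to full resolution.
    sketch = np.repeat(np.repeat(means, block, axis=0), block, axis=1)
    return np.clip(np.rint(sketch), 0, 255).astype(np.uint8)
```

A smaller `block` keeps more structure from the source photo. A larger one yields flatter drafts, closer to the simple procedural images described below.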
RCID Dataset
The construction of our dataset involves three steps:
Image Generation. For contrast and stripe illusions, we use procedural code to generate simple illusion images, which our Color Diffusion model then turns into realistic illusion images. For filter illusions, we apply contrasting color filters directly to the original images. Each illusion type also has a corresponding control group without illusions for comparison.
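The paper's procedural code is not reproduced in this card. The NumPy sketch below only illustrates the three illusion types under stated assumptions:

- The contrast illusion is modeled as simultaneous contrast: identical targets on different backgrounds.
- The stripe illusion is modeled in the style of White's illusion: identical targets embedded in different stripes.
- The filter is modeled as a multiplicative color tint.

All function names, colors, and sizes are hypothetical. Setting `control=True` produces a no-illusion counterpart.

```python
import numpy as np


def make_contrast_illusion(size=256, target=(128, 128, 128), left_bg=(40, 40, 40),
                           right_bg=(215, 215, 215), control=False):
    """Hypothetical simultaneous-contrast image: two identical target squares on
    two backgrounds. With control=True both halves share left_bg (no illusion)."""
    img = np.zeros((size, 2 * size, 3), dtype=np.uint8)
    img[:, :size] = left_bg
    img[:, size:] = left_bg if control else right_bg
    q = size // 4
    for x0 in (0, size):  # one target centered in each half
        img[q:size - q, x0 + q:x0 + size - q] = target
    return img


def make_stripe_illusion(size=256, stripe=16, target=(128, 128, 128),
                         c1=(0, 0, 0), c2=(255, 255, 255)):
    """Hypothetical White's-style image: identical targets on alternating
    horizontal stripes. The left target replaces c1 stripes, the right one c2."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    rows = np.arange(size)
    band = (rows // stripe) % 2  # 0 -> c1 stripe, 1 -> c2 stripe
    img[band == 0] = c1
    img[band == 1] = c2
    in_span = (rows >= size // 4) & (rows < 3 * size // 4)
    half, pad = size // 2, size // 8
    img[rows[in_span & (band == 0)], pad:half - pad] = target
    img[rows[in_span & (band == 1)], half + pad:size - pad] = target
    return img


def apply_color_filter(img, filter_rgb=(255, 180, 120), strength=0.5):
    """Hypothetical color filter: blend each channel toward a multiplicative tint.
    strength=0 leaves the image unchanged; strength=1 applies the full tint."""
    f = np.asarray(filter_rgb, dtype=np.float32) / 255.0
    tinted = img.astype(np.float32) * ((1.0 - strength) + strength * f)
    return np.clip(tinted, 0, 255).astype(np.uint8)
```

In this sketch, the contrast and stripe outputs would play the role of the simple procedural images passed to Color Diffusion as sketches. `apply_color_filter` would be applied directly to a photo.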
Question Generation. We use GPT-4o to generate image-specific questions designed to evaluate how well the model under test understands the illusion.
Human Feedback. We collect human participants' feedback on these images and adjust the original “illusion” and “non-illusion” labels according to whether participants are actually deceived.
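The card does not state the exact rule used to adjust the labels. A minimal sketch, assuming a simple threshold rule over per-participant "was deceived" answers, could look like this. The function `relabel` and the 0.5 threshold are hypothetical.

```python
from typing import Sequence


def relabel(original_label: str, deceived: Sequence[bool], threshold: float = 0.5) -> str:
    """Hypothetical relabeling rule: an image counts as an "illusion" if at least
    `threshold` of participants were deceived, otherwise "non-illusion".
    With no responses, the original label is kept."""
    if not deceived:
        return original_label
    rate = sum(deceived) / len(deceived)
    return "illusion" if rate >= threshold else "non-illusion"
```

For example, an image originally labeled "illusion" that fooled only one of three participants would be moved to "non-illusion" under this rule.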
Our data can be found at the following link: RCID Dataset
The code is released in the Color Illusion repository.
How to Use the Model
To generate a realistic image from a simplified image and a text prompt using the Color Diffusion model, you can use the following code:
```python
import random

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Color Diffusion ControlNet and the Stable Diffusion base model.
# Replace the placeholder paths with the actual checkpoint locations
# (the base model listed in the metadata is stable-diffusion-v1-5/stable-diffusion-v1-5).
controlnet = ControlNetModel.from_pretrained("controlnet_model_path", torch_dtype=torch.float32)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "base_model_path", controlnet=controlnet, torch_dtype=torch.float32
).to(device)  # also moves the ControlNet to `device`

# Load your simplified (color sketch) image
simplified_image = load_image("path_to_simplified_image.png")

# Define the text prompt
prompt = "A photorealistic image of a sunset over the ocean."

# Seed a generator on the pipeline's device; keep `seed` to reproduce this sample
seed = random.randint(0, 100000)
generator = torch.Generator(device=device).manual_seed(seed)

# Generate a realistic image conditioned on the sketch and the prompt
generated_image = pipe(prompt, image=simplified_image, num_inference_steps=50, generator=generator).images[0]

# Save the generated image
generated_image.save("generated_image.png")
```
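Stable Diffusion v1.5 works in a latent space downsampled by a factor of 8 and was trained mainly at 512x512. The diffusers pipelines therefore expect `height` and `width` values divisible by 8. If your sketch has an arbitrary size, a small helper like the hypothetical `sd_target_size` below can pick a 512-pixel long side with both dimensions rounded to multiples of 8. This is a convenience sketch, not part of the released code.

```python
def sd_target_size(width: int, height: int, long_side: int = 512, multiple: int = 8) -> tuple:
    """Hypothetical helper: scale (width, height) so the longer side is about
    `long_side`, rounding each dimension to the nearest multiple of `multiple`."""
    scale = long_side / max(width, height)
    w = max(multiple, int(round(width * scale / multiple)) * multiple)
    h = max(multiple, int(round(height * scale / multiple)) * multiple)
    return w, h
```

To use it, resize the PIL sketch before calling the pipeline, for example `w, h = sd_target_size(*simplified_image.size)` followed by `simplified_image = simplified_image.resize((w, h))`. Then pass `height=h, width=w` to `pipe(...)`.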
License
The source code of this repository is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages.
For more information, to access the dataset, or to contribute, please visit our Website.