Automating style checking

#5
by ReggieMontoya - opened

HUGE time saver here. I wrote a simple program (GPT helped!) that will take an input of an artist list and create a real life artist card that can be used to determine if a model knows the artist. Combine this with another program that stitches an AI artist card to it and you have a quick comparison to see if the AI gets the real life style right. Just replace the output directory at the bottom and make a list of artists you want to query.

I put an error escape clause in there because sometimes it would just randomly fail and I didn't care to fix the problem. Meh, just re-run it until it works!

# -*- coding: utf-8 -*-
"""
Created on Wed Feb 28 17:43:16 2024



@author
	: Reggie Montoya
"""



import os
import os.path
from duckduckgo_search import DDGS
from fastcore.all import *
from io import BytesIO
import time
import requests
from PIL import Image
import numpy as np


def search_images(term, max_images=30):
    print(f"Searching for '{term}'")
    with DDGS() as ddgs:
        # generator which yields dicts with:
        # {'title','image','thumbnail','url','height','width','source'}
        search_results = ddgs.images(keywords=term)       
        # grap number of max_images urls
        image_urls = [next(search_results).get("image") for _ in range(max_images)]
        # convert to L (functionally extended list class from fastai)
        return L(image_urls)


def download_image(url):
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return Image.open(BytesIO(response.content))
        else:
            print(f"Failed to download image from {url}")
    except Exception as e:
        print(f"Error downloading image: {e}")

def crop_to_square(image):
    width, height = image.size
    size = min(width, height)
    left = (width - size) / 2
    top = (height - size) / 2
    right = (width + size) / 2
    bottom = (height + size) / 2
    return image.crop((left, top, right, bottom))

def resize_image(image, size):
    return image.resize((size, size))


def make_artist_tile(term, urls, size):
# downloads first 4 image results and makes an artist tile image
    images = []

    # Download images
    for url in urls:
        image = download_image(url)
        if image:
            images.append(image)
        if len(images) == 4:
            break
    
    # discard all but the first 4 images
    images = images[0:4]
    if len(images)<4:
        print(f"failed to get 4 images for {term}")
        return
    
    # Crop images to square and resize
    for i, image in enumerate(images):
        images[i] = resize_image(crop_to_square(image), size).convert(mode="RGB")

    # Tile images into a 2x2 grid
    rows = []
    try:
        for i in range(0, len(images), 2):
            row = np.hstack(images[i:i+2])
            rows.append(row)
    except:
        print(f"failed to concatenate images for {term} for some reason I dunno...")
        return

    # Create the final image
    final_image = np.vstack(rows)

    # Save the final image
    final_image = Image.fromarray(final_image)
    final_image.save(f"{term}.jpg")

# Change the current working directory to the specified path
os.chdir("C:/Users/MAINUSER/AI/artist fidelity/web results")
artists = ["Butcher Billy", "Caia Koopman", "Callie Fink", "Camilla d'Errico", "Camille Corot", "Camille Pissarro", "Carl Kleiner", "Carlos Cruz Diez", "Carne Griffiths", "Carolina Herrera", "Carsten Holler"]

for artist in artists:
    if os.path.isfile(f"{artist}.jpg"):
        continue
    img_urls = search_images(f"{artist} art", max_images=50)
    make_artist_tile(artist, img_urls, 512)
    time.sleep(0.5)    

That's amazing, I didn't know duckduckgo had an api for fetching images. That could be handy for many things.

This will help especially with the situation where, with some non-English artists names, SDXL just makes up a style that matches the country that name is similar to. For example "by Quan Yi" gives the style of an ancient Chinese painting, which looks nice, but doesn't match the artist's actual style.

I am having it work on basically every artist in my master list, >6k. Doing a 2x2x2 style comparison. The SD part is the rate limiter now, only up to ~400 names. I'll run it while at work for the next few days.

I'll put the code for merging the images and the Comfy workflow I am using here, too. For now, I will leave you with this hilarious gem as a result, and a good example of contamination of the "real" idea:

John Bolton_concatenated.jpg

I have nearly 2000 artist comparison cards done and I gotta say, there are more close-to-real style artists in there than I thought.

As promised, here is the python code to take the real web result image and mash it together with the SD 2x2 grid:

# -*- coding: utf-8 -*-
"""
Created on Thu Feb 29 22:20:00 2024



@author
	: Reggie Montoya
"""
    
# take composite from the web (made with download DDG results.py)
#  and merge them with SDXL (or whatever other model) outputs
#  into a composite image that can be compared manually to see how well 
#  the model style gets the real style right. 
    
import os
from PIL import Image

def concat_images(input_folder1, input_folder2, output_folder):
    # Ensure output folder exists
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Get a list of existing files in the output folder
    existing_files = os.listdir(output_folder)

    # Loop through images in the first folder
    for filename in os.listdir(input_folder1):
        if filename.endswith(('.jpg', '.jpeg', '.png')):
            # Check if output folder already contains a file with the same name
            output_filename = os.path.splitext(filename)[0] + "_concatenated.jpg"
            if output_filename in existing_files:
                print(f"File {output_filename} already exists in the output folder. Skipping.")
                continue

            # Load first image
            img1 = Image.open(os.path.join(input_folder1, filename))

            # Check if there's a matching image in the second folder
            filename2 = filename.replace(".jpg", "_00001.jpg")
            matching_filepath = input_folder2 + filename2
            if filename2 in os.listdir(input_folder2):
                # Load second image
                img2 = Image.open(matching_filepath)

                # Resize second image to match the height of the first image
                img2 = img2.resize((int(img2.width * img1.height / img2.height), img1.height))

                # Concatenate horizontally
                concat_img = Image.new('RGB', (img1.width + img2.width, img1.height))
                concat_img.paste(img1, (0, 0))
                concat_img.paste(img2, (img1.width, 0))

                # Save concatenated image to output folder
                concat_img.save(os.path.join(output_folder, output_filename))
            else:
                print(f"No matching image found for {filename}.")

    print("Concatenation complete.")

# Now run it with 2 input folders and 1 output folder specified
input_folder1 = "C:\\Users\\MAINUSER\\AI\\artist fidelity\\web results\\"
input_folder2 = "C:\\Users\\MAINUSER\\AI\\artist fidelity\\SDXL results\\"
output_folder = "C:\\Users\\MAINUSER\\AI\\artist fidelity\\web results\\merged\\"
concat_images(input_folder1, input_folder2, output_folder)

And here's my ComfyUI workflow (save it as a .json in your workflow folder to import it)

{"last_node_id": 491, "last_link_id": 678, "nodes": [{"id": 357, "type": "Reroute (rgthree)", "pos": {"0": 530, "1": 750, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}, "size": [40, 30], "flags": {}, "order": 9, "mode": 0, "inputs": [{"name": "", "type": "*", "link": 678, "dir": 3, "has_old_label": true, "label": " ", "widget": {"name": "value"}}], "outputs": [{"name": "STRING", "type": "STRING", "links": [427, 472], "dir": 4, "has_old_label": true, "label": " ", "slot_index": 0}], "properties": {"resizable": false, "size": [40, 30]}}, {"id": 351, "type": "Reroute (rgthree)", "pos": {"0": 650, "1": 1260, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}, "size": [40, 30], "flags": {}, "order": 12, "mode": 0, "inputs": [{"name": "", "type": "*", "link": 472, "dir": 3, "has_old_label": true, "label": " ", "slot_index": 0, "widget": {"name": "value"}}], "outputs": [{"name": "STRING", "type": "STRING", "links": [489, 668], "dir": 4, "has_old_label": true, "label": " ", "slot_index": 0}], "properties": {"resizable": false, "size": [40, 30]}}, {"id": 331, "type": "Text Find and Replace", "pos": [620, 590], "size": {"0": 270.3999938964844, "1": 120}, "flags": {}, "order": 11, "mode": 0, "inputs": [{"name": "text", "type": "STRING", "link": 426, "widget": {"name": "text"}}, {"name": "replace", "type": "STRING", "link": 427, "widget": {"name": "replace"}, "slot_index": 1}], "outputs": [{"name": "result_text", "type": "STRING", "links": [598], "shape": 3, "slot_index": 0}, {"name": "replacement_count_number", "type": "NUMBER", "links": null, "shape": 3}, {"name": "replacement_count_float", "type": "FLOAT", "links": null, "shape": 3}, {"name": "replacement_count_int", "type": "INT", "links": null, "shape": 3}], "properties": {"Node name for S&R": "Text Find and Replace"}, "widgets_values": ["", "\\.\\.", ""], "color": "#232", "bgcolor": "#353"}, {"id": 454, "type": "MathExpression|pysssss", "pos": [1550, 690], "size": {"0": 210, "1": 116.00006103515625}, "flags": {}, "order": 0, "mode": 0, "inputs": [{"name": "a", "type": "INT,FLOAT,IMAGE,LATENT", "link": null}, {"name": "b", "type": "INT,FLOAT,IMAGE,LATENT", "link": null}, {"name": "c", "type": "INT,FLOAT,IMAGE,LATENT", "link": null}], "outputs": [{"name": "INT", "type": "INT", "links": null, "shape": 3, "slot_index": 0}, {"name": "FLOAT", "type": "FLOAT", "links": [633], "shape": 3, "slot_index": 1}], "title": "Fraction base steps", "properties": {"Node name for S&R": "MathExpression|pysssss"}, "widgets_values": ["85/100"]}, {"id": 457, "type": "MathExpression|pysssss", "pos": [1830, 700], "size": {"0": 210, "1": 116.00009155273438}, "flags": {}, "order": 10, "mode": 0, "inputs": [{"name": "a", "type": "INT,FLOAT,IMAGE,LATENT", "link": 636}, {"name": "b", "type": "INT,FLOAT,IMAGE,LATENT", "link": 637}, {"name": "c", "type": "INT,FLOAT,IMAGE,LATENT", "link": null}], "outputs": [{"name": "INT", "type": "INT", "links": [635], "shape": 3, "slot_index": 0}, {"name": "FLOAT", "type": "FLOAT", "links": null, "shape": 3}], "title": "Refiner steps", "properties": {"Node name for S&R": "MathExpression|pysssss"}, "widgets_values": ["a-b"]}, {"id": 456, "type": "MathExpression|pysssss", "pos": [1820, 530], "size": {"0": 210, "1": 116}, "flags": {}, "order": 8, "mode": 0, "inputs": [{"name": "a", "type": "INT,FLOAT,IMAGE,LATENT", "link": 632}, {"name": "b", "type": "INT,FLOAT,IMAGE,LATENT", "link": 633}, {"name": "c", "type": "INT,FLOAT,IMAGE,LATENT", "link": null}], "outputs": [{"name": "INT", "type": "INT", "links": [631, 637], "shape": 3, "slot_index": 0}, {"name": "FLOAT", "type": "FLOAT", "links": null, "shape": 3}], "title": "Base steps", "properties": {"Node name for S&R": "MathExpression|pysssss"}, "widgets_values": ["a*b"]}, {"id": 330, "type": "ShowText|pysssss", "pos": [760, 800], "size": {"0": 250, "1": 330}, "flags": {}, "order": 14, "mode": 0, "inputs": [{"name": "text", "type": "STRING", "link": 489, "widget": {"name": "text"}, "slot_index": 0}], "outputs": [{"name": "STRING", "type": "STRING", "links": [], "shape": 6, "slot_index": 0}], "properties": {"Node name for S&R": "ShowText|pysssss"}, "widgets_values": ["VALUE", "VALUE", "Adrian Ghenie"], "color": "#232", "bgcolor": "#353"}, {"id": 358, "type": "Reroute (rgthree)", "pos": {"0": 930, "1": 1240, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}, "size": [40, 30], "flags": {}, "order": 15, "mode": 0, "inputs": [{"name": "", "type": "*", "link": 668, "dir": 3, "has_old_label": true, "label": " ", "widget": {"name": "value"}}], "outputs": [{"name": "STRING", "type": "STRING", "links": [580], "dir": 4, "has_old_label": true, "label": " ", "slot_index": 0}], "properties": {"resizable": false, "size": [40, 30]}}, {"id": 425, "type": "Reroute (rgthree)", "pos": {"0": 2480, "1": 1260, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}, "size": [40, 30], "flags": {}, "order": 17, "mode": 0, "inputs": [{"name": "", "type": "*", "link": 580, "dir": 3, "has_old_label": true, "label": " ", "widget": {"name": "value"}}], "outputs": [{"name": "STRING", "type": "STRING", "links": [582, 583], "dir": 4, "has_old_label": true, "label": " ", "slot_index": 0}], "properties": {"resizable": false, "size": [40, 30]}}, {"id": 327, "type": "SDXLAspectRatio", "pos": [750, 120], "size": {"0": 210, "1": 130}, "flags": {}, "order": 2, "mode": 0, "outputs": [{"name": "Width", "type": "INT", "links": [599], "shape": 3, "slot_index": 0}, {"name": "Height", "type": "INT", "links": [600], "shape": 3, "slot_index": 1}], "properties": {"Node name for S&R": "SDXLAspectRatio"}, "widgets_values": [64, 64, "1:1  - 1024x1024 square"], "color": "#323", "bgcolor": "#535"}, {"id": 487, "type": "Note", "pos": [50, -80], "size": {"0": 580, "1": 90}, "flags": {}, "order": 3, "mode": 0, "properties": {"text": ""}, "widgets_values": ["Why this workflow:\n1. I have downloaded 4 images from the web for each of 6000+ artists and stitched them into 1024x1024 squares\n\n2. From that HUGE list of artists, create 4 images \"style of XXXX\" and stitch them into a square image, then caption it. \n\n3. Using a python script, take those 1000s of images and stitch them next to their real life counterparts to see if the style is adhered to (manually). "], "color": "#432", "bgcolor": "#653"}, {"id": 332, "type": "Text Multiline", "pos": [310, 540], "size": [210, 76], "flags": {}, "order": 5, "mode": 0, "outputs": [{"name": "STRING", "type": "STRING", "links": [426], "shape": 3, "slot_index": 0}], "properties": {"Node name for S&R": "Text Multiline"}, "widgets_values": ["(style of ..:1.5)"], "color": "#232", "bgcolor": "#353"}, {"id": 491, "type": "DPCombinatorialGenerator", "pos": [70, 750], "size": {"0": 400, "1": 200}, "flags": {}, "order": 7, "mode": 0, "outputs": [{"name": "STRING", "type": "STRING", "links": [678], "shape": 3, "slot_index": 0}], "properties": {"Node name for S&R": "DPCombinatorialGenerator"}, "widgets_values": ["Adrian Ghenie", 0, "fixed", "No"], "color": "#232", "bgcolor": "#353"}, {"id": 483, "type": "Note", "pos": [10, 90], "size": {"0": 450, "1": 300}, "flags": {"collapsed": false}, "order": 6, "mode": 0, "title": "Prompts", "properties": {"text": ""}, "widgets_values": ["{|Adrian Smith|Ai Weiwei|Albert Lynch|Albert Tucker|Albrecht Anker|Aleksander Gierymski|Alex Schomburg|Alexander Fedosav|Alexander Kanoldt|Alexis Gritchenko|Alfred Kelsner|Amir Zand|Anatoly Metlan|Andrei Tarkovsky|Andrew Robinson|Anne Sudworth|Annie Leibovitz|Anto Carte|Anton Fadeev|Antonio Roybal|Aquirax Uno|Arik Brauer|Artur Tarnowski|Banksy|Basil Gogos|Bayard Wu|Beeple|Ben Wooten|Bill Medcalf|Bo Chen|Bob Eggleton}"], "color": "#432", "bgcolor": "#653"}, {"id": 323, "type": "Seed (rgthree)", "pos": [760, 310], "size": {"0": 210, "1": 130}, "flags": {}, "order": 4, "mode": 0, "outputs": [{"name": "SEED", "type": "INT", "links": [601], "shape": 3, "dir": 4, "slot_index": 0}], "properties": {"Node name for S&R": "Seed (rgthree)"}, "widgets_values": [7177135, null, null, null], "color": "#323", "bgcolor": "#535"}, {"id": 433, "type": "ttN pipeKSamplerSDXL", "pos": [2110, 220], "size": {"0": 354.3999938964844, "1": 650}, "flags": {}, "order": 16, "mode": 0, "inputs": [{"name": "sdxl_pipe", "type": "PIPE_LINE_SDXL", "link": 596}, {"name": "optional_model", "type": "MODEL", "link": null}, {"name": "optional_positive", "type": "CONDITIONING", "link": null}, {"name": "optional_negative", "type": "CONDITIONING", "link": null}, {"name": "optional_vae", "type": "VAE", "link": null}, {"name": "optional_refiner_model", "type": "MODEL", "link": null}, {"name": "optional_refiner_positive", "type": "CONDITIONING", "link": null}, {"name": "optional_refiner_negative", "type": "CONDITIONING", "link": null}, {"name": "optional_refiner_vae", "type": "VAE", "link": null}, {"name": "optional_latent", "type": "LATENT", "link": null}, {"name": "optional_clip", "type": "CLIP", "link": null}, {"name": "seed", "type": "INT", "link": 597, "widget": {"name": "seed"}}, {"name": "base_steps", "type": "INT", "link": 631, "widget": {"name": "base_steps"}}, {"name": "refiner_steps", "type": "INT", "link": 635, "widget": {"name": "refiner_steps"}}], "outputs": [{"name": "sdxl_pipe", "type": "PIPE_LINE_SDXL", "links": null, "shape": 3, "slot_index": 0}, {"name": "model", "type": "MODEL", "links": null, "shape": 3}, {"name": "positive", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "negative", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "vae", "type": "VAE", "links": null, "shape": 3}, {"name": "refiner_model", "type": "MODEL", "links": null, "shape": 3}, {"name": "refiner_positive", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "refiner_negative", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "refiner_vae", "type": "VAE", "links": null, "shape": 3}, {"name": "latent", "type": "LATENT", "links": null, "shape": 3}, {"name": "clip", "type": "CLIP", "links": null, "shape": 3}, {"name": "image", "type": "IMAGE", "links": [671], "shape": 3, "slot_index": 11}, {"name": "seed", "type": "INT", "links": null, "shape": 3, "slot_index": 12}], "properties": {"Node name for S&R": "ttN pipeKSamplerSDXL", "ttNnodeVersion": "1.0.2"}, "widgets_values": ["None", 2, "disabled", "Sample", 16, 9, 7.5, "dpmpp_2m", "karras", "Hide", "ComfyUI", 716634014027423, "randomize"]}, {"id": 446, "type": "Constant Number", "pos": [1540, 520], "size": {"0": 210, "1": 122}, "flags": {}, "order": 1, "mode": 0, "inputs": [{"name": "number_as_text", "type": "STRING", "link": null, "widget": {"name": "number_as_text"}}], "outputs": [{"name": "NUMBER", "type": "NUMBER", "links": null, "shape": 3}, {"name": "FLOAT", "type": "FLOAT", "links": null, "shape": 3}, {"name": "INT", "type": "INT", "links": [632, 636], "shape": 3, "slot_index": 2}], "title": "Total steps", "properties": {"Node name for S&R": "Constant Number"}, "widgets_values": ["integer", 25, ""]}, {"id": 432, "type": "ttN pipeLoaderSDXL", "pos": [1110, 200], "size": {"0": 390, "1": 750}, "flags": {}, "order": 13, "mode": 0, "inputs": [{"name": "empty_latent_width", "type": "INT", "link": 599, "widget": {"name": "empty_latent_width"}}, {"name": "empty_latent_height", "type": "INT", "link": 600, "widget": {"name": "empty_latent_height"}}, {"name": "positive", "type": "STRING", "link": 598, "widget": {"name": "positive"}}, {"name": "seed", "type": "INT", "link": 601, "widget": {"name": "seed"}}], "outputs": [{"name": "sdxl_pipe", "type": "PIPE_LINE_SDXL", "links": [596], "shape": 3, "slot_index": 0}, {"name": "model", "type": "MODEL", "links": null, "shape": 3}, {"name": "positive", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "negative", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "vae", "type": "VAE", "links": null, "shape": 3}, {"name": "clip", "type": "CLIP", "links": null, "shape": 3}, {"name": "refiner_model", "type": "MODEL", "links": null, "shape": 3}, {"name": "refiner_positive", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "refiner_negative", "type": "CONDITIONING", "links": null, "shape": 3}, {"name": "refiner_vae", "type": "VAE", "links": null, "shape": 3}, {"name": "refiner_clip", "type": "CLIP", "links": null, "shape": 3}, {"name": "latent", "type": "LATENT", "links": null, "shape": 3}, {"name": "seed", "type": "INT", "links": [597], "shape": 3, "slot_index": 12}], "properties": {"Node name for S&R": "ttN pipeLoaderSDXL", "ttNnodeVersion": "1.1.2"}, "widgets_values": ["SDXL\\sdXL_v10VAEFix.safetensors", "Baked VAE", "None", 1, 1, "None", 1, 1, "SDXL\\sdXL_v10RefinerVAEFix.safetensors", "Baked VAE", "None", 1, 1, "None", 1, 1, -1, "Positive", "none", "comfy", "edges, borders", "none", "comfy", 1024, 1024, 4, 1029071200866146, "randomize"]}, {"id": 480, "type": "ImagesGridByColumns", "pos": [2490, 440], "size": [210, 100], "flags": {}, "order": 18, "mode": 0, "inputs": [{"name": "images", "type": "IMAGE", "link": 671}, {"name": "annotation", "type": "GRID_ANNOTATION", "link": null}], "outputs": [{"name": "IMAGE", "type": "IMAGE", "links": [672], "shape": 3, "slot_index": 0}], "properties": {"Node name for S&R": "ImagesGridByColumns"}, "widgets_values": [5, 2]}, {"id": 481, "type": "ImageResize+", "pos": [2720, 440], "size": [240, 190], "flags": {}, "order": 19, "mode": 0, "inputs": [{"name": "image", "type": "IMAGE", "link": 672}], "outputs": [{"name": "IMAGE", "type": "IMAGE", "links": [673], "shape": 3, "slot_index": 0}, {"name": "width", "type": "INT", "links": null, "shape": 3}, {"name": "height", "type": "INT", "links": null, "shape": 3}], "properties": {"Node name for S&R": "ImageResize+"}, "widgets_values": [1024, 1024, "lanczos", false, "always"]}, {"id": 424, "type": "Image Caption", "pos": [3000, 620], "size": [210, 80], "flags": {}, "order": 20, "mode": 0, "inputs": [{"name": "image", "type": "IMAGE", "link": 673}, {"name": "caption", "type": "STRING", "link": 582, "widget": {"name": "caption"}}], "outputs": [{"name": "image", "type": "IMAGE", "links": [585, 677], "shape": 3, "slot_index": 0}], "properties": {"Node name for S&R": "Image Caption"}, "widgets_values": ["calibri.ttf", "Caption"]}, {"id": 423, "type": "Image Save", "pos": [3380, 670], "size": {"0": 320, "1": 560}, "flags": {}, "order": 21, "mode": 0, "inputs": [{"name": "images", "type": "IMAGE", "link": 585}, {"name": "filename_prefix", "type": "STRING", "link": 583, "widget": {"name": "filename_prefix"}}], "properties": {"Node name for S&R": "Image Save"}, "widgets_values": ["C:\\Users\\MAINUSER\\AI\\artist fidelity\\SDXL results\\", "ComfyUI", "_", 5, "false", "jpg", 100, "false", "false", "false", "false", "true", "false"]}, {"id": 489, "type": "PreviewImage", "pos": [3290, 280], "size": {"0": 210, "1": 250}, "flags": {}, "order": 22, "mode": 0, "inputs": [{"name": "images", "type": "IMAGE", "link": 677}], "properties": {"Node name for S&R": "PreviewImage"}}], "links": [[426, 332, 0, 331, 0, "STRING"], [427, 357, 0, 331, 1, "STRING"], [472, 357, 0, 351, 0, "*"], [489, 351, 0, 330, 0, "STRING"], [580, 358, 0, 425, 0, "*"], [582, 425, 0, 424, 1, "STRING"], [583, 425, 0, 423, 1, "STRING"], [585, 424, 0, 423, 0, "IMAGE"], [596, 432, 0, 433, 0, "PIPE_LINE_SDXL"], [597, 432, 12, 433, 11, "INT"], [598, 331, 0, 432, 2, "STRING"], [599, 327, 0, 432, 0, "INT"], [600, 327, 1, 432, 1, "INT"], [601, 323, 0, 432, 3, "INT"], [631, 456, 0, 433, 12, "INT"], [632, 446, 2, 456, 0, "INT,FLOAT,IMAGE,LATENT"], [633, 454, 1, 456, 1, "INT,FLOAT,IMAGE,LATENT"], [635, 457, 0, 433, 13, "INT"], [636, 446, 2, 457, 0, "INT,FLOAT,IMAGE,LATENT"], [637, 456, 0, 457, 1, "INT,FLOAT,IMAGE,LATENT"], [668, 351, 0, 358, 0, "*"], [671, 433, 11, 480, 0, "IMAGE"], [672, 480, 0, 481, 0, "IMAGE"], [673, 481, 0, 424, 0, "IMAGE"], [677, 424, 0, 489, 0, "IMAGE"], [678, 491, 0, 357, 0, "*"]], "groups": [], "config": {}, "extra": {}, "version": 0.4}

DDG scrape script works great. I'm debating doing this scrape to include thumbnails of the actual artist's work in this app as a point of comparison. Seems like fair use to me, but I'm not qualified to give legal advice.

I recommend saving as webp with quality 85. The files will be equal or slightly larger than jpg, but much higher quality, e.g.
final_image.save(file_name,"webp",optimize=True,quality=85)

Good to know. I am back on the clustering portion, but I decided to stop thinking (for this portion at least) about how accurate the style is to the original artist, at least until SD3.

When you think about why anyone will put in an artist style, it's because they're looking for an aesthetic. If you know the artist, you can test that artist yourself and see if it fits. But if you want to start with the aesthetic, it doesn't really matter how you get there.

So I decided to blindly test artist names in the clustering algorithm, which I have been working on improving. The "unknown" names will all cluster together and be obvious. The contaminated ones (e.g. "Asian art") will still provide an aesthetic, which the user might be interested in anyway.

More importantly, manually reviewing artist fidelity to source material was a) super time consuming and b) super difficult to do reproducibly. So I went for the lower hanging fruit. On the plus side, my new process rally is finding good clusters of artists which can be reviewed and used in an intelligible way.

All of this of course it mostly to perfect the methods before SD3 comes out. I really hope it has more style flexibility of 1.5 combined with the cohesiveness of SDXL. My experiments with cascade were promising in that the weak artists from SDXL seemed to be a bit more true and differentiable, but we will see how SD3 continues that trend.

I added images found on DDG to the app. It's fun to compare the ground truth to what SD outputs. You can grab all the images I've already pulled, but if you are pulling images on your own, I found it helpful to use "requests.get(url, timeout=4)". Some images will hang the script forever without the timeout.

If you don't mind, I'll gladly add a credit in the about section. As "Reggie Montoya"?

Sorry, was away visiting family.

Reggie Montoya credit is fine, thanks!

Added!

You two have basically done a lot of what I ended up independently arriving at in a quest to figure out what Flux knows (or doesn't, and in some cases, the 'wrong' info is actually quite nice and useful, just not at all what the artist looks like. That makes this far more useful than simply knowing if it matches the 'ground truth'

The way SD, in general, works: all of this information is basically 'marked' on a massively dimensional grid. It learns that all of 'this' area over here matches Dog, and this area over here match 'Cat'... and the direction+distance between those areas can find interesting 'curiosities' too. So when a model (SDXL, Flux, etc) 'learns' to identify a spot in this map with a name, EVEN if that name was never trained, or insufficiently trained, the 'sign post' can be interesting and useful, even if it's absolutely NOT the artist the name belongs to.

Example:
https://terrariyum-sdxl-artists-browser.static.hf.space/images/Ground_Truth/Amedeo_Modigliani-artwork.webp
The SDXL versions in the artist browser are interesting, but nothing special.
Here's Flux Schnell's 'take on it' (which is the smallest fastest version):
prompt for all 3: "in the style of Amedeo Modigliani"
_in the style of Amedeo Modigliani _7e8b3446-b823-48f5-81d2-41421035cbad.webp
_in the style of Amedeo Modigliani _89dada4c-91ec-47fe-8d27-105e967331ec.webp
_in the style of Amedeo Modigliani _c93e60a8-c882-4134-8940-c1fba06e3d93.webp

So far, as I can tell, this isn't some other artist's work, it's a style of it's own, that even if it's not really "Amedeo Modigliani", so far as Flux is concerned, that's the spot the sign post leads.

So for automating, here's the problem: "Amedeo Modigliani" in SDXL stock is pretty close to the original, so a set of tags matching the original would make sense. But the other SDXL don't match much or at all. And Flux clearly has a different 'picture' which deserves a set of tags to help people find/use it.

I'd 'propose' the following (based on some prelim work I did):

  1. Ground truth tags - Actual Art/Work from Person.
  2. Base Model Tags (so SDXL, or Flux, etc)
  3. non-Base model Tags, as someone spends time on. With Loras and/or IPAdapter usage, adding something to a model lacking is getting easier and easier. But finding the sweet nooks and crannies of the base model is often really rewarding.

Using ComfyUI I did a basic test of using some of the LLM models (like LLAVA) tagging the image created. I need to stack/improve this to make it more useful.
Ideally: Add list of Names to file (or paste into box), run workflow(s), it churns thru each name, makes a pile of images, and then uses LLMs to 'view and tag' those images which gives us a base set of tags (example: if in 9 out of 10 images, the tag "glamour" comes up, good tag... if 1 out of 10, the tag "Bird" is likely a stray and not worth keeping.

We can even run Ground Truth Images thru this 'tagger' and see what turns up (and compare to the existing tags now)

https://lib.kalos.art/topic has a similar (but not as useful by far) viewer that does SD and also MidJourney versions, and they list 14,000+ names for MJ 5.2

If we have a set of comfyUI workflows that take names and give back image grids + tags for each, potentially indexing not merely 14k but Any Name (meaning any signpost in the Latent Map) can be documented, and each model is essentially it's own unique 'landscape' to explore.

The problem with Searching is I'm finding Bing and Google are turning up SD generated images when you search for the name. In other words: it's no longer possible to know if the images you are getting as 'ground truth' (unless verified) are not themselves products of the engine(s).

So there is a value both in a Ground Truth collection, and in documenting these models. And done right, this work could be done by many, running a set of workflows, and crowdsourcing these 'maps'

@terrariyum awesome project, and when I went to see what code existed for making my doing this with Flux easier, I found this. Kudos.

ultra simple workflow to just analyze a single provided image (workflow embedded too):

workflow.png

another example to keep in mind: even small addtional prompt words or methods can trigger big changes.
I was using "3 image panels, each in the style of Name" which gives me really good results, and lets me see a lot about the model's expression quickly.
EXCEPT it also implies comics, and making 3 1/3 images can be slightly different than 1 image.

And that alone is enough to 'skew the results' in good or bad:

Ground Truth of Amy Sillman: https://en.wikipedia.org/wiki/Amy_Sillman
https://www.amysillman.com/News/

image.png
image.png
image.png

Flux "In the style of..."
_in the style of Amy Sillman _9fdba9d0-f19f-4f49-bfc1-93c9ced32bd9.webp
_in the style of Amy Sillman _cb9a1455-31dc-4037-be0b-314dbb23603b.webp
_in the style of Amy Sillman _d5649749-294b-4afc-b601-0e8f27488bc4.webp

Flux "3 image panels, each in the style of..."
3 image panels, each in the style of Amy Sillman 484427fb-7be5-4f60-8b6f-b19d6faec976.webp
3 image panels, each in the style of Amy Sillman e966dc09-2796-4964-b260-4b34c9363473.webp
3 image panels, each in the style of Amy Sillman e10310cd-6c20-47a1-a686-8d1254369f68.webp

^ now someone could say "Oh, that's Such and Such's artwork style." If you see something you recognize like that, do speak up... Flux does this for "Amy Sillman" even if none of her was trained into it. This is where 'image similarity' could be useful.

Can't wait to see what this leads to! It's clear that vanilla Flux can make many interesting non-photo styles. Even for SDXL, in most cases the "by so-n-so" output is only reminiscent of the artist's actual style. If adding a name to a Flux prompt generates a predictable style, that's useful even if it looks nothing like the artist. I also think we'll see models with all new architectures released soon that might be better at non-photo styles, maybe AuroFlow.

But we might not get another base model that is actually trained on named artists until the lawsuits against SAI and Midjourney are resolved. It looks that way since Flux and AuroFlow have quietly scrubbed all artists names, and AstraliteHeart has stated that he'll scrub artist names from the next version of Pony.

As far as this project goes, I'm in wait and see mode.

agreed, and FYI "non-photo styles" can be used as photographs too:

"photo in the style of" or "photo taken in the style of" Amedeo Modigliani

photo_taken_in_the_style_of_amedeo_modigliani-2c53fa74-4de2-4687-8ba2-0b1280eda67b.webp
photo_in_the_style_of_amedeo_modigliani-b1c1b4af-02be-4d00-82cf-ce19740eebf7.webp
photo_in_the_style_of_amedeo_modigliani-592b7c93-386c-4f50-a9b2-58900fedfd63.webp

The consistency is quite nice. I think 'landscape' 'photo', 'portrait', and a few other 'types' of images are probably all good things for 'mapping out' what a particular 'artist name' does.

It's interesting because the lines are blurring. You can ask for names like Authors (HP Lovecraft, Andre Norton, Anne McCaffrey are good examples) which generate artwork, but aren't artists themselves, but have enough data in the model based on book covers/illustrations to have a 'flavor'. Versa, "in the style of David Bowie" will generate images OF Bowie, despite his doing art, and many other non-artists (Jeff Goldblum is one I've seen on lists) really just generate based on various images of them by many people.
I've also found Cartoonist/ComicArtists will generate self-images with 'style of' at times, even if they aren't well known 'images'. Alan Moore's Ground Truth has art not by him, as an example, but of him, for example.

Those are nice looking examples! That behavior is similar to SDXL. "Photo by {non-photograph}" often has a nice effect like this. For example, in SDXL using "Harumi Hironaka" makes women with a specific style of makeup, whether as an illustration or photo. The "self-portrait" of the artist has always been an issue in SDXL too.

It's often surprising what details of an artist/director/author/character models fixate on. In SDXL, "Tim Burton" usually gives characters giant bug-eyes, which only matches the art from a small part of his career. Some well known movie titles will bring in elements or styles from that movie, but unfortunately they disappear unless the prompt is very short.

Can't wait to see what this leads to! It's clear that vanilla Flux can make many interesting non-photo styles. Even for SDXL, in most cases the "by so-n-so" output is only reminiscent of the artist's actual style. If adding a name to a Flux prompt generates a predictable style, that's useful even if it looks nothing like the artist. I also think we'll see models with all new architectures released soon that might be better at non-photo styles, maybe AuroFlow.

Yes, I agree with that statement. That's where the clustering will come in handy, I think.

If we find that many artist names simply orbit around a central style, then you can choose the artist name that is strongest in the cluster and run from there.

What I would love to see is a community driven, walled off (for defensive purposes) initiative to build a huge database of artists styles, emphasizing newer and more varied artists [we have enough classical oil painters...] and making a super lora (or a full finetune) that has them all built in.

Make a checkpoint in flux that is for artstyles what Pony was for... smut.

The good news is that the cost to train models per parameter keeps getting cheaper. Pony's creator has announced that the Auraflow version will use some method of clustering artists, then train with that cluster name instead of the artist names. It'll be interesting to see if it uses giant crude buckets like "90s anime" or if it uses many buckets that are based on visual style clustering.

Sign up or log in to comment