Trying to recreate styles in a local Stable Diffusion XL instance

#113
by uniemperor - opened

So, I've been trying to extract the prompts for my local Stable Diffusion instance, because I really like some of these styles.
I've reverse-engineered them by looking through the source code of both this space and the VideoChain API. From what I gather, it's supposed to be done with a base SDXL model.

For example, here's the modern american comic style:
Positive: beautiful, intricate details, modern american comic about {prompt}, digital color comicbook style, award winning, high resolution
Negative: watermark, copyright, blurry, low quality, ugly
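
As a rough illustration of how a style like this could be wired up on the client side, here is a minimal TypeScript sketch; the ComicStyle type, the modernAmericanComic preset, and the example subject are hypothetical names for this post, not the actual Comic Factory code.

  // Hypothetical sketch: a style preset holding the reverse-engineered templates.
  interface ComicStyle {
    label: string
    positive: (prompt: string) => string
    negative: string
  }

  const modernAmericanComic: ComicStyle = {
    label: "Modern American comic",
    positive: (prompt) =>
      `beautiful, intricate details, modern american comic about ${prompt}, digital color comicbook style, award winning, high resolution`,
    negative: "watermark, copyright, blurry, low quality, ugly",
  }

  // Fill the template with the subject before sending it to SDXL.
  const positive = modernAmericanComic.positive("a giant mecha in a ruined city")
  const negative = modernAmericanComic.negative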

Yet, when I try it on Stable Diffusion XL, my results are markedly worse and less in line with the style. The first image shows my local generation (made with ClipDrop); the second is one I made with Comic Factory.
mecha.jpeg
mecha2.png

Other styles are similarly worse.

Am I missing something? Is this using a custom model somehow?

Hello,

well, they both look pretty nice, to be honest!

Now, regarding the differences, there could be multiple causes:

  • my implementation crops the prompt (see my post about how the prompt is constructed), so it is still possible that some of the words placed after the user prompt get cropped, leading to different results

  • my SDXL code always adds the following keywords:
    positive = "beautiful", "intricate details" + prompt + "award winning", "high resolution"
    negative = "watermark", "copyright", "blurry", "low quality", "ugly"
    it's something I did a long time ago, back in July, and forgot to remove
    (ideally I would prefer to put those keywords in the client/frontend app; I've added a note to remind myself to refactor that. A rough sketch of this wrapping is included after the code below.)

  • as for the minor differences around edges and lines, maybe they are caused by different settings in the SDXL parameters.
    Here's what I use:

  // `api` here is a connection to the hosted Gradio space (e.g. obtained via @gradio/client)
  const rawResponse = (await api.predict("/run", [
    positive, // string in 'Prompt' Textbox component
    negative, // string in 'Negative prompt' Textbox component
    positive, // string in 'Prompt 2' Textbox component
    negative, // string in 'Negative prompt 2' Textbox component
    true, // boolean in 'Use negative prompt' Checkbox component
    false, // boolean in 'Use prompt 2' Checkbox component
    false, // boolean in 'Use negative prompt 2' Checkbox component
    seed, // number (numeric value between 0 and 2147483647) in 'Seed' Slider component
    width, // number (numeric value between 256 and 1024) in 'Width' Slider component
    height, // number (numeric value between 256 and 1024) in 'Height' Slider component
    8, // number (numeric value between 1 and 20) in 'Guidance scale for base' Slider component
    8, // number (numeric value between 1 and 20) in 'Guidance scale for refiner' Slider component
    nbSteps, // number (numeric value between 10 and 100) in 'Number of inference steps for base' Slider component
    nbSteps, // number (numeric value between 10 and 100) in 'Number of inference steps for refiner' Slider component
    true, // boolean in 'Apply refiner' Checkbox component
    secretToken
  ])) as any
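
To make the keyword wrapping and cropping mentioned above concrete, here is a minimal sketch of how the prompts could be assembled before that predict() call; buildSdxlPrompts, the 200-character limit, and the example style string are illustrative assumptions, not the actual implementation.

  // Illustrative sketch only; the real cropping logic and limit may differ.
  function buildSdxlPrompts(stylePrompt: string, maxLength = 200) {
    // keywords that are always added around the style/user prompt
    const positive = [
      "beautiful",
      "intricate details",
      stylePrompt,
      "award winning",
      "high resolution",
    ]
      .join(", ")
      .slice(0, maxLength) // cropping may drop the trailing keywords

    const negative = ["watermark", "copyright", "blurry", "low quality", "ugly"].join(", ")

    return { positive, negative }
  }

  // Example: the style template already contains the user's subject.
  const prompts = buildSdxlPrompts(
    "modern american comic about a giant mecha, digital color comicbook style"
  )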

For reference the SDXL server I use is: https://huggingface.co/spaces/hysts/SD-XL
Capture d’écran 2023-09-08 à 10.41.52.png

I still can't recreate the exact styles, but I guess it's close enough. High-res upscaling seems to give a much cleaner image, and I also recommend increasing the step count significantly. Though I still wonder what leads to the palpable difference.
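
If you reproduce the predict() call shown above, raising the step count just means passing a higher value for the two step sliders; the numbers below are only an example.

  // Example only: more inference steps for both the base model and the refiner
  // (both step sliders in the call above accept values from 10 to 100),
  // with a fixed seed so runs can be compared.
  const nbSteps = 80
  const seed = 42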
