Suggested Params for generation?

#4
by IreGaddr - opened

Okay, you guys: the default settings in StableSwarmUI with the new CosXL initially give, um... well... yeah. Prompt: beautiful ginger man, HDR, 8k, ISO100, F1/16, Shutter Speed 1/1000
0303-Professional portrait of a beautiful gin-OfficialStableDiffusioncosxlsafetensor-1218311866.jpg
The model card is very sparse about which samplers, CFG, and number of steps to use for generation. After some testing, I can say that selecting a 9x16 image in StableSwarmUI with CFG 1.5, 50 steps, and DPM2 A produces much better results. Same prompt as before:
0323-Professional photo portrait of a beautif-OfficialStableDiffusioncosxlsafetensor-1202563252.jpg
So it can do photorealism despite what other threads show.

I don't know about Comfy, but in Diffusers I'm using:

pipe.scheduler = EDMEulerScheduler(sigma_min=0.002, sigma_max=120.0, sigma_data=1.0, prediction_type="v_prediction")

CFG = 8, steps 30 to 50, but I haven't tried tuning the number of steps.
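
For anyone who wants to try those Diffusers settings end to end, here is a minimal sketch. The scheduler arguments, CFG, and step count are the ones from the post above; the checkpoint path, the fp16 dtype, the CUDA device, and the helper names `load_cosxl` and `generate` are assumptions for illustration, not something confirmed in this thread.

```python
# Settings quoted in the post above.
SCHEDULER_KWARGS = {
    "sigma_min": 0.002,
    "sigma_max": 120.0,
    "sigma_data": 1.0,
    "prediction_type": "v_prediction",
}
GENERATION_KWARGS = {
    "guidance_scale": 8.0,       # CFG = 8
    "num_inference_steps": 30,   # the post suggests 30 to 50
}


def load_cosxl(checkpoint_path):
    """Load a local CosXL checkpoint and swap in the EDM Euler scheduler.

    `checkpoint_path` is a hypothetical local path to the .safetensors file.
    fp16 on a CUDA device is an assumption; adjust for your hardware.
    """
    # Imported lazily so the settings above can be inspected without a GPU stack.
    import torch
    from diffusers import EDMEulerScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    )
    pipe.scheduler = EDMEulerScheduler(**SCHEDULER_KWARGS)
    return pipe.to("cuda")


def generate(pipe, prompt):
    """Run one generation with the CFG and step count from the post."""
    return pipe(prompt, **GENERATION_KWARGS).images[0]
```

Usage would look something like `generate(load_cosxl("path/to/cosxl.safetensors"), "your prompt")`, where the path is a placeholder for wherever you saved the checkpoint.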

Oh, that's interesting: that gets garbage results with your prompt, but the other prompts I've run have been fine.

Stability AI org

Using a Karras scheduler with max_sigma ~120 is also a good idea. This model can go up to max_sigma 999 comfortably, feel free to experiment.

Most DPMPP samplers should work well iirc.

Please note this model was not tuned for aesthetics at all. The model was simply an experiment we decided to release for interested researchers.
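
To make the max_sigma suggestion a bit more concrete, here is a rough sketch of how a Karras-style noise schedule spaces its sigma values between the maximum and the minimum. It is plain Python with no dependencies. The `rho = 7` exponent is the commonly used default for this schedule and is an assumption here, as is the 0.002 minimum (borrowed from the Diffusers settings earlier in the thread); the function name `karras_sigmas` is made up for illustration and is not a library API.

```python
def karras_sigmas(n, sigma_min=0.002, sigma_max=120.0, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min.

    Interpolates linearly in sigma**(1/rho) space, then raises back to
    the power rho, which packs more steps near the low-noise end.
    """
    if n < 2:
        raise ValueError("need at least two noise levels")
    if not 0 < sigma_min < sigma_max:
        raise ValueError("expected 0 < sigma_min < sigma_max")
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    return [
        (max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]


if __name__ == "__main__":
    # Compare the two ceilings mentioned above: ~120 and 999.
    for label, top in (("max_sigma ~120", 120.0), ("max_sigma 999", 999.0)):
        sigmas = karras_sigmas(10, sigma_max=top)
        print(label, [round(s, 3) for s in sigmas])
```

Raising the ceiling to 999 only changes where the schedule starts; the later steps still cluster near the low-noise end, so experimenting with it mostly affects the first few steps.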

See, that's the kind of info you should put on the model card on the Hugging Face website and in the GitHub repos. How am I supposed to research if you don't tell us what's new in the release and what settings to play around with? Where is the paper behind the technique attached to this model's release? There's no link in the model card, so again we have to find out ourselves. I'm not trying to be confrontational; I'm trying to be helpful in ensuring that others don't run into the same confusion I did. This was one of the shorter model cards y'all have ever released. A bit more info in some prominent places would let researchers/users know what's what.

It's not that the model isn't aesthetic. It isn't even functional like a typical v-prediction model. Did you train it using max grad norm set to 0.3? Seriously, how does this even happen?
