Suggested Params for generation?

by IreGaddr - opened Apr 14

Apr 14

Okay, you guys: the default settings in StableSwarmUI with the new CosXL initially give, um... well... yeah. Prompt: beautiful ginger man, HDR, 8k, ISO100, F1/16, Shutter Speed 1/1000

The model card is very sparse about which sampler, CFG, and number of steps to use for generation. After some testing I can say that selecting a 9x16 image in StableSwarmUI with CFG 1.5, 50 steps, and DPM2 A produces much, much better results. Same prompt as before.

So it can do photorealism despite what other threads show.

Vargol

Apr 14

edited Apr 14

I don't know about Comfy; in Diffusers I'm using:

pipe.scheduler = EDMEulerScheduler(sigma_min=0.002, sigma_max=120.0, sigma_data=1.0, prediction_type="v_prediction")

CFG = 8, steps 30 to 50, but I've not tried tuning the number of steps.
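For anyone who wants the whole thing in one place, here is a minimal sketch of how the scheduler line and the CFG/steps values above might fit into a Diffusers script. The `generate` helper, the local checkpoint path, the fp16 dtype, and the CUDA device are assumptions for illustration, not something stated in this thread; the scheduler values are the ones quoted above.

```python
# Scheduler settings quoted in this thread for CosXL (not an official recommendation).
EDM_SCHEDULER_KWARGS = {
    "sigma_min": 0.002,
    "sigma_max": 120.0,
    "sigma_data": 1.0,
    "prediction_type": "v_prediction",
}


def generate(checkpoint_path, prompt, guidance_scale=8.0, steps=30):
    """Hypothetical helper: load a local checkpoint and render one image."""
    # Imported lazily so the settings above can be inspected without torch installed.
    import torch
    from diffusers import EDMEulerScheduler, StableDiffusionXLPipeline

    # Assumption: checkpoint_path points to a single-file SDXL-style checkpoint on disk.
    pipe = StableDiffusionXLPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    )
    # Swap the default scheduler for the EDM Euler scheduler with the thread's values.
    pipe.scheduler = EDMEulerScheduler(**EDM_SCHEDULER_KWARGS)
    pipe.to("cuda")  # assumption: a CUDA GPU is available

    result = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=steps)
    return result.images[0]
```

Calling `generate("path/to/checkpoint.safetensors", "some prompt").save("out.png")` would write one image; the path here is a placeholder, and `steps` can be raised toward 50 as mentioned above.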

Oh, that's interesting: those settings give garbage results with your prompt, but the other prompts I've run have been fine.

TwoPerCent

Stability AI org Apr 14

Using a Karras scheduler with max_sigma ~120 is also a good idea. This model can go up to max_sigma 999 comfortably, feel free to experiment.
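To make the sigma range concrete, here is a rough sketch of the standard Karras noise schedule in plain Python, evaluated with a maximum sigma of 120. The function name, the minimum sigma of 0.002, and rho = 7 are assumptions for illustration (rho = 7 is the usual default in the Karras et al. formulation); none of them is a confirmed training or inference setting for this model.

```python
def karras_sigmas(n, sigma_min=0.002, sigma_max=120.0, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min (Karras spacing)."""
    if n < 2:
        raise ValueError("need at least two steps")
    inv_rho = 1.0 / rho
    hi = sigma_max ** inv_rho
    lo = sigma_min ** inv_rho
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the rho-th power.
    return [(hi + (i / (n - 1)) * (lo - hi)) ** rho for i in range(n)]


if __name__ == "__main__":
    for sigma in karras_sigmas(10):
        print(f"{sigma:.4f}")
```

Passing `sigma_max=999.0` instead shows how the same number of steps gets stretched over a much wider noise range, which is one way to experiment with the upper limit mentioned here.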

Most DPMPP samplers should work well iirc.

Please note this model was not tuned for aesthetics at all. The model was simply an experiment we decided to release for interested researchers.

IreGaddr

Apr 14

See, that's the kind of info you should put on the model card on the Hugging Face website and in the GitHub repos. How am I supposed to research if you don't tell us what's new in the release and what settings to play around with? Where is the paper behind the technique attached to this model's release? Again, there's no link in the model card to find out. I'm not trying to be confrontational; I'm trying to be helpful in ensuring that others don't run into the same confusion I did. This was one of the shortest model cards y'all have ever released. A bit more info in some prominent places would let researchers and users know what's what.

ptx0

Apr 15

It's not that the model isn't aesthetic. It isn't even functional like a typical v-prediction model. Did you train it using max grad norm set to 0.3? Seriously, how does this even happen?
