Mind sharing the parameters you used during training?

#1
by zbulrush - opened

Hey there! So, I was trying to train ControlNet using the Diffusers library, but I ran into a few hiccups. Mind sharing the parameters you used during training? It could really help me out! Thanks!

Sure! I've made a few modifications to the training script, including using a non-standard optimizer. Here's my version of the script, and this is the command I used to launch it:

accelerate launch /workspace/train_control.py --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 --output_dir=/workspace/out --dataset_name=/workspace/zoe_stuff --conditioning_image_column=depth --validation_prompt "nightmare construction worker, unsettling" "android warrior, unsettling" --validation_image /workspace/depth2.png /workspace/depth4.png --validation_steps 100 --tracker_project_name sd_xl_train_controlnet --mixed_precision fp16 --report_to wandb --push_to_hub --hub_model_id controlnet-sd-xl-1.0-depth-magma --max_grad_norm 1.0 --checkpointing_steps 10000 --num_train_epochs 5 --resolution 512 --seed 1 --gradient_accumulation_steps 1 --train_batch_size 8 --enable_xformers_memory_efficient_attention --caption_column prompt --gradient_checkpointing --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix --use_prodigy --set_grads_to_none --controlnet_model_name_or_path /workspace/out

If you were running into black preview images, chances are the VAE was your problem; the version bundled with the base model is unstable at FP16, which is why I pass madebyollin/sdxl-vae-fp16-fix via --pretrained_vae_model_name_or_path. Finally, you can view my wandb run for it here.
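If you want to sanity-check samples outside the training loop, here's a rough sketch of how you might load the fixed VAE at inference time; the ControlNet path is just a placeholder for whatever checkpoint you've trained:

```python
# Sketch only: SDXL + ControlNet inference with the fp16-fix VAE.
# "/workspace/out" stands in for your own trained ControlNet checkpoint.
import torch
from diffusers import AutoencoderKL, ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# The fixed VAE avoids the black-image/NaN issue the stock SDXL VAE hits at fp16.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
controlnet = ControlNetModel.from_pretrained("/workspace/out", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

depth = load_image("/workspace/depth2.png")  # depth map used as the conditioning image
image = pipe("nightmare construction worker, unsettling", image=depth).images[0]
image.save("sample.png")
```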

Hope that helps!

Wow, thank you so much! You've been a great help to me.

No worries! I hope everything goes well; feel free to hit me up here or on Discord @sargezt if you need anything else. I'm working on a more general-purpose training base, a base ControlNet if you will, that is currently training. Right now a lot of the training time is spent simply on the first adaptation of the network, so I'm applying augmentations to the input data and mixing depth, canny, and seg conditions during training to decondition the network from normal image generation. If you'd like, I can certainly ping you when it's done so you can resume from that. Happy training!
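To give a rough idea of what I mean by mixing conditions, the per-sample selection looks something like this; the column names and dataset layout here are made up for illustration:

```python
import random
from PIL import Image

# Hypothetical sketch: randomly pick one conditioning modality per sample so the
# network doesn't lock onto a single input type during the first adaptation phase.
CONDITION_COLUMNS = ["depth", "canny", "seg"]  # placeholder column names

def pick_conditioning_image(example):
    """Return one of the stored conditioning maps at random for this sample."""
    column = random.choice(CONDITION_COLUMNS)
    return Image.open(example[column]).convert("RGB")
```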

I'm really puzzled: the validation image generated every 100 steps is always the same. You can find it in my wandb:
https://wandb.ai/zbulrush/sd_xl_train_controlnet/runs/3n9w86jj/workspace?workspace=user-zbulrush

That's very strange. I'll try to replicate that since the dataset is public and report back!

Oh jeez, I just noticed I left the augments on in the version I sent you; I was using those for some experiments. I've updated the gist to remove them. I'm about to start a run to confirm that they were the problem with that version.

OK yeah, that appears to have been the problem. I can't speak to how well it's fitting yet, but my samples are most certainly changing. I've relocated the trivial transform so it only acts on the target image, which is how it's supposed to be used, rather than applying it to both the control and training images the way I was. Here is the new version; one note, though: I've ripped out the manual prompts and just use the test split of the dataset to generate images. If you want to revert to the old behavior, just rip out the log_validation function from your current version.

With TrivialAugmentWide the model will take a bit longer to learn, but it will be much less prone to overfitting. You can remove it from the preprocessing transforms if you wish, though!
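For reference, the split between the two preprocessing pipelines looks roughly like this; it's a sketch in the spirit of the diffusers ControlNet training script, so the exact names may differ from my version:

```python
from torchvision import transforms

resolution = 512  # matches the --resolution flag above

# TrivialAugmentWide goes on the target image only; the conditioning (depth) image
# is left untouched so it still matches what the ControlNet will see at inference.
image_transforms = transforms.Compose([
    transforms.Resize(resolution, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(resolution),
    transforms.TrivialAugmentWide(),  # augmentation on the target only
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

conditioning_image_transforms = transforms.Compose([
    transforms.Resize(resolution, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(resolution),
    transforms.ToTensor(),  # no augmentation here
])
```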

Thanks again, I'll hurry up and give it a try!

Thanks a lot for sharing the training details! I'm trying to train a ControlNet for SDXL (full precision) myself, but my validation images quickly turn noisy (already after 200 steps).
Now I'm trying your hyperparameters, but I see that the learning rate is set to '1' with a 'cosine' schedule.
How does your model learn anything with such a high learning rate? Thanks!

I use the Prodigy optimizer, which adapts the LR automatically but requires the configured LR to start at 1 and decay from there, preferably with a cosine schedule.
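In code that's roughly the following; this is a sketch assuming the prodigyopt package and diffusers' get_scheduler helper, with a toy model and step count standing in for the real run:

```python
import torch
from prodigyopt import Prodigy                   # pip install prodigyopt
from diffusers.optimization import get_scheduler

# Toy placeholders: swap in the actual ControlNet and your real step count.
model = torch.nn.Linear(4, 4)
max_train_steps = 10_000

# Prodigy estimates the step size on its own, so lr stays at 1.0 and the cosine
# schedule only shapes the decay applied on top of that estimate.
optimizer = Prodigy(model.parameters(), lr=1.0)
lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=max_train_steps,
)
```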
