SVS Dreambooth Checkpoint

This is a dreambooth checkpoint for the SVS project. I used ReV Animated v1.2.2 as my base model.

Sample pictures of this concept (along with their prompts) are in SAMPLES.md.

Note: I haven't yet uploaded the diffusers-compatible weights. The provided safetensors weights are directly compatible with AUTOMATIC1111/stable-diffusion-webui.

Model Overview:

Trigger: <svs-person>
Base Model: ReV Animated v1.2.2
VAE: stabilityai/sd-vae-ft-mse
This model likes: ((best quality)), ((masterpiece)), (detailed) at the beginning of the prompt if you want an anime-2.5D style
Model works best on these resolutions:
- 512x512
- 768x512
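
The overview above can be turned into a small prompt helper. This is only a minimal sketch: `build_prompt` and its parameters are hypothetical names, not part of this repository. The trigger token and quality tags are the ones listed above, and the `(( ))` emphasis syntax is the one understood by AUTOMATIC1111/stable-diffusion-webui.

```python
# Trigger token and quality tags taken from the Model Overview section.
TRIGGER = "<svs-person>"
QUALITY_TAGS = ["((best quality))", "((masterpiece))", "(detailed)"]


def build_prompt(description: str, anime_style: bool = True) -> str:
    """Build a prompt with the quality tags first, then the trigger and description.

    `build_prompt` is an illustrative helper, not something shipped with the model.
    """
    # The quality tags go at the beginning of the prompt for the anime-2.5D look.
    parts = list(QUALITY_TAGS) if anime_style else []
    parts.append(f"{TRIGGER} person {description}".strip())
    return ", ".join(parts)


print(build_prompt("with red hair, doing a bicep curl in the gym"))
```

With the default arguments this reproduces the validation prompt used during training.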

Training

For training, I used a modified version of the Hugging Face diffusers dreambooth training script. The training script is in 'train_dreambooth.py'. To launch the training process, I used the following command:

export INSTANCE_DIR="svs_20_no_full_body/"
export CLASS_DIR="regularization/REGULARIZATION-IMAGES-revAnimated/person/"
export OUTPUT_DIR="models/$(date +%Y%m%d-%H%M%S)"

accelerate launch train_dreambooth.py \
    --pretrained_model_name_or_path="models/hub/revAnimated_v122" \
    --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
    --instance_data_dir="$INSTANCE_DIR" \
    --class_data_dir="$CLASS_DIR" \
    --output_dir="$OUTPUT_DIR" \
    --with_prior_preservation --prior_loss_weight=1.0 \
    --seed=145391 \
    --instance_prompt="a photo of <svs-person> person" \
    --class_prompt="a photo of a person" \
    --resolution=512 \
    --train_batch_size=1 \
    --train_text_encoder \
    --mixed_precision="fp16" \
    --use_8bit_adam \
    --enable_xformers_memory_efficient_attention \
    --gradient_accumulation_steps=1 \
    --learning_rate=1e-6 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --num_class_images=300 \
    --max_train_steps=300 \
    --checkpoints_total_limit=1 \
    --checkpointing_steps=100000 \
    --validation_steps 25 \
    --report_to wandb \
    --validation_prompt "((best quality)), ((masterpiece)), (detailed), <svs-person> person with red hair, doing a bicep curl in the gym"
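
To put these flags in perspective, the arithmetic below estimates how often each image is seen during a run. It is a rough sketch that assumes the modified script keeps the upstream behaviour of pairing one instance image with one class image per sample when `--with_prior_preservation` is set. The 20-image figure comes from the dataset note (and the `svs_20_no_full_body/` directory name), and the variable names are illustrative.

```python
# Values copied from the launch command.
train_batch_size = 1
gradient_accumulation_steps = 1
max_train_steps = 300
num_class_images = 300

# From the dataset note: a 20-image instance subset (assumed to match INSTANCE_DIR).
num_instance_images = 20

# Samples contributing to each optimizer step.
samples_per_step = train_batch_size * gradient_accumulation_steps

# Total instance/class pairs seen over the whole run (assumes one pair per sample).
pairs_seen = samples_per_step * max_train_steps

passes_over_instance_set = pairs_seen / num_instance_images
passes_over_class_set = pairs_seen / num_class_images

print(f"pairs seen: {pairs_seen}")
print(f"passes over the instance images: {passes_over_instance_set:g}")
print(f"passes over the class images: {passes_over_class_set:g}")
```

Under these assumptions each instance image is seen about 15 times, while each regularization image is seen about once.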

Note about Dataset

I started by training on the full dataset with the default parameters of the original dreambooth training script, but I found that the models overfitted very easily to full-body images that shared the same pose. So I decided to train on a 20-image subset of the original dataset. This subset mostly contains close-up face images and upper-body (half-body) shots of svs. I did not use any full-body shots: the models tend to overfit on them very easily, they lose too much resolution, and SD can make pretty good full-body poses without them.
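
The resolution point can be illustrated with some back-of-the-envelope arithmetic. This is a sketch under stated assumptions: the framing fractions below (how much of the image height the face occupies) are illustrative guesses, not measurements from the dataset, and `face_pixels` is a hypothetical helper.

```python
RESOLUTION = 512  # --resolution from the training command


def face_pixels(image_height_px: int, face_fraction: float) -> int:
    """Approximate face height in pixels for a given framing."""
    return round(image_height_px * face_fraction)


# Assumed share of the image height taken up by the face (illustrative only).
framings = {
    "full body": 1 / 8,
    "upper body": 1 / 3,
    "close-up": 2 / 3,
}

for name, fraction in framings.items():
    print(f"{name}: ~{face_pixels(RESOLUTION, fraction)} px of face height")
```

With these assumed fractions, a full-body shot leaves only a few dozen pixels for the face at 512 px, which is why close-ups and half-body shots carry far more identity detail.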

Training logs

Some of the training runs were logged to wandb and can be found here: logs