Rhythm Heaven Style LoRA for Stable Diffusion 1.5 + SDXL

Model is also on CivitAI: https://civitai.com/models/87254?modelVersionId=258514

Model Details

Version 1 parameters:

steps_per_image: 50
total_images: 49
total_steps: ~2400
training_model: Anything_V3
network_dim: 128
network_alpha: 128
network_train_on: both
learning_rate: 1e-4
unet_lr: 0
text_encoder_lr: 5e-5
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 6
num_epochs: 6
mixed_precision: fp16
save_precision: fp16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 512
max_token_length: 225
clip_skip: 2
additional_argument: --shuffle_caption --xformers
training_hardware: Google Colab Free Tier: Nvidia Tesla T4 GPU
training_time: ~45 minutes

Version 1.1 parameters:

steps_per_image: 20
total_images: 122 (61 unique images, doubled by mirroring)
total_steps: 2440
training_model: Any_LoRA
optimizer: AdamW
network_dim: 128
network_alpha: 128
network_train_on: both
learning_rate: 1e-4
unet_lr: 1e-4
text_encoder_lr: 5e-5
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 8
num_epochs: 6
mixed_precision: bf16
save_precision: bf16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 768
max_token_length: 225
clip_skip: 2
additional_argument: --xformers
training_hardware: RTX 3090
training_time: ~1.5 hours (I don't remember exactly)
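
The version 1.1 numbers above are consistent with multiplying the steps per image by the image count. A minimal sketch of that arithmetic, assuming total_steps is simply that product (how the trainer accounts for batch size and epochs is not stated here):

```python
# Values copied from the version 1.1 parameter list.
unique_images = 61
steps_per_image = 20

# Each unique image is also used mirrored, which doubles the dataset.
total_images = unique_images * 2

# Assumption: total_steps = steps_per_image * total_images.
total_steps = steps_per_image * total_images

print(total_images, total_steps)  # prints "122 2440"
```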

Version 1.1 Improvements:

Better style consistency: The model generates a style closer to the Rhythm Heaven series much more consistently. Version 1 produced a slightly more detailed style, so use that one if that's what you want.
Removed "rhythm_heaven" trigger: A style trigger doesn't seem to be necessary, and removing it saves a bit of token length.
Fewer unprompted black and white generations: This is a smaller change. I manually added color to some of the training images to get more variety, which means you'll get fewer black and white generations.

Version 1 (SDXL) parameters:

steps_per_image: 20
total_images: 122 (61 unique images, doubled by mirroring)
total_steps: 7320
training_model: anima_pencil-XL
optimizer: Adafactor
network_dim: 128
network_alpha: 1
network_train_on: both
learning_rate: 1.2e-3
unet_lr: 1.2e-3
text_encoder_lr: 1.2e-3
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 5
num_epochs: 15
mixed_precision: bf16
save_precision: bf16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 1024
max_token_length: 75
clip_skip: 2
additional_argument: --xformers
training_hardware: RTX 3090
training_time: ~6 hours

Version 1 (SDXL) Improvements:

Cleaner looking images: All of the images used to train this model were upscaled 2x, so outputs are less grainy.
Better prompt understanding: SDXL understands prompts better, so a LoRA trained on an SDXL base benefits from that as well.

Model Description

Trained on humanoid characters from the Rhythm Heaven series (and some from WarioWare) using AnyLoRA. Captions were written manually using booru tags.

Model type: Standard LoRA
Finetuned from model: Stable Diffusion 1.5 based models

Uses

Use it in conjunction with a booru-based Stable Diffusion 1.5 model (e.g. Any_LoRA) to emulate the style of the Rhythm Heaven series. I recommend a weight around 0.7 when prompting. As a reminder, this model was trained exclusively with booru tags, so I'm not sure how well it will work with BLIP captions.
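
As a rough illustration of the recommended weight, here is a small sketch that builds a booru-tag prompt with the LoRA attached. It assumes a front end that accepts the AUTOMATIC1111-style `<lora:name:weight>` prompt syntax. The file name `rhythm_heaven_style` and the helper `build_prompt` are hypothetical, so substitute whatever name the downloaded file has.

```python
def build_prompt(tags, lora_name="rhythm_heaven_style", weight=0.7):
    """Join booru tags and append a <lora:name:weight> tag.

    lora_name is a hypothetical file name (without extension);
    weight defaults to the 0.7 recommended above.
    """
    # Drop empty entries and stray whitespace around each tag.
    parts = [tag.strip() for tag in tags if tag.strip()]
    parts.append(f"<lora:{lora_name}:{weight}>")
    return ", ".join(parts)


prompt = build_prompt(["1girl", "solo", "smile", "simple background"])
print(prompt)
# prints: 1girl, solo, smile, simple background, <lora:rhythm_heaven_style:0.7>
```

No trigger word is included because version 1.1 removed the "rhythm_heaven" trigger. With version 1 you may want to add it to the tag list.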