
Rhythm Heaven Style LoRA for Stable Diffusion 1.5

The model is also available on CivitAI: https://civitai.com/models/87254?modelVersionId=258514

Model Details

Version 1 parameters:

steps_per_image: 50
total_images: 49
total_steps: ~2400
training_model: Anything_V3
network_dim: 128
network_alpha: 128
network_train_on: both
learning_rate: 1e-4
unet_lr: 0
text_encoder_lr: 5e-5
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 6
num_epochs: 6
mixed_precision: fp16
save_precision: fp16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 512
max_token_length: 225
clip_skip: 2
additional_argument: --shuffle_caption --xformers
training_hardware: Google Colab Free Tier: Nvidia Tesla T4 GPU
training_time: ~45 minutes
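Version 1 was trained with `--shuffle_caption`. In kohya-ss sd-scripts this option shuffles the comma-separated tags of each caption so the model does not learn a fixed tag order. The sketch below approximates that behaviour in plain Python. The `keep_tokens` argument mirrors the sd-scripts option of the same name, which pins the first N tags in place, but this is an illustration under those assumptions, not the trainer's actual code, and the example caption is hypothetical.

```python
import random
from typing import Optional


def shuffle_caption(caption: str, keep_tokens: int = 0,
                    rng: Optional[random.Random] = None) -> str:
    """Shuffle comma-separated booru tags, keeping the first `keep_tokens` fixed.

    Approximates the effect of --shuffle_caption; not sd-scripts' own code.
    """
    rng = rng or random.Random()
    # Split on commas and drop surrounding whitespace and empty entries.
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # in-place shuffle of the movable tags only
    return ", ".join(fixed + rest)


# Hypothetical caption in booru-tag style.
caption = "1girl, solo, smile, short hair, microphone, simple background"
print(shuffle_caption(caption, keep_tokens=1, rng=random.Random(0)))
```

Each epoch then sees the same tags in a different order, which is the point of the option when captions are unordered tag lists.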

Version 1.1 parameters:

steps_per_image: 20
total_images: 122 (61 unique images, doubled by adding a mirrored copy of each)
total_steps: 2440
training_model: Any_LoRA
optimizer: AdamW
network_dim: 128
network_alpha: 128
network_train_on: both
learning_rate: 1e-4
unet_lr: 1e-4
text_encoder_lr: 5e-5
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 8
num_epochs: 6
mixed_precision: bf16
save_precision: bf16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 768
max_token_length: 225
clip_skip: 2
additional_argument: --xformers
training_hardware: RTX 3090
training_time: ~1.5 hours (I don't remember exactly)
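The listed `total_steps` values match `steps_per_image × total_images` (49 × 50 = 2,450, listed as ~2400, for Version 1; 122 × 20 = 2,440 for Version 1.1), so they count image repeats per epoch rather than optimizer updates. The sketch below also estimates optimizer updates, assuming a kohya-style trainer that groups the repeated images into batches of `train_batch_size` each epoch and rounds the last partial batch up. Treat those update counts as estimates: aspect-ratio bucketing can change the exact number.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainConfig:
    steps_per_image: int   # assumed: repeats of each image per epoch
    total_images: int
    train_batch_size: int
    num_epochs: int

    def image_steps_per_epoch(self) -> int:
        # Matches the card's total_steps figures.
        return self.steps_per_image * self.total_images

    def optimizer_steps(self) -> int:
        # Assumption: one update per batch, last partial batch rounded up.
        per_epoch = math.ceil(self.image_steps_per_epoch() / self.train_batch_size)
        return per_epoch * self.num_epochs


v1 = TrainConfig(steps_per_image=50, total_images=49, train_batch_size=6, num_epochs=6)
v1_1 = TrainConfig(steps_per_image=20, total_images=122, train_batch_size=8, num_epochs=6)

for name, cfg in [("v1", v1), ("v1.1", v1_1)]:
    print(name, cfg.image_steps_per_epoch(), cfg.optimizer_steps())
```

Under these assumptions Version 1.1 performs fewer optimizer updates than Version 1 despite a similar `total_steps`, because of its larger batch size.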

Version 1.1 Improvements:

  • Better style consistency: the model reproduces the Rhythm Heaven look much more consistently. Version 1 generates a somewhat more detailed style, so use it instead if that's what you want.
  • Removed the "rhythm_heaven" trigger: a style trigger doesn't seem necessary, and dropping it saves a little token length.
  • Fewer unprompted black-and-white generations: a smaller change, but I manually added color to some of the training images for more variety, so the model produces black-and-white images less often.

Model Description

Trained on humanoid characters from the Rhythm Heaven series (and some from WarioWare). Version 1.1 was trained on AnyLoRA; Version 1 used Anything V3. Captions were written manually using booru tags.

  • Model type: Standard LoRA
  • Finetuned from model: Stable Diffusion 1.5 based models

Model Sources

Uses

Use it together with a booru-based Stable Diffusion 1.5 model (e.g. AnyLoRA) to emulate the style of the Rhythm Heaven series. I recommend a weight of around 0.7 when prompting. Note that this model was trained exclusively on booru-tag captions, so I'm not sure how well it works with BLIP-style natural-language captions.
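The recommended weight of about 0.7 scales how much the LoRA changes the base model. In the common LoRA formulation (the one kohya-ss sd-scripts uses), an adapted layer's weight is W + m · (alpha / dim) · B·A, where m is the weight you choose. Both versions use `network_alpha` = `network_dim` = 128, so alpha / dim = 1 and m alone sets the strength. The sketch below applies that formula to a tiny hypothetical 2×2 layer in plain Python. In AUTOMATIC1111's web UI the same multiplier is set in the prompt with syntax such as `<lora:filename:0.7>`, where `filename` stands for whatever the LoRA file is named.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain-Python matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(w: Matrix, up: Matrix, down: Matrix,
               alpha: float, dim: int, multiplier: float) -> Matrix:
    """Return W + multiplier * (alpha / dim) * (up @ down)."""
    scale = multiplier * alpha / dim
    delta = matmul(up, down)  # low-rank update B @ A
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Hypothetical rank-1 example: dim=1, alpha=1 keeps alpha/dim = 1 as in the card.
w = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]    # shape (2 x 1), "B"
down = [[0.5, 0.5]]    # shape (1 x 2), "A"
print(apply_lora(w, up, down, alpha=1.0, dim=1, multiplier=0.7))
```

A weight of 0 leaves the base model unchanged, and 1.0 applies the full trained update; 0.7 keeps most of the style while leaving the base model more room.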