Rhythm Heaven Style LoRA for Stable Diffusion 1.5 + SDXL

Model is also on CivitAI: https://civitai.com/models/87254?modelVersionId=258514

Model Details

Version 1 parameters:

steps_per_image: 50
total_images: 49
total_steps: ~2400
training_model: Anything_V3
network_dim: 128
network_alpha: 128
network_train_on: both
learning_rate: 1e-4
unet_lr: 0
text_encoder_lr: 5e-5
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 6
num_epochs: 6
mixed_precision: fp16
save_precision: fp16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 512
max_token_length: 225
clip_skip: 2
additional_argument: --shuffle_caption --xformers
training_hardware: Google Colab Free Tier: Nvidia Tesla T4 GPU
training_time: ~45 minutes

Version 1.1 parameters:

steps_per_image: 20
total_images: 122 (61 unique images, doubled by mirroring)
total_steps: 2440
training_model: Any_LoRA
optimizer: AdamW
network_dim: 128
network_alpha: 128
network_train_on: both
learning_rate: 1e-4
unet_lr: 1e-4
text_encoder_lr: 5e-5
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 8
num_epochs: 6
mixed_precision: bf16
save_precision: bf16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 768
max_token_length: 225
clip_skip: 2
additional_argument: --xformers
training_hardware: RTX 3090
training_time: ~1.5 hours (I don't remember exactly)
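
The image and step counts for Version 1.1 are related by simple multiplication. A quick sanity check, assuming total_steps means steps_per_image × total_images (which matches the listed 2440); the variable names are only illustrative:

```python
# Sanity check for the Version 1.1 counts listed above.
# Assumption: total_steps = steps_per_image * total_images.
unique_images = 61
total_images = unique_images * 2   # each image plus its mirrored copy
steps_per_image = 20

total_steps = steps_per_image * total_images
print(total_images)  # 122
print(total_steps)   # 2440
```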

Version 1.1 Improvements:

  • Better style consistency: The model generates a style closer to the Rhythm Heaven series much more consistently. Version 1 produced a slightly more detailed style, so use that one if that's what you want.
  • Removed "rhythm_heaven" trigger: A style trigger doesn't seem to be necessary, and removing it saves a bit of token length.
  • Fewer unprompted black and white generations: A smaller change, but I manually added color to some of the training images for more variety, so you'll get fewer black and white generations.

Version 1 (SDXL) parameters:

steps_per_image: 20
total_images: 122 (61 unique images, doubled by mirroring)
total_steps: 7320
training_model: anima_pencil-XL
optimizer: Adafactor
network_dim: 128
network_alpha: 1
network_train_on: both
learning_rate: 1.2e-3
unet_lr: 1.2e-3
text_encoder_lr: 1.2e-3
lr_scheduler: constant
lr_scheduler_num_cycles: 1
lr_scheduler_power: 1
train_batch_size: 5
num_epochs: 15
mixed_precision: bf16
save_precision: bf16
save_n_epochs_type: save_every_n_epochs
save_n_epochs_type_value: 1
resolution: 1024
max_token_length: 75
clip_skip: 2
additional_argument: --xformers
training_hardware: RTX 3090
training_time: ~6 hours
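
One difference from the SD 1.5 runs is network_alpha: 1 instead of 128. In common LoRA trainers such as kohya-ss sd-scripts, the LoRA update is scaled by network_alpha / network_dim. The sketch below assumes that convention (this card does not say which trainer was used) and only illustrates why a much lower alpha tends to be paired with a higher learning rate; `lora_scale` is a hypothetical helper, not part of any library:

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Multiplier applied to the LoRA update, assuming scale = alpha / dim."""
    return network_alpha / network_dim

sd15_scale = lora_scale(128, 128)  # Version 1 and 1.1 settings
sdxl_scale = lora_scale(1, 128)    # Version 1 (SDXL) settings

print(sd15_scale)  # 1.0
print(sdxl_scale)  # 0.0078125
```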

Version 1 (SDXL) Improvements:

  • Cleaner-looking images: All of the training images were upscaled 2x, so outputs are less grainy.
  • Better prompt understanding: SDXL understands prompts better, so a LoRA trained on an SDXL base picks up that better understanding too.

Model Description

Trained on humanoid characters from the Rhythm Heaven series (and some from WarioWare) using AnyLoRA. Captions were written manually using booru tags.

  • Model type: Standard LoRA
  • Finetuned from model: Stable Diffusion 1.5 based models

Uses

Use it with a booru-based Stable Diffusion 1.5 model (e.g. Any_LoRA) to emulate the style of the Rhythm Heaven series. I recommend a weight of around 0.7 when prompting. As a reminder, this model was trained exclusively with booru tags, so I'm not sure how well it will work with BLIP captions.
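
For example, in front ends that accept the AUTOMATIC1111-style `<lora:name:weight>` prompt syntax, the recommended weight could be applied as in this minimal sketch. The file name `rhythm_heaven_style` and the example tags are hypothetical placeholders, not taken from this card:

```python
def build_prompt(tags, lora_name, weight=0.7):
    """Join booru-style tags and append an A1111-style LoRA tag."""
    tag_text = ", ".join(tags)
    return f"{tag_text}, <lora:{lora_name}:{weight}>"

# Hypothetical LoRA file name and tags, for illustration only.
prompt = build_prompt(
    ["1girl", "solo", "smile", "simple_background"],
    "rhythm_heaven_style",
)
print(prompt)
```

Replace the placeholder name with whatever the downloaded LoRA file is called in your setup, and adjust the weight up or down from 0.7 to taste.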
