![](https://huggingface.co/Agreene5/Rhythm_Heaven_Style_LoRA/resolve/main/CivitAIExamples2/formodelcard.png "3 example images")

# Rhythm Heaven Style LoRA for Stable Diffusion 1.5 + SDXL

The model is also on CivitAI: https://civitai.com/models/87254?modelVersionId=258514

## Model Details

### Version 1

Parameters:

- steps_per_image: 50
- total_images: 49
- total_steps: ~2400
- training_model: Anything_V3
- network_dim: 128
- network_alpha: 128
- network_train_on: both
- learning_rate: 1e-4
- unet_lr: 0
- text_encoder_lr: 5e-5
- lr_scheduler: constant
- lr_scheduler_num_cycles: 1
- lr_scheduler_power: 1
- train_batch_size: 6
- num_epochs: 6
- mixed_precision: fp16
- save_precision: fp16
- save_n_epochs_type: save_every_n_epochs
- save_n_epochs_type_value: 1
- resolution: 512
- max_token_length: 225
- clip_skip: 2
- additional_argument: `--shuffle_caption --xformers`
- training_hardware: Google Colab Free Tier (Nvidia Tesla T4 GPU)
- training_time: ~45 minutes

### Version 1.1

Parameters:

- steps_per_image: 20
- total_images: 122 (61 unique images, doubled by mirroring them)
- total_steps: 2440
- training_model: Any_LoRA
- optimizer: AdamW
- network_dim: 128
- network_alpha: 128
- network_train_on: both
- learning_rate: 1e-4
- unet_lr: 1e-4
- text_encoder_lr: 5e-5
- lr_scheduler: constant
- lr_scheduler_num_cycles: 1
- lr_scheduler_power: 1
- train_batch_size: 8
- num_epochs: 6
- mixed_precision: bf16
- save_precision: bf16
- save_n_epochs_type: save_every_n_epochs
- save_n_epochs_type_value: 1
- resolution: 768
- max_token_length: 225
- clip_skip: 2
- additional_argument: `--xformers`
- training_hardware: RTX 3090
- training_time: ~1.5 hours (I don't remember exactly)

#### Version 1.1 Improvements:

**Better style consistency**: The model generates images in a style closer to the Rhythm Heaven series, and does so much more consistently. Version 1.0 produced a somewhat more detailed style, so use that one if that's what you want.

**Removed "rhythm_heaven" trigger**: A style trigger doesn't seem to be necessary, and removing it saves a few tokens.
**Fewer unprompted black-and-white generations**: This is a smaller change. I manually added color to some of the training images for more variety, so you'll get fewer black-and-white generations.

### Version 1 (SDXL)

Parameters:

- steps_per_image: 20
- total_images: 122 (61 unique images, doubled by mirroring them)
- total_steps: 7320
- training_model: anima_pencil-XL
- optimizer: Adafactor
- network_dim: 128
- network_alpha: 1
- network_train_on: both
- learning_rate: 1.2e-3
- unet_lr: 1.2e-3
- text_encoder_lr: 1.2e-3
- lr_scheduler: constant
- lr_scheduler_num_cycles: 1
- lr_scheduler_power: 1
- train_batch_size: 5
- num_epochs: 15
- mixed_precision: bf16
- save_precision: bf16
- save_n_epochs_type: save_every_n_epochs
- save_n_epochs_type_value: 1
- resolution: 1024
- max_token_length: 75
- clip_skip: 2
- additional_argument: `--xformers`
- training_hardware: RTX 3090
- training_time: ~6 hours

#### Version 1 (SDXL) Improvements:

**Cleaner-looking images**: All of the training images were upscaled 2x, so outputs are less grainy.

**Better prompt understanding**: SDXL understands prompts better than Stable Diffusion 1.5, and a LoRA trained on an SDXL base benefits from that as well.

## Model Description

Trained on humanoid characters from the Rhythm Heaven series (and some from WarioWare). Base models: Anything_V3 (Version 1), AnyLoRA (Version 1.1), and anima_pencil-XL (Version 1 SDXL). Captions were written manually using booru tags.

- **Model type:** Standard LoRA
- **Finetuned from model:** Stable Diffusion 1.5 based models (Versions 1 and 1.1); an SDXL based model (Version 1 SDXL)

## Uses

Use it together with a booru-based Stable Diffusion 1.5 model (e.g. AnyLoRA) to emulate the style of the Rhythm Heaven series. The SDXL version needs an SDXL-based model instead. I recommend a LoRA weight of around 0.7 when prompting. Another reminder: this model was trained exclusively on booru tags, so I'm not sure how well it works with BLIP captions.
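As a rough illustration of the usage advice above (booru-tag prompts, weight around 0.7), here is a minimal Python sketch that assembles a prompt string in the AUTOMATIC1111 WebUI's `<lora:name:weight>` syntax. The choice of that UI, the helper name `build_prompt`, and the LoRA file name `rhythm_heaven_style` are hypothetical; substitute the name of the file you actually downloaded. The optional `trigger` argument is only relevant to Version 1, which used the `rhythm_heaven` trigger that Version 1.1 removed.

```python
from typing import Optional, Sequence


def build_prompt(
    tags: Sequence[str],
    lora_name: str,
    weight: float = 0.7,
    trigger: Optional[str] = None,
) -> str:
    """Join booru tags with commas and append an A1111-style LoRA tag.

    Hypothetical helper: `lora_name` is whatever your LoRA file is called
    (without extension); `weight` defaults to the recommended ~0.7.
    """
    # Drop empty or whitespace-only tags and trim the rest.
    cleaned = [t.strip() for t in tags if t.strip()]
    # Version 1 only: prepend the old style trigger if one is given.
    if trigger:
        cleaned.insert(0, trigger)
    if not cleaned:
        raise ValueError("at least one booru tag is required")
    return ", ".join(cleaned) + f", <lora:{lora_name}:{weight}>"


if __name__ == "__main__":
    # Version 1.1 / SDXL: no trigger needed.
    print(build_prompt(["1girl", "smile", "headphones"], "rhythm_heaven_style"))
```

Keeping the weight as a parameter makes it easy to try values a little above or below 0.7 if the style comes through too weakly or too strongly with a given base model.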