# 1to2: Training Multiple-Subject Models using only Single-Subject Data (Experimental)

Updates will be mirrored on both Hugging Face and Civitai.

## Introduction

[It has been shown that multiple characters can be trained into a single model](https://civitai.com/models/23476/the-idolmster-cinderella-girls-starlight-stage-style-90-characters). A harder task is to create a model that can generate multiple characters simultaneously without modifying the generation pipeline. This document describes a simple technique that has been shown to help generate multiple characters in the same image.

## Method

```
Requirement: Sets of single-character images

Steps:
1. Train a multi-concept model using the original dataset
2. Create an augmentation dataset of joined image pairs from the original dataset
3. Train on the augmentation dataset
```

A minimal sketch of step 2 (the pair-joining augmentation) is given in the appendix at the end of this document.

## Experiment

### Setup

Three characters from the game Cinderella Girls were chosen for the experiment. The base model is `Animefull-final-pruned`; it was verified that the base model has minimal knowledge of the trained characters.

For the captions of the joined images, the template format `CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight` is used.

A LoHa (LoRA with Hadamard product) is trained using the config file below:

```
[model_arguments]
v2 = false
v_parameterization = false
pretrained_model_name_or_path = "Animefull-final-pruned.ckpt"

[additional_network_arguments]
no_metadata = false
unet_lr = 0.0005
text_encoder_lr = 0.0005
network_module = "lycoris.kohya"
network_dim = 8
network_alpha = 1
network_args = [ "conv_dim=0", "conv_alpha=16", "algo=loha",]
network_train_unet_only = false
network_train_text_encoder_only = false

[optimizer_arguments]
optimizer_type = "AdamW8bit"
learning_rate = 0.0005
max_grad_norm = 1.0
lr_scheduler = "cosine"
lr_warmup_steps = 0

[dataset_arguments]
debug_dataset = false
# keep token 1

[training_arguments]
output_name = "cg3comp"
save_precision = "fp16"
save_every_n_epochs = 1
train_batch_size = 2
max_token_length = 225
mem_eff_attn = false
xformers = true
max_train_epochs = 40
max_data_loader_n_workers = 8
persistent_data_loader_workers = true
gradient_checkpointing = false
gradient_accumulation_steps = 1
mixed_precision = "fp16"
clip_skip = 2
lowram = true

[sample_prompt_arguments]
sample_every_n_epochs = 1
sample_sampler = "k_euler_a"

[saving_arguments]
save_model_as = "safetensors"
```

For the second stage of training, the batch size was reduced to 2 while keeping other settings identical. The training took less than 2 hours on a T4 GPU.

### Results

(see preview images)

## Limitations

* This technique roughly doubles the memory/compute requirement, since the joined images are twice as wide
* Composites can still be generated despite negative prompting
* Cloned characters seem to become the primary failure mode in place of blended characters

## Related Works

Models trained on datasets derived from anime shows have [demonstrated](https://civitai.com/models/21305/) multi-subject capability. Using sufficiently distant concepts, such as `1girl, 1boy`, [has also been shown to be effective](https://civitai.com/models/17640/).

## Future work

Below is a list of ideas yet to be explored:

* Synthetic datasets
* Regularization
* Joint training instead of sequential training
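## Appendix: sketch of the pair-joining augmentation

The snippet below is a minimal sketch of step 2 of the method: it joins two single-character images side by side and writes a caption following the `CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight` template. The directory layout (one folder per character, kohya-style `.txt` tag files next to each image) and all paths and constants are illustrative assumptions, not the exact scripts used in the experiment.

```
import random
from pathlib import Path

from PIL import Image

# Assumed layout: one folder per character under SRC, each image
# `x.png` accompanied by a tag file `x.txt` (kohya-style captions).
SRC = Path("data/single")   # hypothetical input root
DST = Path("data/joined")   # hypothetical output root
N_PAIRS = 200               # number of joined images to generate
HEIGHT = 512                # common height before concatenation

def sample(char_dir: Path) -> tuple[Image.Image, str]:
    """Pick a random image of one character and read its tags."""
    img_path = random.choice(sorted(char_dir.glob("*.png")))
    tags = img_path.with_suffix(".txt").read_text().strip()
    img = Image.open(img_path).convert("RGB")
    # Resize to a common height, keeping the aspect ratio.
    w = round(img.width * HEIGHT / img.height)
    return img.resize((w, HEIGHT)), tags

DST.mkdir(parents=True, exist_ok=True)
chars = sorted(p for p in SRC.iterdir() if p.is_dir())

for i in range(N_PAIRS):
    # Pick two distinct characters, one for each side.
    left_dir, right_dir = random.sample(chars, 2)
    left, tags_l = sample(left_dir)
    right, tags_r = sample(right_dir)

    # Join the two images horizontally.
    joined = Image.new("RGB", (left.width + right.width, HEIGHT))
    joined.paste(left, (0, 0))
    joined.paste(right, (left.width, 0))
    joined.save(DST / f"{i:05d}.png")

    # Caption template: CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight
    caption = f"{left_dir.name}/{right_dir.name}/COMPOSITE, {tags_l}, {tags_r}"
    (DST / f"{i:05d}.txt").write_text(caption)
```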
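One design note on the caption template: tagging every joined image with a dedicated `COMPOSITE` token makes the collage concept itself promptable, so it can be placed in the negative prompt when generating ordinary multi-character scenes. As noted under Limitations, this suppression is imperfect, and composites can still slip through.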