LoRA

#12

by jffacevedo - opened Oct 27, 2023

Discussion

jffacevedo

Oct 27, 2023

Thanks for this model, it works great!

Can we create LoRAs with this model? Meaning, will the diffusers LoRA training code work with this model?

Warlord-K

Segmind org Oct 27, 2023

Yes, all the Diffusers training scripts are fully supported with SSD-1B!

tintwotin

Oct 27, 2023

@jffacevedo If you create a LoRA for this model, please share it, so I can test if the Diffusers LoRA loading code will support loading LoRAs on SSD-1B
(I'm on 6 GB VRAM and LoRA training is not possible for me)
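For anyone who wants to test a shared LoRA on a low-VRAM card, a minimal loading sketch might look like the following. It assumes the LoRA was saved in the Diffusers format, that `StableDiffusionXLPipeline` is the right pipeline class for SSD-1B, and that CPU offloading is enough to fit into 6 GB; none of this is confirmed here. The function name and the `pipeline_cls` parameter are hypothetical, added only so the logic can be exercised without downloading the model.

```python
def load_ssd1b_with_lora(lora_path, base_model="segmind/SSD-1B",
                         low_vram=True, pipeline_cls=None, torch_dtype=None):
    """Load the base model, attach LoRA weights, and return the pipeline."""
    if pipeline_cls is None:
        # Heavy imports are deferred until a real pipeline is requested.
        import torch
        from diffusers import StableDiffusionXLPipeline

        pipeline_cls = StableDiffusionXLPipeline
        if torch_dtype is None:
            torch_dtype = torch.float16  # half precision to save memory

    pipe = pipeline_cls.from_pretrained(base_model, torch_dtype=torch_dtype)
    pipe.load_lora_weights(lora_path)

    if low_vram:
        # Assumption: offloading idle submodules to the CPU helps on small GPUs.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe


# Hypothetical usage (requires diffusers, torch, accelerate, and a GPU):
#   pipe = load_ssd1b_with_lora("path/to/lora")
#   image = pipe("cute dragon creature", num_inference_steps=25).images[0]
```

Whether this actually fits into 6 GB would need to be measured on the card in question.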

jffacevedo

Oct 27, 2023

Thank you, I'll give it a try and share the results.

Warlord-K

Segmind org Nov 6, 2023

@jffacevedo Were you able to train the LoRA successfully?

jffacevedo

Nov 7, 2023

@Warlord-K I was able to train successfully, but the validation step of the script failed with RuntimeError: Input type (c10::Half) and bias type (float) should be the same. The checkpoints were still saved; here is the result after 2 epochs.

With LoRA

Without LoRA

See the full logs below:

accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --caption_column="text" \
  --resolution=1024 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=2 \
  --checkpointing_steps=500 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora-sdxl" \
  --validation_prompt="cute dragon creature"
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
11/07/2023 01:01:10 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'variance_type', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
Downloading model.safetensors: 100%|█████████████████████████████████| 492M/492M [00:02<00:00, 220MB/s]
Downloading model.safetensors: 100%|██████████████████████████████| 2.78G/2.78G [01:10<00:00, 39.7MB/s]
Downloading (…)ch_model.safetensors: 100%|███████████████████████████████████████████████████████████████| 335M/335M [00:01<00:00, 200MB/s]
Downloading (…)ch_model.safetensors: 100%|█████████████████████████████████████████████████████████████| 5.33G/5.33G [00:23<00:00, 230MB/s]
{'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
Downloading readme: 100%|█████████████████████████████████████████████████████████████████████████████| 1.80k/1.80k [00:00<00:00, 11.8MB/s]
Downloading metadata: 100%|███████████████████████████████████████████████████████████████████████████████| 731/731 [00:00<00:00, 5.92MB/s]
Downloading data: 100%|███████████████████████████████████████████████████████████████████████████████| 99.7M/99.7M [00:02<00:00, 39.7MB/s]
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.51s/it]
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1508.74it/s]
Generating train split: 100%|███████████████████████████████████████████████████████████████████| 833/833 [00:00<00:00, 2920.80 examples/s]
11/07/2023 01:03:08 - INFO - __main__ - ***** Running training *****
11/07/2023 01:03:08 - INFO - __main__ -   Num examples = 833
11/07/2023 01:03:08 - INFO - __main__ -   Num Epochs = 2
11/07/2023 01:03:08 - INFO - __main__ -   Instantaneous batch size per device = 1
11/07/2023 01:03:08 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
11/07/2023 01:03:08 - INFO - __main__ -   Gradient Accumulation steps = 1
11/07/2023 01:03:08 - INFO - __main__ -   Total optimization steps = 1666
Steps:  30%|███████████████████████                                                      | 500/1666 [08:46<20:27,  1.05s/it, lr=0.0001, step_loss=0.00503]
11/07/2023 01:11:55 - INFO - accelerate.accelerator - Saving current state to sd-pokemon-model-lora-sdxl/checkpoint-500
Model weights saved in sd-pokemon-model-lora-sdxl/checkpoint-500/pytorch_lora_weights.safetensors
11/07/2023 01:11:55 - INFO - accelerate.checkpointing - Optimizer state saved in sd-pokemon-model-lora-sdxl/checkpoint-500/optimizer.bin
11/07/2023 01:11:55 - INFO - accelerate.checkpointing - Scheduler state saved in sd-pokemon-model-lora-sdxl/checkpoint-500/scheduler.bin
11/07/2023 01:11:55 - INFO - accelerate.checkpointing - Gradient scaler state saved in sd-pokemon-model-lora-sdxl/checkpoint-500/scaler.pt
11/07/2023 01:11:55 - INFO - accelerate.checkpointing - Random states saved in sd-pokemon-model-lora-sdxl/checkpoint-500/random_states_0.pkl
11/07/2023 01:11:55 - INFO - __main__ - Saved state to sd-pokemon-model-lora-sdxl/checkpoint-500
Steps:  50%|██████████████████████████████████████▌                                      | 833/1666 [14:36<14:27,  1.04s/it, lr=0.0001, step_loss=0.00567]
11/07/2023 01:17:45 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: cute dragon creature.
{'add_watermarker'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of segmind/SSD-1B.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of segmind/SSD-1B.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of segmind/SSD-1B.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 53.80it/s]
Steps:  60%|██████████████████████████████████████████████▊                               | 1000/1666 [18:46<11:39,  1.05s/it, lr=0.0001, step_loss=0.105]
11/07/2023 01:21:55 - INFO - accelerate.accelerator - Saving current state to sd-pokemon-model-lora-sdxl/checkpoint-1000
Model weights saved in sd-pokemon-model-lora-sdxl/checkpoint-1000/pytorch_lora_weights.safetensors
11/07/2023 01:21:55 - INFO - accelerate.checkpointing - Optimizer state saved in sd-pokemon-model-lora-sdxl/checkpoint-1000/optimizer.bin
11/07/2023 01:21:55 - INFO - accelerate.checkpointing - Scheduler state saved in sd-pokemon-model-lora-sdxl/checkpoint-1000/scheduler.bin
11/07/2023 01:21:55 - INFO - accelerate.checkpointing - Gradient scaler state saved in sd-pokemon-model-lora-sdxl/checkpoint-1000/scaler.pt
11/07/2023 01:21:55 - INFO - accelerate.checkpointing - Random states saved in sd-pokemon-model-lora-sdxl/checkpoint-1000/random_states_0.pkl
11/07/2023 01:21:55 - INFO - __main__ - Saved state to sd-pokemon-model-lora-sdxl/checkpoint-1000
Steps:  90%|█████████████████████████████████████████████████████████████████████▎       | 1500/1666 [27:31<02:53,  1.05s/it, lr=0.0001, step_loss=0.0159]
11/07/2023 01:30:40 - INFO - accelerate.accelerator - Saving current state to sd-pokemon-model-lora-sdxl/checkpoint-1500
Model weights saved in sd-pokemon-model-lora-sdxl/checkpoint-1500/pytorch_lora_weights.safetensors
11/07/2023 01:30:40 - INFO - accelerate.checkpointing - Optimizer state saved in sd-pokemon-model-lora-sdxl/checkpoint-1500/optimizer.bin
11/07/2023 01:30:40 - INFO - accelerate.checkpointing - Scheduler state saved in sd-pokemon-model-lora-sdxl/checkpoint-1500/scheduler.bin
11/07/2023 01:30:40 - INFO - accelerate.checkpointing - Gradient scaler state saved in sd-pokemon-model-lora-sdxl/checkpoint-1500/scaler.pt
11/07/2023 01:30:40 - INFO - accelerate.checkpointing - Random states saved in sd-pokemon-model-lora-sdxl/checkpoint-1500/random_states_0.pkl
11/07/2023 01:30:40 - INFO - __main__ - Saved state to sd-pokemon-model-lora-sdxl/checkpoint-1500
Steps: 100%|██████████████████████████████████████████████████████████████████████████████| 1666/1666 [30:26<00:00,  1.04s/it, lr=0.0001, step_loss=0.126]
11/07/2023 01:33:35 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: cute dragon creature.
{'add_watermarker'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of segmind/SSD-1B.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of segmind/SSD-1B.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of segmind/SSD-1B.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 60.59it/s]
Model weights saved in sd-pokemon-model-lora-sdxl/pytorch_lora_weights.safetensors
{'add_watermarker'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of segmind/SSD-1B.
Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of segmind/SSD-1B.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of segmind/SSD-1B.
{'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
Loaded unet as UNet2DConditionModel from `unet` subfolder of segmind/SSD-1B.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of segmind/SSD-1B.
Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of segmind/SSD-1B.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00,  1.93it/s]
Loading unet.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:06<00:00,  3.72it/s]
Traceback (most recent call last):
  File "/home/jfacevedo_google_com/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 1265, in <module>
    main(args)
  File "/home/jfacevedo_google_com/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 1224, in main
    images = [
  File "/home/jfacevedo_google_com/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 1225, in <listcomp>
    pipeline(args.validation_prompt, num_inference_steps=25, generator=generator).images[0]
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 1057, in __call__
    image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 316, in decode
    decoded = self._decode(z).sample
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 288, in _decode
    z = self.post_quant_conv(z)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
Steps: 100%|██████████████████████████████████████████████████████████████████████████████| 1666/1666 [31:52<00:00,  1.15s/it, lr=0.0001, step_loss=0.126]
Traceback (most recent call last):
  File "/opt/conda/envs/sdxl/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/opt/conda/envs/sdxl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/sdxl/bin/python3.10', 'train_text_to_image_lora_sdxl.py', '--pretrained_model_name_or_path=segmind/SSD-1B', '--dataset_name=lambdalabs/pokemon-blip-captions', '--caption_column=text', '--resolution=1024', '--random_flip', '--train_batch_size=1', '--num_train_epochs=2', '--checkpointing_steps=500', '--learning_rate=1e-04', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--mixed_precision=fp16', '--seed=42', '--output_dir=sd-pokemon-model-lora-sdxl', '--validation_prompt=cute dragon creature']' returned non-zero exit status 1.
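Reading the traceback, the failure happens in `vae.decode`: half-precision latents reach `post_quant_conv`, whose bias is float32. By that point the final LoRA weights had already been written. One possible workaround, sketched below and not tested against SSD-1B, is to ask the pipeline for latents and decode them after casting to the dtype of that layer. `decode_with_matching_dtype` is a hypothetical helper; it relies only on attributes that appear in the traceback (`post_quant_conv`, `config.scaling_factor`, `decode`).

```python
def decode_with_matching_dtype(vae, latents):
    """Cast latents to the dtype of vae.post_quant_conv, then decode them."""
    # Use the dtype of the layer that raised the error in the traceback.
    target_dtype = next(iter(vae.post_quant_conv.parameters())).dtype
    latents = latents.to(target_dtype)
    # Same call as in the traceback, now with matching dtypes.
    return vae.decode(latents / vae.config.scaling_factor, return_dict=False)[0]


# Hypothetical usage with a loaded SDXL-style pipeline `pipe`:
#   latents = pipe("cute dragon creature", num_inference_steps=25,
#                  output_type="latent").images
#   image_tensor = decode_with_matching_dtype(pipe.vae, latents)
#   image = pipe.image_processor.postprocess(image_tensor, output_type="pil")[0]
```

This only sidesteps the mismatch at decode time; it does not explain why the two dtypes diverged in the script.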

Warlord-K

Segmind org Nov 7, 2023

Something might have gone wrong in the training. We'll try the same and get back to you. Thanks for reporting!
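As a quick sanity check on the log above, the step counts it reports are consistent with the launch arguments: 833 examples at batch size 1 for 2 epochs gives 1666 optimization steps, with checkpoints at steps 500, 1000, and 1500. The sketch below reproduces that arithmetic. The function names are illustrative, and the ceil-division formula is the usual one, assumed here rather than copied from the training script.

```python
import math


def total_optimization_steps(num_examples, batch_size, epochs,
                             grad_accum_steps=1, num_processes=1):
    """Number of optimizer updates for a full run."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_processes))
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return epochs * updates_per_epoch


def checkpoint_steps(total_steps, every):
    """Steps at which a checkpoint is written when saving every `every` steps."""
    return list(range(every, total_steps + 1, every))


total = total_optimization_steps(num_examples=833, batch_size=1, epochs=2)
print(total)                         # 1666
print(checkpoint_steps(total, 500))  # [500, 1000, 1500]
```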

inb4devops

Nov 8, 2023

•

edited Nov 8, 2023

I also tried to train a LoRA for SSD-1B but I'm getting this error (Missing key(s) in state_dict): https://github.com/bmaltais/kohya_ss/issues/1665

Is this related to kohya_ss or the model?

Icar

Segmind org Nov 8, 2023

•

edited Nov 8, 2023

Have you tried updating Kohya? It seems like it didn't recognize the model and expects a larger state_dict.

inb4devops

Nov 8, 2023

"Have you tried updating Kohya? It seems like it didn't recognize the model and expects a larger state_dict."

Yeah, I just rechecked and my clone is up to date.
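One way to narrow down whether the trainer or the checkpoint is at fault is to compare the parameter names the trainer expects with the ones the checkpoint actually contains. The sketch below does this on plain dictionaries. The key names are made up for illustration, and reading real keys from a checkpoint file is left out.

```python
def diff_state_dict_keys(expected, actual):
    """Return (missing, unexpected) key lists, sorted for stable output."""
    expected_keys, actual_keys = set(expected), set(actual)
    missing = sorted(expected_keys - actual_keys)
    unexpected = sorted(actual_keys - expected_keys)
    return missing, unexpected


# Hypothetical key names: the trainer expects more layers than the checkpoint holds.
expected = {"down.0.attn.weight": None, "down.1.attn.weight": None, "mid.attn.weight": None}
actual = {"down.0.attn.weight": None, "mid.attn.weight": None}

missing, unexpected = diff_state_dict_keys(expected, actual)
print(missing)     # ['down.1.attn.weight']
print(unexpected)  # []
```

A long `missing` list with an empty `unexpected` list would point to the trainer building a larger architecture than the one stored in the checkpoint.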

huzaif-ahmed

Nov 22, 2023

How can I train this model on images of multiple different people? For example, a person named p1 has all his pictures in a folder p1, and a person named p2 has all his pictures in a folder p2. I want the model to generate accurate pictures of each person when I prompt with p1 or p2. How do I do that?

Warlord-K

Segmind org Nov 22, 2023

You can use the DreamBooth LoRA training script available in Diffusers.
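One way to apply this to the two-folder setup is to train one LoRA per person, each with its own identifier in the prompt. The sketch below only builds the launch commands. It assumes the `train_dreambooth_lora_sdxl.py` example script from Diffusers and its `--instance_data_dir` and `--instance_prompt` options; the prompt template, step count, and output directory names are placeholders, not recommended settings.

```python
import shlex


def dreambooth_lora_command(person, data_dir,
                            base_model="segmind/SSD-1B", max_train_steps=500):
    """Build an accelerate launch command for one person's image folder."""
    return [
        "accelerate", "launch", "train_dreambooth_lora_sdxl.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--instance_data_dir={data_dir}",
        # Placeholder prompt: the folder name doubles as the identifier.
        f"--instance_prompt=a photo of {person} person",
        f"--output_dir=lora-{person}",
        "--resolution=1024",
        "--train_batch_size=1",
        f"--max_train_steps={max_train_steps}",
    ]


# One command per folder; run each from the directory holding the script.
for person in ("p1", "p2"):
    print(shlex.join(dreambooth_lora_command(person, data_dir=person)))
```

At inference time, the LoRA for the wanted person would be loaded and prompted with the same identifier used during training.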
