
InstructPix2Pix

InstructPix2Pix is a method for fine-tuning text-conditioned diffusion models so that they can follow an edit instruction for an input image. Models fine-tuned with this method take the following as input:

instructpix2pix-inputs

The output is an "edited" image that reflects the edit instruction applied to the input image:

instructpix2pix-output

The train_instruct_pix2pix.py script (which you can find here) walks through the training procedure and shows how to apply it to Stable Diffusion.

*** While train_instruct_pix2pix.py implements the InstructPix2Pix training procedure faithfully to the original implementation, it has only been tested on a small-scale dataset. This can affect the final results. For better results, we recommend training for longer on a larger dataset. A large dataset for InstructPix2Pix training can be found here.


Running locally with PyTorch

Installing the dependencies

Before running the script, make sure to install the library's training dependencies:

Important

To successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the installation up to date, since we update the example scripts frequently and install example-specific requirements. To do this, execute the following steps in a new virtual environment:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .

Then cd into the example folder:

cd examples/instruct_pix2pix

Now run:

pip install -r requirements.txt

And initialize an πŸ€— Accelerate environment with:

accelerate config

ν˜Ήμ€ ν™˜κ²½μ— λŒ€ν•œ 질문 없이 기본적인 accelerate ꡬ성을 μ‚¬μš©ν•˜λ €λ©΄ λ‹€μŒμ„ μ‹€ν–‰ν•˜μ„Έμš”.

accelerate config default

ν˜Ήμ€ μ‚¬μš© 쀑인 ν™˜κ²½μ΄ notebookκ³Ό 같은 λŒ€ν™”ν˜• μ‰˜μ€ μ§€μ›ν•˜μ§€ μ•ŠλŠ” κ²½μš°λŠ” λ‹€μŒ 절차λ₯Ό λ”°λΌμ£Όμ„Έμš”.

from accelerate.utils import write_basic_config

write_basic_config()

Example

As mentioned before, we'll use a small toy dataset for training: a smaller version of the original dataset used in the InstructPix2Pix paper. To use your own dataset, take a look at the Create a dataset for training guide.
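To get a feel for what the training script expects from a dataset, you can quickly inspect the toy dataset. Below is a minimal sketch, assuming (per the script's defaults) that the columns are named input_image, edited_image, and edit_prompt:

from datasets import load_dataset

# Load the ~1,000-sample toy dataset used in this example.
dataset = load_dataset("fusing/instructpix2pix-1000-samples", split="train")

# The script expects an original image, an edited image, and an edit
# instruction per sample; other column names can be remapped with
# --original_image_column, --edited_image_column, and --edit_prompt_column.
print(dataset.column_names)
print(dataset[0]["edit_prompt"])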

Specify the MODEL_NAME environment variable (a Hub model repository id or a path to a folder containing the model weights), which is passed to the pretrained_model_name_or_path argument, and specify the dataset name in DATASET_ID:

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_ID="fusing/instructpix2pix-1000-samples"

Now, we can launch training. The script saves all the components (feature_extractor, scheduler, text_encoder, unet, etc.) in subfolders of your repository.

accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --seed=42 \
    --push_to_hub
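One flag worth understanding is --conditioning_dropout_prob. Following the InstructPix2Pix paper, the text conditioning, the image conditioning, or both are randomly dropped during training so the model also learns unconditional behavior, which is what makes the two guidance scales meaningful at inference time. Here is a simplified sketch of the idea (the actual script replaces dropped text conditioning with the empty-prompt embedding rather than zeros):

import torch


def apply_conditioning_dropout(prompt_embeds, image_latents, p, generator=None):
    # Drop conditioning per sample, as in InstructPix2Pix training.
    bsz = prompt_embeds.shape[0]
    random_p = torch.rand(bsz, device=prompt_embeds.device, generator=generator)

    # Drop the text conditioning with probability 2p (zeroed here for simplicity).
    prompt_mask = (random_p < 2 * p).reshape(bsz, 1, 1)
    prompt_embeds = torch.where(prompt_mask, torch.zeros_like(prompt_embeds), prompt_embeds)

    # Drop the image conditioning with probability 2p; the overlap with the
    # text mask (probability p) drops both at once.
    image_mask = ((random_p >= p) & (random_p < 3 * p)).reshape(bsz, 1, 1, 1)
    image_latents = torch.where(image_mask, torch.zeros_like(image_latents), image_latents)
    return prompt_embeds, image_latents

With p=0.05, each conditioning signal is dropped alone for 5% of samples and both are dropped together for another 5%, matching the scheme described in the paper.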

μΆ”κ°€μ μœΌλ‘œ, κ°€μ€‘μΉ˜μ™€ λ°”μ΄μ–΄μŠ€λ₯Ό ν•™μŠ΅ 과정에 λͺ¨λ‹ˆν„°λ§ν•˜μ—¬ 검증 좔둠을 μˆ˜ν–‰ν•˜λŠ” 것을 μ§€μ›ν•©λ‹ˆλ‹€. report_to="wandb"와 이 κΈ°λŠ₯을 μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
    --validation_prompt="make the mountains snowy" \
    --seed=42 \
    --report_to=wandb \
    --push_to_hub

λͺ¨λΈ 디버깅에 μœ μš©ν•œ 이 평가 방법 ꢌμž₯ν•©λ‹ˆλ‹€. 이λ₯Ό μ‚¬μš©ν•˜κΈ° μœ„ν•΄ wandbλ₯Ό μ„€μΉ˜ν•˜λŠ” 것을 μ£Όλͺ©ν•΄μ£Όμ„Έμš”. pip install wandb둜 μ‹€ν–‰ν•΄ wandbλ₯Ό μ„€μΉ˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€.

Here, you can find an example of what the evaluation results and training parameters look like.

Note: In the original paper, the authors observed that a model trained at 256x256 resolution generalizes surprisingly well to larger resolutions such as 512x512. This is likely because of the large dataset they used during training.

λ‹€μˆ˜μ˜ GPU둜 ν•™μŠ΅ν•˜κΈ°

accelerate allows for seamless multi-GPU training. Follow the instructions here for running distributed training with accelerate. Here is an example command:

accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
 --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
 --dataset_name=sayakpaul/instructpix2pix-1000-samples \
 --use_ema \
 --enable_xformers_memory_efficient_attention \
 --resolution=512 --random_flip \
 --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
 --max_train_steps=15000 \
 --checkpointing_steps=5000 --checkpoints_total_limit=1 \
 --learning_rate=5e-05 --lr_warmup_steps=0 \
 --conditioning_dropout_prob=0.05 \
 --mixed_precision=fp16 \
 --seed=42 \
 --push_to_hub

Inference

Once training is complete, you can run inference:

import requests
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "your_model_id"  # <- change this to your model id
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/test_pix2pix_4.png"


def download_image(url):
    # Download the image, apply any EXIF rotation, and convert it to RGB.
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)
prompt = "wipe out the lake"
num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(
    prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("edited_image.png")

An example model repository obtained with this training script can be found here: sayakpaul/instruct-pix2pix.

To control the speed and quality of inference, it is recommended to tweak three parameters:

  • num_inference_steps
  • image_guidance_scale
  • guidance_scale

In particular, image_guidance_scale and guidance_scale can have a significant impact on the generated ("edited") image (see here for an example). Since the two interact, it can help to sweep over both, as sketched below.
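A small, hypothetical parameter sweep, reusing pipe, image, and prompt from the inference example above:

# Continuing from the inference example above (reuses pipe, image, prompt).
# The value grids are illustrative, not tuned recommendations.
for igs in (1.2, 1.5, 2.0):
    for gs in (5.0, 7.5, 10.0):
        result = pipe(
            prompt,
            image=image,
            num_inference_steps=20,
            image_guidance_scale=igs,
            guidance_scale=gs,
            generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for a fair comparison
        ).images[0]
        result.save(f"edited_igs-{igs}_gs-{gs}.png")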

If you're looking for some interesting ways to use the InstructPix2Pix training methodology, check out this blog post: Instruction-tuning Stable Diffusion with InstructPix2Pix.