<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# InstructPix2Pix

[InstructPix2Pix](https://huggingface.co/papers/2211.09800) is a method for fine-tuning text-conditioned diffusion models so that they can follow an edit instruction for an input image. Models fine-tuned with this method take the following as inputs:
<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-instruction.png" alt="instructpix2pix-inputs" width=600/>
</p>

The output is an "edited" image that reflects the edit instruction applied to the input image:

<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/output-gs%407-igs%401-steps%4050.png" alt="instructpix2pix-output" width=600/>
</p>
The `train_instruct_pix2pix.py` script (you can find it [here](https://github.com/huggingface/diffusers/blob/main/examples/instruct_pix2pix/train_instruct_pix2pix.py)) shows how to implement the training procedure and adapt it for Stable Diffusion.

***Even though `train_instruct_pix2pix.py` implements the InstructPix2Pix training procedure while staying faithful to the [original implementation](https://github.com/timothybrooks/instruct-pix2pix), we have only tested it on a [small-scale dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples). This can affect the final results. For better results, we recommend longer training runs on a larger dataset. You can find a large dataset for InstructPix2Pix training [here](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered).***
## Running locally with PyTorch

### Installing the dependencies

Before running the script, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date. We update the example scripts frequently and install some example-specific requirements. To do this, run the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then `cd` into the example folder:

```bash
cd examples/instruct_pix2pix
```

Now run:

```bash
pip install -r requirements.txt
```

Then initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, to use a default Accelerate configuration without answering questions about your environment, run:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (for example, a notebook), use:

```python
from accelerate.utils import write_basic_config

write_basic_config()
```
### Toy example

As mentioned earlier, we'll use a [small toy dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. It is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
Specify the `MODEL_NAME` environment variable (either a Hub model repository ID or a path to a folder containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. You also need to specify the dataset name in `DATASET_ID`:

```bash
export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5"
export DATASET_ID="fusing/instructpix2pix-1000-samples"
```
Now you can launch training. The script saves all the components (`feature_extractor`, `scheduler`, `text_encoder`, `unet`, etc.) in subfolders of your repository.
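After training, you can quickly confirm that the output folder holds the saved pipeline by checking for the component subfolders listed above. The helper below is a hypothetical sketch. It checks only the four components named in this guide plus a top-level `model_index.json`, which we assume `save_pretrained` writes. A real Stable Diffusion pipeline folder also contains other components, such as `vae` and `tokenizer`.

```python
from pathlib import Path

# Components named in this guide; a full pipeline folder contains more.
EXPECTED = ["feature_extractor", "scheduler", "text_encoder", "unet"]


def missing_components(output_dir):
    """Hypothetical check: list expected entries absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED if not (root / name).is_dir()]
    # Assumed to be written at the top level by save_pretrained.
    if not (root / "model_index.json").is_file():
        missing.append("model_index.json")
    return missing
```

Pass the path you used for `--output_dir`. An empty list means every checked entry is present.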
```bash
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --seed=42 \
    --push_to_hub
```
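The `--conditioning_dropout_prob=0.05` flag controls conditioning dropout. Dropout lets the model learn the unconditional predictions that classifier-free guidance needs at inference time. As far as we can tell from the training script, one uniform random number `p` per sample decides what is dropped:

* If `p` is below `prob`, only the edit instruction is replaced with an empty-prompt embedding.
* If `p` is between `prob` and `2*prob`, both the instruction and the input image are dropped.
* If `p` is between `2*prob` and `3*prob`, only the input image is zeroed out.

The plain-Python sketch below reproduces that decision rule (it is not the script's tensor code) and estimates how often each case occurs:

```python
import random


def conditioning_dropout(p, prob=0.05):
    """Map a uniform sample p in [0, 1) to (drop_text, drop_image).

    p < prob              -> drop the edit instruction only
    prob <= p < 2 * prob  -> drop both the instruction and the input image
    2 * prob <= p < 3 * prob -> drop the input image only
    otherwise             -> keep both conditionings
    """
    drop_text = p < 2 * prob
    drop_image = prob <= p < 3 * prob
    return drop_text, drop_image


def dropout_rates(prob=0.05, n=100_000, seed=42):
    """Estimate the fraction of samples falling into each case."""
    rng = random.Random(seed)
    counts = {"text_only": 0, "both": 0, "image_only": 0, "none": 0}
    for _ in range(n):
        drop_text, drop_image = conditioning_dropout(rng.random(), prob)
        if drop_text and drop_image:
            counts["both"] += 1
        elif drop_text:
            counts["text_only"] += 1
        elif drop_image:
            counts["image_only"] += 1
        else:
            counts["none"] += 1
    return {case: count / n for case, count in counts.items()}


print(dropout_rates())
```

With `prob=0.05`, each dropout case occurs in about 5% of samples, and roughly 85% of samples keep both conditionings.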
Additionally, we support running validation inference during training to monitor progress with Weights and Biases. You can enable this feature with `report_to="wandb"`:
```bash
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
    --validation_prompt="make the mountains snowy" \
    --seed=42 \
    --report_to=wandb \
    --push_to_hub
```
We recommend this type of validation because it can be useful for debugging the model. Note that you need `wandb` installed to use it. You can install it by running `pip install wandb`.

[Here](https://wandb.ai/sayakpaul/instruct-pix2pix/runs/ctr3kovq), you can find an example training run that includes some validation samples and the training hyperparameters.

***Note: In the original paper, the authors observed that even when the model is trained at an image resolution of 256x256, it generalizes well to larger resolutions such as 512x512. This is likely due to the larger dataset they used for training.***
## Training with multiple GPUs

`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch) to run distributed training with `accelerate`. Here is an example command:
```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=stable-diffusion-v1-5/stable-diffusion-v1-5 \
    --dataset_name=sayakpaul/instructpix2pix-1000-samples \
    --use_ema \
    --enable_xformers_memory_efficient_attention \
    --resolution=512 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --seed=42 \
    --push_to_hub
```
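Adding GPUs changes how much data each optimizer step sees. The rough sketch below makes three assumptions:

* `--max_train_steps` counts optimizer updates (after gradient accumulation).
* Each GPU processes its own `--train_batch_size` batch.
* A checkpoint is written whenever the update step is a multiple of `--checkpointing_steps`, and `--checkpoints_total_limit` keeps only the most recent ones.

Under those assumptions, it estimates samples per update, passes over a 1,000-sample dataset, and which checkpoints remain:

```python
def training_budget(dataset_size, per_device_batch, grad_accum, num_gpus, max_train_steps):
    """Rough samples per optimizer update and resulting epoch count."""
    samples_per_update = per_device_batch * grad_accum * num_gpus
    epochs = max_train_steps * samples_per_update / dataset_size
    return samples_per_update, epochs


def checkpoint_steps(max_train_steps, checkpointing_steps, total_limit):
    """Steps at which checkpoints are written, and those kept at the end.

    total_limit=None means no limit, so every checkpoint is kept.
    """
    written = list(range(checkpointing_steps, max_train_steps + 1, checkpointing_steps))
    kept = written[-total_limit:] if total_limit else written
    return written, kept


print(training_budget(1000, 4, 4, 1, 15000))  # single GPU
print(training_budget(1000, 4, 4, 2, 15000))  # two GPUs (hypothetical count)
print(checkpoint_steps(15000, 5000, 1))
```

With the flags above, one GPU gives 16 samples per update and about 240 epochs. Two GPUs double that to 32 samples and about 480 epochs, so the same step budget covers twice as much data.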
## Inference

Once training is complete, you can run inference:
```python
import requests
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "your_model_id"  # <- replace with your trained model's Hub ID or local path
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/test_pix2pix_4.png"


def download_image(url):
    # Fetch the image, apply any EXIF rotation, and make sure it has 3 channels.
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)
prompt = "wipe out the lake"
num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(
    prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("edited_image.png")
```
An example model repository obtained with this training script is available at [sayakpaul/instruct-pix2pix](https://huggingface.co/sayakpaul/instruct-pix2pix).
We encourage you to experiment with the following three parameters to control speed and quality during inference:

* `num_inference_steps`
* `image_guidance_scale`
* `guidance_scale`

In particular, `image_guidance_scale` and `guidance_scale` can have a profound impact on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example).
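At inference time, the pipeline follows the InstructPix2Pix paper's classifier-free guidance formulation and combines three UNet noise predictions:

* one with neither conditioning,
* one conditioned on the input image only,
* one conditioned on both the image and the instruction.

`guidance_scale` weights the instruction term and `image_guidance_scale` weights the image term. The plain-float sketch below shows that combination. It assumes the combination mirrors the diffusers pipeline, which applies the same arithmetic to tensors:

```python
def combine_guidance(noise_uncond, noise_image, noise_full,
                     guidance_scale, image_guidance_scale):
    """Combine three noise predictions as in InstructPix2Pix guidance.

    noise_uncond: prediction with neither the instruction nor the input image
    noise_image:  prediction conditioned on the input image only
    noise_full:   prediction conditioned on both the image and the instruction
    """
    return (
        noise_uncond
        + guidance_scale * (noise_full - noise_image)          # instruction term
        + image_guidance_scale * (noise_image - noise_uncond)  # image term
    )


# Toy scalar values standing in for noise tensors, using the scales from the example above.
print(combine_guidance(0.0, 1.0, 3.0, guidance_scale=10, image_guidance_scale=1.5))
```

Setting both scales to 1.0 makes the guidance terms cancel, so the result equals the fully conditioned prediction. Roughly speaking, raising `image_guidance_scale` keeps the output closer to the input image, while raising `guidance_scale` makes it follow the instruction more strongly.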
If you're looking for interesting ways to use the InstructPix2Pix training methodology, check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).