<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
[[open-in-colab]]
# Quicktour
Diffusion models are trained to denoise random Gaussian noise step by step to generate a sample of interest, such as an image or audio. This has sparked tremendous interest in generative AI, and you have probably seen examples of diffusion-generated images on the internet. 🧨 Diffusers is a library that aims to make diffusion models widely accessible to everyone.
Whether you are a developer or an everyday user, this quicktour introduces 🧨 Diffusers and helps you start generating quickly! There are three main components of the library to know about:
* The [`DiffusionPipeline`] is a high-level, end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
* Many different [schedulers](./api/schedulers/overview): algorithms that control how noise is added for training and how denoised images are generated during inference.
The quicktour shows you how to use the [`DiffusionPipeline`] for inference, and then walks you through combining a model and a scheduler to replicate what happens inside the [`DiffusionPipeline`].
<Tip>
The quicktour is a condensed version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. To learn more about the goals, design philosophy, and core API of 🧨 Diffusers, check out the notebook!
</Tip>
Before you begin, make sure you have all the necessary libraries installed:
```py
# Uncomment to install the necessary libraries in Colab.
#!pip install --upgrade diffusers accelerate transformers
```
- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).
## DiffusionPipeline
The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out of the box for many tasks. The table below shows some of the supported tasks; for the complete list, see the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.
| **Task** | **Description** | **Pipeline** |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |
Start by creating an instance of a [`DiffusionPipeline`] and specifying which pipeline checkpoint you would like to download.
You can use the [`DiffusionPipeline`] with any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint for text-to-image generation.
<Tip warning={true}>
For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) carefully before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.
</Tip>
Load the model with the [`~DiffusionPipeline.from_pretrained`] method:
```python
>>> from diffusers import DiffusionPipeline
>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```
The [`DiffusionPipeline`] downloads and caches all of the modeling, tokenization, and scheduling components. You can see that the Stable Diffusion pipeline is composed of, among other things, a [`UNet2DConditionModel`] and a [`PNDMScheduler`]:
```py
>>> pipeline
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.13.1",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
This model consists of roughly 1.4 billion parameters, so we strongly recommend running the pipeline on a GPU.
Just as you would with a PyTorch module, you can move the pipeline to a GPU:
```python
>>> pipeline.to("cuda")
```
Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
</div>
Save the image by calling `save`:
```python
>>> image.save("image_of_squirrel_painting.png")
```
### Local pipeline
You can also use the pipeline locally. The only difference is that you need to download the weights first:
```bash
!git lfs install
!git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
Then load the saved weights into the pipeline:
```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
Now you can run the pipeline as you did in the section above.
### Swapping schedulers
Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is that it lets you switch between schedulers easily. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:
```py
>>> from diffusers import EulerDiscreteScheduler
>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```
Try generating an image with the new scheduler and see whether you notice a difference!
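The swap is possible because a scheduler is described by plain configuration values rather than trained weights, so one scheduler's config can seed another. The following is a simplified, library-free sketch of that idea, not the 🧨 Diffusers implementation: `ToyScheduler`, `ToyPNDM`, `ToyEuler`, and this `from_config` are hypothetical stand-ins, and the sketch assumes the new class simply keeps the config keys that its constructor accepts.
```python
import inspect


class ToyScheduler:
    """Hypothetical base class that stores its constructor arguments as `config`."""

    def __init__(self, **kwargs):
        self.config = dict(kwargs)

    @classmethod
    def from_config(cls, config):
        # Keep only the keys that this class's constructor accepts.
        accepted = inspect.signature(cls.__init__).parameters
        init_kwargs = {key: value for key, value in config.items() if key in accepted}
        return cls(**init_kwargs)


class ToyPNDM(ToyScheduler):
    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02, skip_prk_steps=False):
        super().__init__(
            num_train_timesteps=num_train_timesteps,
            beta_start=beta_start,
            beta_end=beta_end,
            skip_prk_steps=skip_prk_steps,
        )


class ToyEuler(ToyScheduler):
    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02):
        super().__init__(
            num_train_timesteps=num_train_timesteps,
            beta_start=beta_start,
            beta_end=beta_end,
        )


old_scheduler = ToyPNDM(beta_start=0.00085, beta_end=0.012, skip_prk_steps=True)
new_scheduler = ToyEuler.from_config(old_scheduler.config)
print(new_scheduler.config)
# {'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012}
```
The shared noise-schedule values carry over to the new scheduler, while the option that only the old class understands (`skip_prk_steps` in this toy) is dropped.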
In the next section, you'll take a closer look at the components that make up the [`DiffusionPipeline`], the model and the scheduler, and learn how to use them to generate an image of a cat.
## Models
Most models take a noisy sample and, at each timestep, predict the *noise residual*: the difference between a less noisy image and the input image. (Other models learn to predict the previous sample directly, or the velocity, also known as [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110).) You can mix and match models to create other diffusion systems.
Models are instantiated with the [`~ModelMixin.from_pretrained`] method, which also caches the model weights locally so that the model loads faster the next time. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:
```py
>>> from diffusers import UNet2DModel
>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id)
```
To inspect the model's configuration parameters, access `model.config`:
```py
>>> model.config
```
The model configuration is a 🧊 frozen 🧊 dictionary, which means its parameters can't be changed after the model is created. This is intentional: it ensures that the parameters used to define the model architecture at the start stay the same, while other parameters can still be adjusted during inference.
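To make "frozen" concrete, here is a minimal sketch that uses only the Python standard library. It is an analogy rather than the 🧨 Diffusers implementation: `MappingProxyType` stands in for the library's own frozen dictionary, and the two entries in `config` are illustrative values chosen to match the cat checkpoint's sample shape.
```python
from types import MappingProxyType

# Illustrative values only; a real `model.config` holds many more entries.
config = MappingProxyType({"sample_size": 256, "in_channels": 3})

print(config["sample_size"])  # reading works like a normal dictionary

try:
    config["sample_size"] = 512  # writing is rejected
except TypeError as error:
    write_rejected = True
    print(f"cannot modify frozen config: {error}")
else:
    write_rejected = False
```
Reading a value behaves like a regular dictionary lookup, while the attempted assignment raises an error and leaves the stored value untouched.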
Some of the most important parameters are:
* `sample_size`: the height and width dimension of the input sample.
* `in_channels`: the number of input channels of the input sample.
* `down_block_types` and `up_block_types`: the types of downsampling and upsampling blocks used to create the UNet architecture.
* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
* `layers_per_block`: the number of ResNet blocks present in each UNet block.
To use the model for inference, create a tensor in the shape of an image filled with random Gaussian noise. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:
```py
>>> import torch
>>> torch.manual_seed(0)
>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])
```
For inference, pass the noisy image and a `timestep` to the model. The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Read the model output from the `sample` attribute of the returned object:
```py
>>> with torch.no_grad():
...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
```
To generate actual samples, though, you need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.
## Schedulers
Schedulers manage the transition from a noisier sample to a less noisy one given the model output, which in this case is `noisy_residual`.
<Tip>
🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a prebuilt diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.
</Tip>
For the quicktour, you'll instantiate the [`DDPMScheduler`] from the configuration stored in the checkpoint repository with its [`~SchedulerMixin.from_pretrained`] method:
```py
>>> from diffusers import DDPMScheduler
>>> scheduler = DDPMScheduler.from_pretrained(repo_id)
>>> scheduler
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.13.1",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "trained_betas": null,
  "variance_type": "fixed_small"
}
```
<Tip>
💡 Notice how the scheduler is instantiated from a configuration. Unlike a model, a scheduler has no trainable weights and is parameter-free!
</Tip>
Some of the most important parameters are:
* `num_train_timesteps`: the length of the denoising process, in other words the number of timesteps required to turn random Gaussian noise into a data sample.
* `beta_schedule`: the type of noise schedule to use for inference and training.
* `beta_start` and `beta_end`: the start and end noise values of the noise schedule.
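To see what these parameters control, the sketch below rebuilds a linear noise schedule in plain Python from the values in the printed scheduler config. It assumes the common DDPM convention that a `"linear"` schedule spaces the betas evenly between `beta_start` and `beta_end`, and that the fraction of the original signal left at timestep `t` is the running product of `1 - beta`. The helper name `linear_alphas_cumprod` is hypothetical and not part of the library.
```python
def linear_alphas_cumprod(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02):
    """Return (betas, alphas_cumprod) for an evenly spaced beta schedule."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_train_timesteps)]
    alphas_cumprod = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # share of the signal kept after this timestep
        alphas_cumprod.append(running)
    return betas, alphas_cumprod


betas, alphas_cumprod = linear_alphas_cumprod()
for t in (0, 499, 999):
    print(f"t={t:3d}  beta={betas[t]:.5f}  signal fraction={alphas_cumprod[t]:.5f}")
```
The signal fraction shrinks toward zero as `t` grows, so a sample at the last training timestep is close to pure Gaussian noise.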
To predict a slightly less noisy image, pass the model output, the `timestep`, and the current `sample` to the scheduler's [`~diffusers.DDPMScheduler.step`] method:
```py
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape
```
The `less_noisy_sample` can be passed on to the next `timestep`, where it becomes even less noisy! Now let's bring it all together and visualize the entire denoising process.
First, create a function that postprocesses the denoised image and displays it as a `PIL.Image`:
```py
>>> import PIL.Image
>>> import numpy as np
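>>> def display_sample(sample, i):
...     # (batch, channel, height, width) -> (batch, height, width, channel)
...     image_processed = sample.cpu().permute(0, 2, 3, 1)
...     # rescale from [-1, 1] to [0, 255]
...     image_processed = (image_processed + 1.0) * 127.5
...     image_processed = image_processed.numpy().astype(np.uint8)
...     image_pil = PIL.Image.fromarray(image_processed[0])
...     # `display` is provided by IPython/Jupyter notebooks
...     display(f"Image at step {i}")
...     display(image_pil)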
```
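The rescaling step in `display_sample` maps model values from `[-1, 1]` to pixel values in `[0, 255]`. The plain-Python sketch below shows that mapping on single numbers. The `to_pixel` helper and its clamping are additions for illustration and are not part of the function above; clamping is one way to keep slightly out-of-range values from converting unpredictably when cast to 8-bit integers.
```python
def to_pixel(value):
    """Map a value from [-1, 1] to an integer pixel in [0, 255]."""
    scaled = (value + 1.0) * 127.5
    clamped = min(max(scaled, 0.0), 255.0)  # assumption: clamp out-of-range values
    return int(clamped)  # drops the fractional part, like an integer cast


pixels = [to_pixel(v) for v in (-1.0, 0.0, 1.0, 1.2)]
print(pixels)  # [0, 127, 255, 255]
```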
To speed up the denoising process, move the input and the model to a GPU:
```py
>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")
```
Now create a denoising loop that predicts the noise residual of the current sample and uses the scheduler to compute the less noisy sample:
```py
>>> import tqdm
>>> sample = noisy_sample
>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
...     # 1. predict noise residual
...     with torch.no_grad():
...         residual = model(sample, t).sample
...     # 2. compute less noisy image and set x_t -> x_t-1
...     sample = scheduler.step(residual, t, sample).prev_sample
...     # 3. optionally look at image
...     if (i + 1) % 50 == 0:
...         display_sample(sample, i + 1)
```
Sit back and watch as a cat is generated from nothing but noise! 😻
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
</div>
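To make the mechanics of the denoising loop visible without downloading a model, here is a toy, standard-library version that operates on a single number instead of an image. It is a sketch under stated assumptions, not the `DDPMScheduler` source: it assumes a linear beta schedule, epsilon prediction, and the standard DDPM posterior mean. `oracle_model` is a hypothetical stand-in for the UNet that already knows the clean value `X0`, and the random noise that DDPM normally adds back at each step is left out so that the result is deterministic.
```python
import math

NUM_TRAIN_TIMESTEPS = 1000
BETA_START, BETA_END = 0.0001, 0.02
X0 = 0.5  # the "clean sample" that the toy model secretly knows

# Linear beta schedule and cumulative signal fraction (alpha-bar).
betas = [
    BETA_START + i * (BETA_END - BETA_START) / (NUM_TRAIN_TIMESTEPS - 1)
    for i in range(NUM_TRAIN_TIMESTEPS)
]
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)


def oracle_model(sample, t):
    """Hypothetical stand-in for the UNet: returns the exact noise in `sample`."""
    a_bar = alphas_cumprod[t]
    return (sample - math.sqrt(a_bar) * X0) / math.sqrt(1.0 - a_bar)


def ddpm_step(residual, t, sample):
    """One deterministic DDPM update x_t -> x_{t-1} (posterior mean only)."""
    a_bar = alphas_cumprod[t]
    a_bar_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
    # Estimate the clean sample from the predicted noise.
    pred_x0 = (sample - math.sqrt(1.0 - a_bar) * residual) / math.sqrt(a_bar)
    # Blend that estimate with the current sample.
    x0_coeff = math.sqrt(a_bar_prev) * betas[t] / (1.0 - a_bar)
    xt_coeff = math.sqrt(1.0 - betas[t]) * (1.0 - a_bar_prev) / (1.0 - a_bar)
    return x0_coeff * pred_x0 + xt_coeff * sample


sample = 1.7  # arbitrary starting "noise"
for t in reversed(range(NUM_TRAIN_TIMESTEPS)):  # 999, 998, ..., 0
    residual = oracle_model(sample, t)           # 1. predict the noise residual
    sample = ddpm_step(residual, t, sample)      # 2. compute the less noisy sample

print(sample)  # ends very close to X0
```
Because the toy model always reports the exact noise, the loop ends at approximately `X0`. A real UNet only estimates the noise, so each step is a small correction toward a plausible image rather than an exact recovery.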
## Next steps
Hopefully you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:
* Train or fine-tune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
* See the official and community [training or fine-tuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for examples covering a variety of use cases.
* Learn more about loading, accessing, changing, and comparing schedulers in the [Using different schedulers](./using-diffusers/schedulers) guide.
* Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher-quality images in the [Stable Diffusion](./stable_diffusion) guide.
* Dive deeper into speeding up 🧨 Diffusers with the [optimized PyTorch on a GPU](./optimization/fp16) guide and the inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).