Open-Source AI Cookbook documentation

Embedding multimodal data for similarity search using 🤗 transformers, 🤗 datasets and FAISS

Authors: Merve Noyan, 이정인

Embeddings are semantically meaningful compressions of information. They can be used for similarity search, zero-shot classification, or training a new model. Use cases for similarity search include finding similar products in e-commerce and searching content on social media. This notebook walks through using 🤗 Transformers, 🤗 Datasets, and FAISS to create and index embeddings from a feature extraction model so that they can later be used for similarity search. Let's install the necessary libraries.

!pip install -q datasets faiss-gpu transformers sentencepiece

In this tutorial, we will use the CLIP model to extract features. CLIP is a groundbreaking model that connects two modalities by jointly training a text encoder and an image encoder.

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer
import faiss
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModel.from_pretrained("openai/clip-vit-base-patch16").to(device)
processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch16")
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch16")
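
Because both encoders map into one shared vector space, a text embedding can be compared directly with an image embedding. The sketch below illustrates that idea with zero-shot classification on made-up 3-dimensional vectors. The labels, the vectors, and the helper name `zero_shot_label` are hypothetical stand-ins for real CLIP outputs, which have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(image_vec, label_vecs):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_vecs, key=lambda label: cosine_similarity(image_vec, label_vecs[label]))


# Toy stand-ins for embeddings produced by the text and image encoders.
label_vecs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_vec = [0.2, 0.8, 0.1]  # hypothetical embedding of a dog picture

predicted = zero_shot_label(image_vec, label_vecs)
print(predicted)
```

With the real model, the label vectors would come from `get_text_features` and the image vector from `get_image_features`.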

๋ฐ์ดํ„ฐ์…‹์„ ๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค. ๊ฐ€๋ณ๊ฒŒ ์ด ์˜ˆ์ œ๋ฅผ ํ•ด ๋ณด๊ธฐ ์œ„ํ•ด, ์ž‘์€ ์บก์…˜ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ด๋ด…์‹œ๋‹ค, jmhessel/newyorker_caption_contest.

from datasets import load_dataset

ds = load_dataset("jmhessel/newyorker_caption_contest", "explanation")

Let's look at an example.

ds["train"][0]["image"]
ds["train"][0]["image_description"]

We don't need to write any dedicated functions to embed the examples or to create an index; the FAISS integration in the 🤗 Datasets library abstracts these steps away. As shown below, the dataset's map method creates a new column containing the embedding of each example. Let's start by creating embeddings for the text features from the image_description column.

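dataset = ds["train"]
# Embed each image description with the CLIP text encoder.
# Inputs are moved to `device` so the code also runs without a GPU.
ds_with_embeddings = dataset.map(
    lambda example: {
        "embeddings": model.get_text_features(
            **tokenizer([example["image_description"]], truncation=True, return_tensors="pt").to(device)
        )[0]
        .detach()
        .cpu()
        .numpy()
    }
)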

๋™์ผํ•œ ๋ฐฉ์‹์œผ๋กœ ์ด๋ฏธ์ง€ ์ž„๋ฒ ๋”ฉ๋„ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

# Embed each image with the CLIP image encoder.
ds_with_embeddings = ds_with_embeddings.map(
    lambda example: {
        "image_embeddings": model.get_image_features(
            **processor([example["image"]], return_tensors="pt").to(device)
        )[0]
        .detach()
        .cpu()
        .numpy()
    }
)

Now we add an index for each column.

# Create a FAISS index for the text embeddings.
ds_with_embeddings.add_faiss_index(column="embeddings")
# Create a FAISS index for the image embeddings.
ds_with_embeddings.add_faiss_index(column="image_embeddings")
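
To give a feel for what such an index does at query time, here is a minimal pure-Python sketch of an exhaustive nearest-neighbor search. It assumes the index is a flat (brute-force) one using L2 distance, which is believed to be the default when `add_faiss_index` receives no extra arguments. The function name `flat_l2_search` and the toy vectors are hypothetical, and FAISS itself is far more efficient than this loop.

```python
def flat_l2_search(vectors, query, k=1):
    """Exhaustively compare `query` with every stored vector.

    Returns (scores, indices) for the k closest vectors, where each score
    is a squared L2 distance (smaller means more similar).
    """
    distances = [
        (sum((v - q) ** 2 for v, q in zip(vec, query)), idx)
        for idx, vec in enumerate(vectors)
    ]
    nearest = sorted(distances)[:k]
    scores = [dist for dist, _ in nearest]
    indices = [idx for _, idx in nearest]
    return scores, indices


# Toy 2-dimensional "embeddings" standing in for the indexed column.
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
scores, indices = flat_l2_search(stored, query=[0.9, 1.2], k=2)
print(scores, indices)
```

The second stored vector is the closest to the query, so its position comes first in `indices`.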

Querying the data with text prompts

We can now query the dataset with text or an image and get similar items back.

prmt = "a snowy day"
# Embed the text prompt with the same text encoder used for the dataset.
prmt_embedding = (
    model.get_text_features(**tokenizer([prmt], return_tensors="pt", truncation=True).to(device))[0]
    .detach()
    .cpu()
    .numpy()
)
scores, retrieved_examples = ds_with_embeddings.get_nearest_examples("embeddings", prmt_embedding, k=1)


def downscale_images(image):
    # Resize to a fixed width of 200 pixels while keeping the aspect ratio.
    width = 200
    ratio = width / float(image.size[0])
    height = int(float(image.size[1]) * ratio)
    img = image.resize((width, height), Image.Resampling.LANCZOS)
    return img


images = [downscale_images(image) for image in retrieved_examples["image"]]
# Check the retrieved text and image.
print(retrieved_examples["image_description"])
display(images[0])

['A man is in the snow. A boy with a huge snow shovel is there too. They are outside a house.']
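
A note on reading `scores`: assuming the index uses the L2 metric (no `metric_type` was passed when the index was created), the scores are squared Euclidean distances, so smaller means closer. If cosine similarity is preferred, one option is to scale every embedding to unit length before indexing, because for unit vectors the two measures rank results identically. The sketch below checks that relationship on toy vectors; the helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 0.0])
cosine = sum(x * y for x, y in zip(a, b))  # dot product of unit vectors

# For unit vectors: squared L2 distance == 2 - 2 * cosine similarity
print(squared_l2(a, b), 2 - 2 * cosine)
```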

Querying the data with image prompts

Image similarity inference works the same way; you just call get_image_features.

import requests

# image of a beaver
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/beaver.png"
image = Image.open(requests.get(url, stream=True).raw)
display(downscale_images(image))

Let's search for images similar to this beaver image.

# Embed the query image. `truncation` applies only to tokenizers, so it is not passed to the image processor.
img_embedding = (
    model.get_image_features(**processor([image], return_tensors="pt").to(device))[0]
    .detach()
    .cpu()
    .numpy()
)
scores, retrieved_examples = ds_with_embeddings.get_nearest_examples("image_embeddings", img_embedding, k=1)

Display the image most similar to the beaver image.

images = [downscale_images(image) for image in retrieved_examples["image"]]
# Check the retrieved text and image.
print(retrieved_examples["image_description"])
display(images[0])

['Salmon swim upstream but they see a grizzly bear and are in shock. The bear has a smug look on his face when he sees the salmon.']
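
Both queries above use `k=1`. With a larger `k`, `get_nearest_examples` returns one score per hit together with a dictionary that maps each column name to a list of retrieved values. A small helper can pair them up for inspection; this is a sketch in which the function name `pair_results` and the sample values are made up for illustration.

```python
def pair_results(scores, retrieved_examples, column="image_description"):
    """Pair each score with the matching value from one column."""
    return list(zip(scores, retrieved_examples[column]))


# Hypothetical output of get_nearest_examples(..., k=2); the values are made up.
scores = [0.42, 0.57]
retrieved_examples = {
    "image_description": ["A bear watches salmon.", "A man shovels snow."],
}

for score, description in pair_results(scores, retrieved_examples):
    print(f"{score:.2f}  {description}")
```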

์ž„๋ฒ ๋”ฉ์„ ์ €์žฅํ•˜๊ณ , ์˜ฌ๋ฆฌ๊ณ , ๊ฐ€์ ธ์˜ค๊ธฐ

์ž„๋ฒ ๋”ฉ์ด ํฌํ•จ๋œ ๋ฐ์ดํ„ฐ์…‹์„ save_faiss_index๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

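import os
os.makedirs("embeddings", exist_ok=True)  # precaution: make sure the target folder exists
ds_with_embeddings.save_faiss_index("embeddings", "embeddings/embeddings.faiss")
ds_with_embeddings.save_faiss_index("image_embeddings", "embeddings/image_embeddings.faiss")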

์ž„๋ฒ ๋”ฉ์„ ๋ฐ์ดํ„ฐ์…‹ ์ €์žฅ์†Œ์— ์ €์žฅํ•˜๋Š” ๊ฒƒ์€ ์ข‹์€ ์Šต๊ด€์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์šฐ๋ฆฌ๋Š” Hugging Face Hub์— ๋กœ๊ทธ์ธํ•˜๊ณ , ๋ฐ์ดํ„ฐ์…‹ ์ €์žฅ์†Œ๋ฅผ ์ƒ์„ฑํ•œ ํ›„, ๊ทธ๊ณณ์— ์ž„๋ฒ ๋”ฉ ์ธ๋ฑ์Šค๋ฅผ ์˜ฌ๋ฆด ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ดํ›„์—๋Š” snapshot_download๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•ด๋‹น ์ธ๋ฑ์Šค๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

from huggingface_hub import HfApi, notebook_login, snapshot_download

notebook_login()

hf_id = "your-hub-id"  # replace with your Hugging Face Hub ID

api = HfApi()
api.create_repo(f"{hf_id}/faiss_embeddings", repo_type="dataset")
api.upload_folder(
    folder_path="./embeddings",
    repo_id=f"{hf_id}/faiss_embeddings",
    repo_type="dataset",
)
snapshot_download(repo_id=f"{hf_id}/faiss_embeddings", repo_type="dataset", local_dir="downloaded_embeddings")

With load_faiss_index, we can load the embeddings into a dataset that has no embeddings.

ds = ds["train"]
ds.load_faiss_index("embeddings", "./downloaded_embeddings/embeddings.faiss")
# Run inference again.
prmt = "people under the rain"
prmt_embedding = (
    model.get_text_features(**tokenizer([prmt], return_tensors="pt", truncation=True).to(device))[0]
    .detach()
    .cpu()
    .numpy()
)

scores, retrieved_examples = ds.get_nearest_examples("embeddings", prmt_embedding, k=1)
display(retrieved_examples["image"][0])