๊ฐ์ฒด ํƒ์ง€ [[object-detection]]

[[open-in-colab]]

๊ฐ์ฒด ํƒ์ง€๋Š” ์ด๋ฏธ์ง€์—์„œ ์ธ์Šคํ„ด์Šค(์˜ˆ: ์‚ฌ๋žŒ, ๊ฑด๋ฌผ ๋˜๋Š” ์ž๋™์ฐจ)๋ฅผ ๊ฐ์ง€ํ•˜๋Š” ์ปดํ“จํ„ฐ ๋น„์ „ ์ž‘์—…์ž…๋‹ˆ๋‹ค. ๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์€ ์ด๋ฏธ์ง€๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๊ณ  ํƒ์ง€๋œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ์ขŒํ‘œ์™€ ๊ด€๋ จ๋œ ๋ ˆ์ด๋ธ”์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€์—๋Š” ์—ฌ๋Ÿฌ ๊ฐ์ฒด๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ ๊ฐ๊ฐ์€ ์ž์ฒด์ ์ธ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์™€ ๋ ˆ์ด๋ธ”์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์˜ˆ: ์ฐจ์™€ ๊ฑด๋ฌผ์ด ์žˆ๋Š” ์ด๋ฏธ์ง€). ๋˜ํ•œ ๊ฐ ๊ฐ์ฒด๋Š” ์ด๋ฏธ์ง€์˜ ๋‹ค๋ฅธ ๋ถ€๋ถ„์— ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์˜ˆ: ์ด๋ฏธ์ง€์— ์—ฌ๋Ÿฌ ๋Œ€์˜ ์ฐจ๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Œ). ์ด ์ž‘์—…์€ ๋ณดํ–‰์ž, ๋„๋กœ ํ‘œ์ง€ํŒ, ์‹ ํ˜ธ๋“ฑ๊ณผ ๊ฐ™์€ ๊ฒƒ๋“ค์„ ๊ฐ์ง€ํ•˜๋Š” ์ž์œจ ์ฃผํ–‰์— ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ์‘์šฉ ๋ถ„์•ผ๋กœ๋Š” ์ด๋ฏธ์ง€ ๋‚ด ๊ฐ์ฒด ์ˆ˜ ๊ณ„์‚ฐ ๋ฐ ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰ ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ๊ฐ€์ด๋“œ์—์„œ ๋‹ค์Œ์„ ๋ฐฐ์šธ ๊ฒƒ์ž…๋‹ˆ๋‹ค:

  1. ํ•ฉ์„ฑ๊ณฑ ๋ฐฑ๋ณธ(์ธํ’‹ ๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ์„ ์ถ”์ถœํ•˜๋Š” ํ•ฉ์„ฑ๊ณฑ ๋„คํŠธ์›Œํฌ)๊ณผ ์ธ์ฝ”๋”-๋””์ฝ”๋” ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์„ ๊ฒฐํ•ฉํ•œ DETR ๋ชจ๋ธ์„ CPPE-5 ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•ด ๋ฏธ์„ธ์กฐ์ • ํ•˜๊ธฐ
  2. ๋ฏธ์„ธ์กฐ์ • ํ•œ ๋ชจ๋ธ์„ ์ถ”๋ก ์— ์‚ฌ์šฉํ•˜๊ธฐ.
์ด ํŠœํ† ๋ฆฌ์–ผ์˜ ํƒœ์Šคํฌ๋Š” ๋‹ค์Œ ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜์—์„œ ์ง€์›๋ฉ๋‹ˆ๋‹ค:

Conditional DETR, Deformable DETR, DETA, DETR, Table Transformer, YOLOS

Before you begin, make sure you have all the necessary libraries installed:

pip install -q datasets transformers evaluate timm albumentations

You'll use 🤗 Datasets to load a dataset from the Hugging Face Hub, 🤗 Transformers to train your model, and albumentations to augment the data. timm is currently required to load the convolutional backbone for the DETR model.

We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

>>> from huggingface_hub import notebook_login

>>> notebook_login()

CPPE-5 ๋ฐ์ดํ„ฐ ์„ธํŠธ ๊ฐ€์ ธ์˜ค๊ธฐ [[load-the-CPPE-5-dataset]]

CPPE-5 ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” COVID-19 ๋Œ€์œ ํ–‰ ์ƒํ™ฉ์—์„œ ์˜๋ฃŒ ์ „๋ฌธ์ธ๋ ฅ ๋ณดํ˜ธ ์žฅ๋น„(PPE)๋ฅผ ์‹๋ณ„ํ•˜๋Š” ์–ด๋…ธํ…Œ์ด์…˜์ด ํฌํ•จ๋œ ์ด๋ฏธ์ง€๋ฅผ ๋‹ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๊ฐ€์ ธ์˜ค์„ธ์š”:

>>> from datasets import load_dataset

>>> cppe5 = load_dataset("cppe-5")
>>> cppe5
DatasetDict({
    train: Dataset({
        features: ['image_id', 'image', 'width', 'height', 'objects'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['image_id', 'image', 'width', 'height', 'objects'],
        num_rows: 29
    })
})

์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” ํ•™์Šต ์„ธํŠธ ์ด๋ฏธ์ง€ 1,000๊ฐœ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ ์ด๋ฏธ์ง€ 29๊ฐœ๋ฅผ ๊ฐ–๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ์— ์ต์ˆ™ํ•ด์ง€๊ธฐ ์œ„ํ•ด, ์˜ˆ์‹œ๊ฐ€ ์–ด๋–ป๊ฒŒ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š”์ง€ ์‚ดํŽด๋ณด์„ธ์š”.

>>> cppe5["train"][0]
{'image_id': 15,
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=943x663 at 0x7F9EC9E77C10>,
 'width': 943,
 'height': 663,
 'objects': {'id': [114, 115, 116, 117],
  'area': [3796, 1596, 152768, 81002],
  'bbox': [[302.0, 109.0, 73.0, 52.0],
   [810.0, 100.0, 57.0, 28.0],
   [160.0, 31.0, 248.0, 616.0],
   [741.0, 68.0, 202.0, 401.0]],
  'category': [4, 4, 0, 0]}}

๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์žˆ๋Š” ์˜ˆ์‹œ๋Š” ๋‹ค์Œ์˜ ์˜์—ญ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค:

  • image_id: the example image id
  • image: a PIL.Image.Image object containing the image
  • width: width of the image
  • height: height of the image
  • objects: a dictionary containing bounding box metadata for the objects in the image:
    • id: the annotation id
    • area: the area of the bounding box
    • bbox: the object's bounding box (in the COCO format)
    • category: the object's category, with possible values including Coverall (0), Face_Shield (1), Gloves (2), Goggles (3) and Mask (4)

You may notice that the bbox field follows the COCO format, which is the format the DETR model expects. However, the grouping of the fields inside objects differs from the annotation format DETR requires, so you will need to apply some preprocessing transformations before using this data for training.
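For reference, a COCO-format box is [x_min, y_min, width, height] in absolute pixel coordinates. As a quick sketch (using the first box from the example above, with coco_box as an illustrative name), converting such a box to corner coordinates looks like this:

>>> coco_box = [302.0, 109.0, 73.0, 52.0]  # [x_min, y_min, width, height]
>>> x, y, w, h = coco_box
>>> [x, y, x + w, y + h]  # [x_min, y_min, x_max, y_max]
[302.0, 109.0, 375.0, 161.0]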

๋ฐ์ดํ„ฐ๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ํ•œ ๊ฐ€์ง€ ์˜ˆ์‹œ๋ฅผ ์‹œ๊ฐํ™”ํ•˜์„ธ์š”.

>>> import numpy as np
>>> import os
>>> from PIL import Image, ImageDraw

>>> image = cppe5["train"][0]["image"]
>>> annotations = cppe5["train"][0]["objects"]
>>> draw = ImageDraw.Draw(image)

>>> categories = cppe5["train"].features["objects"].feature["category"].names

>>> id2label = {index: x for index, x in enumerate(categories, start=0)}
>>> label2id = {v: k for k, v in id2label.items()}
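Given the category names listed above, the resulting id2label mapping should look like this:

>>> id2label
{0: 'Coverall', 1: 'Face_Shield', 2: 'Gloves', 3: 'Goggles', 4: 'Mask'}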

>>> for i in range(len(annotations["id"])):
...     box = annotations["bbox"][i]
...     class_idx = annotations["category"][i]
...     x, y, w, h = tuple(box)
...     draw.rectangle((x, y, x + w, y + h), outline="red", width=1)
...     draw.text((x, y), id2label[class_idx], fill="white")

>>> image
CPPE-5 Image Example

To visualize the bounding boxes with their associated labels, you can get the labels from the dataset's metadata, specifically the category field. You'll also want to create dictionaries that map a label id to a label class (id2label) and the other way around (label2id). You can use them later when setting up the model. Including these maps will make your model reusable by others if you share it on the Hugging Face Hub.

๋ฐ์ดํ„ฐ๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•œ ์ตœ์ข… ๋‹จ๊ณ„๋กœ, ์ž ์žฌ์ ์ธ ๋ฌธ์ œ๋ฅผ ์ฐพ์•„๋ณด์„ธ์š”. ๊ฐ์ฒด ๊ฐ์ง€๋ฅผ ์œ„ํ•œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ๋ฌธ์ œ ์ค‘ ํ•˜๋‚˜๋Š” ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๊ฐ€ ์ด๋ฏธ์ง€์˜ ๊ฐ€์žฅ์ž๋ฆฌ๋ฅผ ๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๋ฅผ "๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ(run away)"์€ ํ›ˆ๋ จ ์ค‘์— ์˜ค๋ฅ˜๋ฅผ ๋ฐœ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๊ธฐ์— ์ด ๋‹จ๊ณ„์—์„œ ์ฒ˜๋ฆฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์—๋„ ๊ฐ™์€ ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š” ๋ช‡ ๊ฐ€์ง€ ์˜ˆ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” ๊ฐ„๋‹จํ•˜๊ฒŒํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ์—์„œ ์ด๋Ÿฌํ•œ ์ด๋ฏธ์ง€๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.

>>> remove_idx = [590, 821, 822, 875, 876, 878, 879]
>>> keep = [i for i in range(len(cppe5["train"])) if i not in remove_idx]
>>> cppe5["train"] = cppe5["train"].select(keep)

๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌํ•˜๊ธฐ [[preprocess-the-data]]

๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ • ํ•˜๋ ค๋ฉด, ๋ฏธ๋ฆฌ ํ•™์Šต๋œ ๋ชจ๋ธ์—์„œ ์‚ฌ์šฉํ•œ ์ „์ฒ˜๋ฆฌ ๋ฐฉ์‹๊ณผ ์ •ํ™•ํ•˜๊ฒŒ ์ผ์น˜ํ•˜๋„๋ก ์‚ฌ์šฉํ•  ๋ฐ์ดํ„ฐ๋ฅผ ์ „์ฒ˜๋ฆฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. [AutoImageProcessor]๋Š” ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜์—ฌ DETR ๋ชจ๋ธ์ด ํ•™์Šต์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” pixel_values, pixel_mask, ๊ทธ๋ฆฌ๊ณ  labels๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์ž‘์—…์„ ๋‹ด๋‹นํ•ฉ๋‹ˆ๋‹ค. ์ด ์ด๋ฏธ์ง€ ํ”„๋กœ์„ธ์„œ์—๋Š” ๊ฑฑ์ •ํ•˜์ง€ ์•Š์•„๋„ ๋˜๋Š” ๋ช‡ ๊ฐ€์ง€ ์†์„ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค:

  • image_mean = [0.485, 0.456, 0.406]
  • image_std = [0.229, 0.224, 0.225]

์ด ๊ฐ’๋“ค์€ ๋ชจ๋ธ ์‚ฌ์ „ ํ›ˆ๋ จ ์ค‘ ์ด๋ฏธ์ง€๋ฅผ ์ •๊ทœํ™”ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ‰๊ท ๊ณผ ํ‘œ์ค€ ํŽธ์ฐจ์ž…๋‹ˆ๋‹ค. ์ด ๊ฐ’๋“ค์€ ์ถ”๋ก  ๋˜๋Š” ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ์ด๋ฏธ์ง€ ๋ชจ๋ธ์„ ์„ธ๋ฐ€ํ•˜๊ฒŒ ์กฐ์ •ํ•  ๋•Œ ๋ณต์ œํ•ด์•ผ ํ•˜๋Š” ์ค‘์š”ํ•œ ๊ฐ’์ž…๋‹ˆ๋‹ค.

Instantiate the image processor from the same checkpoint as the model you want to finetune.

>>> from transformers import AutoImageProcessor

>>> checkpoint = "facebook/detr-resnet-50"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)

Before passing the images to the image_processor, apply two preprocessing transformations to the dataset:

  • augmenting images
  • reformatting annotations to meet DETR expectations

First, to make sure the model does not overfit on the training data, you can apply image augmentation with any data augmentation library. Here we use Albumentations. This library ensures that transformations affect the image and update the bounding boxes accordingly. The 🤗 Datasets library documentation has a detailed guide on how to augment images for object detection, and it uses the exact same dataset as an example. Apply the same approach here, resizing each image to (480, 480), flipping it horizontally, and brightening it:

>>> import albumentations
>>> import numpy as np
>>> import torch

>>> transform = albumentations.Compose(
...     [
...         albumentations.Resize(480, 480),
...         albumentations.HorizontalFlip(p=1.0),
...         albumentations.RandomBrightnessContrast(p=1.0),
...     ],
...     bbox_params=albumentations.BboxParams(format="coco", label_fields=["category"]),
... )
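You can sanity-check the transform on the example from earlier. Albumentations returns a dictionary with the transformed image, the recomputed bounding boxes, and the labels; the shapes below are what you'd expect with these settings (the pixel values and box coordinates themselves will change):

>>> example = cppe5["train"][0]
>>> img = np.array(example["image"].convert("RGB"))
>>> out = transform(image=img, bboxes=example["objects"]["bbox"], category=example["objects"]["category"])
>>> out["image"].shape, len(out["bboxes"])
((480, 480, 3), 4)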

The image_processor expects the annotations to be in the following format: {'image_id': int, 'annotations': List[Dict]}, where each dictionary is a COCO object annotation. Let's add a function to reformat annotations for a single example:

>>> def formatted_anns(image_id, category, area, bbox):
...     annotations = []
...     for i in range(0, len(category)):
...         new_ann = {
...             "image_id": image_id,
...             "category_id": category[i],
...             "isCrowd": 0,
...             "area": area[i],
...             "bbox": list(bbox[i]),
...         }
...         annotations.append(new_ann)

...     return annotations
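To see what this produces, try it on the objects from the first training example (the values match the example shown earlier):

>>> objects = cppe5["train"][0]["objects"]
>>> formatted_anns(15, objects["category"], objects["area"], objects["bbox"])[0]
{'image_id': 15, 'category_id': 4, 'iscrowd': 0, 'area': 3796, 'bbox': [302.0, 109.0, 73.0, 52.0]}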

์ด์ œ ์ด๋ฏธ์ง€์™€ ์–ด๋…ธํ…Œ์ด์…˜ ์ „์ฒ˜๋ฆฌ ๋ณ€ํ™˜์„ ๊ฒฐํ•ฉํ•˜์—ฌ ์˜ˆ์ œ ๋ฐฐ์น˜์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

>>> # transforming a batch
>>> def transform_aug_ann(examples):
...     image_ids = examples["image_id"]
...     images, bboxes, area, categories = [], [], [], []
...     for image, objects in zip(examples["image"], examples["objects"]):
...         image = np.array(image.convert("RGB"))[:, :, ::-1]
...         out = transform(image=image, bboxes=objects["bbox"], category=objects["category"])

...         area.append(objects["area"])
...         images.append(out["image"])
...         bboxes.append(out["bboxes"])
...         categories.append(out["category"])

...     targets = [
...         {"image_id": id_, "annotations": formatted_anns(id_, cat_, ar_, box_)}
...         for id_, cat_, ar_, box_ in zip(image_ids, categories, area, bboxes)
...     ]

...     return image_processor(images=images, annotations=targets, return_tensors="pt")

Apply this preprocessing function to the entire dataset using the 🤗 Datasets [~datasets.Dataset.with_transform] method. This method applies the transformations on the fly whenever an element of the dataset is loaded.

At this point, you can check what an example from the dataset looks like after the transformations. You should see a tensor with pixel_values, a tensor with pixel_mask, and labels.

>>> cppe5["train"] = cppe5["train"].with_transform(transform_aug_ann)
>>> cppe5["train"][15]
{'pixel_values': tensor([[[ 0.9132,  0.9132,  0.9132,  ..., -1.9809, -1.9809, -1.9809],
          [ 0.9132,  0.9132,  0.9132,  ..., -1.9809, -1.9809, -1.9809],
          [ 0.9132,  0.9132,  0.9132,  ..., -1.9638, -1.9638, -1.9638],
          ...,
          [-1.5699, -1.5699, -1.5699,  ..., -1.9980, -1.9980, -1.9980],
          [-1.5528, -1.5528, -1.5528,  ..., -1.9980, -1.9809, -1.9809],
          [-1.5528, -1.5528, -1.5528,  ..., -1.9980, -1.9809, -1.9809]],

         [[ 1.3081,  1.3081,  1.3081,  ..., -1.8431, -1.8431, -1.8431],
          [ 1.3081,  1.3081,  1.3081,  ..., -1.8431, -1.8431, -1.8431],
          [ 1.3081,  1.3081,  1.3081,  ..., -1.8256, -1.8256, -1.8256],
          ...,
          [-1.3179, -1.3179, -1.3179,  ..., -1.8606, -1.8606, -1.8606],
          [-1.3004, -1.3004, -1.3004,  ..., -1.8606, -1.8431, -1.8431],
          [-1.3004, -1.3004, -1.3004,  ..., -1.8606, -1.8431, -1.8431]],

         [[ 1.4200,  1.4200,  1.4200,  ..., -1.6476, -1.6476, -1.6476],
          [ 1.4200,  1.4200,  1.4200,  ..., -1.6476, -1.6476, -1.6476],
          [ 1.4200,  1.4200,  1.4200,  ..., -1.6302, -1.6302, -1.6302],
          ...,
          [-1.0201, -1.0201, -1.0201,  ..., -1.5604, -1.5604, -1.5604],
          [-1.0027, -1.0027, -1.0027,  ..., -1.5604, -1.5430, -1.5430],
          [-1.0027, -1.0027, -1.0027,  ..., -1.5604, -1.5430, -1.5430]]]),
 'pixel_mask': tensor([[1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         ...,
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1]]),
 'labels': {'size': tensor([800, 800]), 'image_id': tensor([756]), 'class_labels': tensor([4]), 'boxes': tensor([[0.7340, 0.6986, 0.3414, 0.5944]]), 'area': tensor([519544.4375]), 'iscrowd': tensor([0]), 'orig_size': tensor([480, 480])}}

๊ฐ๊ฐ์˜ ์ด๋ฏธ์ง€๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ์ฆ๊ฐ•ํ•˜๊ณ  ์ด๋ฏธ์ง€์˜ ์–ด๋…ธํ…Œ์ด์…˜์„ ์ค€๋น„ํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ „์ฒ˜๋ฆฌ๋Š” ์•„์ง ๋๋‚˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„๋กœ, ์ด๋ฏธ์ง€๋ฅผ ๋ฐฐ์น˜๋กœ ๋งŒ๋“ค ์‚ฌ์šฉ์ž ์ •์˜ collate_fn์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ฐฐ์น˜์—์„œ ๊ฐ€์žฅ ํฐ ์ด๋ฏธ์ง€์— ์ด๋ฏธ์ง€(ํ˜„์žฌ pixel_values ์ธ)๋ฅผ ํŒจ๋“œํ•˜๊ณ , ์‹ค์ œ ํ”ฝ์…€(1)๊ณผ ํŒจ๋”ฉ(0)์„ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ๊ทธ์— ํ•ด๋‹นํ•˜๋Š” ์ƒˆ๋กœ์šด pixel_mask๋ฅผ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

>>> def collate_fn(batch):
...     pixel_values = [item["pixel_values"] for item in batch]
...     encoding = image_processor.pad(pixel_values, return_tensors="pt")
...     labels = [item["labels"] for item in batch]
...     batch = {}
...     batch["pixel_values"] = encoding["pixel_values"]
...     batch["pixel_mask"] = encoding["pixel_mask"]
...     batch["labels"] = labels
...     return batch
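As a quick check, you can collate two transformed examples and inspect the shapes. Since every image was resized to (480, 480) by Albumentations and then resized again by the image processor, the padded batch should look roughly like this (exact sizes depend on the processor settings):

>>> batch = collate_fn([cppe5["train"][0], cppe5["train"][1]])
>>> batch["pixel_values"].shape, batch["pixel_mask"].shape
(torch.Size([2, 3, 800, 800]), torch.Size([2, 800, 800]))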

Training the DETR model [[training-the-DETR-model]]

You have done most of the heavy lifting in the previous sections, so now you are ready to train your model! The images in this dataset are still quite large, even after resizing, which means that finetuning this model will require at least one GPU.

ํ•™์Šต์€ ๋‹ค์Œ์˜ ๋‹จ๊ณ„๋ฅผ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค:

  1. [AutoModelForObjectDetection]์„ ์‚ฌ์šฉํ•˜์—ฌ ์ „์ฒ˜๋ฆฌ์™€ ๋™์ผํ•œ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
  2. [TrainingArguments]์—์„œ ํ•™์Šต ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
  3. ๋ชจ๋ธ, ๋ฐ์ดํ„ฐ ์„ธํŠธ, ์ด๋ฏธ์ง€ ํ”„๋กœ์„ธ์„œ ๋ฐ ๋ฐ์ดํ„ฐ ์ฝœ๋ ˆ์ดํ„ฐ์™€ ํ•จ๊ป˜ [Trainer]์— ํ›ˆ๋ จ ์ธ์ˆ˜๋ฅผ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
  4. [~Trainer.train]๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ • ํ•ฉ๋‹ˆ๋‹ค.

When loading the model from the same checkpoint that you used for the preprocessing, remember to pass the label2id and id2label maps that you created earlier from the dataset's metadata. Additionally, specify ignore_mismatched_sizes=True to replace the existing classification head (the last layer the model uses for classification) with a new one.

>>> from transformers import AutoModelForObjectDetection

>>> model = AutoModelForObjectDetection.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
...     ignore_mismatched_sizes=True,
... )

[TrainingArguments]์—์„œ output_dir์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ์ €์žฅํ•  ์œ„์น˜๋ฅผ ์ง€์ •ํ•œ ๋‹ค์Œ, ํ•„์š”์— ๋”ฐ๋ผ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ตฌ์„ฑํ•˜์„ธ์š”. ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ์—ด์„ ์ œ๊ฑฐํ•˜์ง€ ์•Š๋„๋ก ์ฃผ์˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋งŒ์•ฝ remove_unused_columns๊ฐ€ True์ผ ๊ฒฝ์šฐ ์ด๋ฏธ์ง€ ์—ด์ด ์‚ญ์ œ๋ฉ๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€ ์—ด์ด ์—†๋Š” ๊ฒฝ์šฐ pixel_values๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— remove_unused_columns๋ฅผ False๋กœ ์„ค์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์„ Hub์— ์—…๋กœ๋“œํ•˜์—ฌ ๊ณต์œ ํ•˜๋ ค๋ฉด push_to_hub๋ฅผ True๋กœ ์„ค์ •ํ•˜์‹ญ์‹œ์˜ค(ํ—ˆ๊น…ํŽ˜์ด์Šค์— ๋กœ๊ทธ์ธํ•˜์—ฌ ๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค).

>>> from transformers import TrainingArguments

>>> training_args = TrainingArguments(
...     output_dir="detr-resnet-50_finetuned_cppe5",
...     per_device_train_batch_size=8,
...     num_train_epochs=10,
...     fp16=True,
...     save_steps=200,
...     logging_steps=50,
...     learning_rate=1e-5,
...     weight_decay=1e-4,
...     save_total_limit=2,
...     remove_unused_columns=False,
...     push_to_hub=True,
... )

Finally, bring everything together (model, training_args, collate_fn, image_processor, and the cppe5 dataset), and call [~transformers.Trainer.train]:

>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     data_collator=collate_fn,
...     train_dataset=cppe5["train"],
...     tokenizer=image_processor,
... )

>>> trainer.train()

If you have set push_to_hub to True in the training_args, the training checkpoints are pushed to the Hugging Face Hub. Upon training completion, push the final model to the Hub as well by calling the [~transformers.Trainer.push_to_hub] method.

>>> trainer.push_to_hub()

ํ‰๊ฐ€ํ•˜๊ธฐ [[evaluate]]

๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์ผ๋ จ์˜ COCO-์Šคํƒ€์ผ ์ง€ํ‘œ๋กœ ํ‰๊ฐ€๋ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด์— ๊ตฌํ˜„๋œ ํ‰๊ฐ€ ์ง€ํ‘œ ์ค‘ ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์ง€๋งŒ, ์—ฌ๊ธฐ์—์„œ๋Š” ํ—ˆ๊น…ํŽ˜์ด์Šค ํ—ˆ๋ธŒ์— ํ‘ธ์‹œํ•œ ์ตœ์ข… ๋ชจ๋ธ์„ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐ torchvision์—์„œ ์ œ๊ณตํ•˜๋Š” ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

torchvision ํ‰๊ฐ€์ž(evaluator)๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ์‹ค์ธก๊ฐ’์ธ COCO ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์ค€๋น„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. COCO ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋นŒ๋“œํ•˜๋Š” API๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํŠน์ • ํ˜•์‹์œผ๋กœ ์ €์žฅํ•ด์•ผ ํ•˜๋ฏ€๋กœ, ๋จผ์ € ์ด๋ฏธ์ง€์™€ ์–ด๋…ธํ…Œ์ด์…˜์„ ๋””์Šคํฌ์— ์ €์žฅํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ํ•™์Šต์„ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ค€๋น„ํ•  ๋•Œ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, cppe5["test"]์—์„œ์˜ ์–ด๋…ธํ…Œ์ด์…˜์€ ํฌ๋งท์„ ๋งž์ถฐ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋ฏธ์ง€๋Š” ๊ทธ๋Œ€๋กœ ์œ ์ง€ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

The evaluation step requires a bit of work, but it can be split into three major steps. First, prepare the cppe5["test"] set: format the annotations and save the data to disk.

>>> import json


>>> # format annotations the same as for training, no need for data augmentation
>>> def val_formatted_anns(image_id, objects):
...     annotations = []
...     for i in range(0, len(objects["id"])):
...         new_ann = {
...             "id": objects["id"][i],
...             "category_id": objects["category"][i],
...             "iscrowd": 0,
...             "image_id": image_id,
...             "area": objects["area"][i],
...             "bbox": objects["bbox"][i],
...         }
...         annotations.append(new_ann)

...     return annotations


>>> # Save images and annotations into the files torchvision.datasets.CocoDetection expects
>>> def save_cppe5_annotation_file_images(cppe5):
...     output_json = {}
...     path_output_cppe5 = f"{os.getcwd()}/cppe5/"

...     if not os.path.exists(path_output_cppe5):
...         os.makedirs(path_output_cppe5)

...     path_anno = os.path.join(path_output_cppe5, "cppe5_ann.json")
...     categories_json = [{"supercategory": "none", "id": id, "name": id2label[id]} for id in id2label]
...     output_json["images"] = []
...     output_json["annotations"] = []
...     for example in cppe5:
...         ann = val_formatted_anns(example["image_id"], example["objects"])
...         output_json["images"].append(
...             {
...                 "id": example["image_id"],
...                 "width": example["image"].width,
...                 "height": example["image"].height,
...                 "file_name": f"{example['image_id']}.png",
...             }
...         )
...         output_json["annotations"].extend(ann)
...     output_json["categories"] = categories_json

...     with open(path_anno, "w") as file:
...         json.dump(output_json, file, ensure_ascii=False, indent=4)

...     for im, img_id in zip(cppe5["image"], cppe5["image_id"]):
...         path_img = os.path.join(path_output_cppe5, f"{img_id}.png")
...         im.save(path_img)

...     return path_output_cppe5, path_anno

๋‹ค์Œ์œผ๋กœ, cocoevaluator์™€ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” CocoDetection ํด๋ž˜์Šค์˜ ์ธ์Šคํ„ด์Šค๋ฅผ ์ค€๋น„ํ•ฉ๋‹ˆ๋‹ค.

>>> import torchvision


>>> class CocoDetection(torchvision.datasets.CocoDetection):
...     def __init__(self, img_folder, image_processor, ann_file):
...         super().__init__(img_folder, ann_file)
...         self.image_processor = image_processor

...     def __getitem__(self, idx):
...         # read in PIL image and target in COCO format
...         img, target = super(CocoDetection, self).__getitem__(idx)

...         # preprocess image and target: converting target to DETR format,
...         # resizing + normalization of both image and target)
...         image_id = self.ids[idx]
...         target = {"image_id": image_id, "annotations": target}
...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
...         pixel_values = encoding["pixel_values"].squeeze()  # remove batch dimension
...         target = encoding["labels"][0]  # remove batch dimension

...         return {"pixel_values": pixel_values, "labels": target}


>>> im_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")

>>> path_output_cppe5, path_anno = save_cppe5_annotation_file_images(cppe5["test"])
>>> test_ds_coco_format = CocoDetection(path_output_cppe5, im_processor, path_anno)
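If everything went well, the COCO-format test set should contain the same 29 images as the original test split:

>>> len(test_ds_coco_format)
29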

๋งˆ์ง€๋ง‰์œผ๋กœ, ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ๊ฐ€์ ธ์™€์„œ ํ‰๊ฐ€๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

>>> import evaluate
>>> from tqdm import tqdm

>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> module = evaluate.load("ybelkada/cocoevaluate", coco=test_ds_coco_format.coco)
>>> val_dataloader = torch.utils.data.DataLoader(
...     test_ds_coco_format, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn
... )

>>> with torch.no_grad():
...     for idx, batch in enumerate(tqdm(val_dataloader)):
...         pixel_values = batch["pixel_values"]
...         pixel_mask = batch["pixel_mask"]

...         labels = [
...             {k: v for k, v in t.items()} for t in batch["labels"]
...         ]  # these are in DETR format, resized + normalized

...         # forward pass
...         outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask)

...         orig_target_sizes = torch.stack([target["orig_size"] for target in labels], dim=0)
...         results = im_processor.post_process(outputs, orig_target_sizes)  # convert outputs of model to COCO api

...         module.add(prediction=results, reference=labels)
...         del batch

>>> results = module.compute()
>>> print(results)
Accumulating evaluation results...
DONE (t=0.08s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.681
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.292
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.274
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.501
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590

These results can be further improved by adjusting the hyperparameters in [~transformers.TrainingArguments]. Give it a go!

์ถ”๋ก ํ•˜๊ธฐ [[inference]]

DETR ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ • ๋ฐ ํ‰๊ฐ€ํ•˜๊ณ , ํ—ˆ๊น…ํŽ˜์ด์Šค ํ—ˆ๋ธŒ์— ์—…๋กœ๋“œ ํ–ˆ์œผ๋ฏ€๋กœ ์ถ”๋ก ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฏธ์„ธ ์กฐ์ •๋œ ๋ชจ๋ธ์„ ์ถ”๋ก ์— ์‚ฌ์šฉํ•˜๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•์€ [pipeline]์—์„œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ ๊ฐ์ฒด ํƒ์ง€๋ฅผ ์œ„ํ•œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์ธ์Šคํ„ด์Šคํ™”ํ•˜๊ณ , ์ด๋ฏธ์ง€๋ฅผ ์ „๋‹ฌํ•˜์„ธ์š”:

>>> from transformers import pipeline
>>> import requests

>>> url = "https://i.imgur.com/2lnWoly.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> obj_detector = pipeline("object-detection", model="devonho/detr-resnet-50_finetuned_cppe5")
>>> obj_detector(image)
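The pipeline returns a list of dictionaries, one per detected object, each with a score, a label, and a box given in corner coordinates. The output should look roughly like this (the scores and boxes shown here are illustrative, consistent with the manual inference results below):

[{'score': 0.566,
  'label': 'Coverall',
  'box': {'xmin': 1215, 'ymin': 147, 'xmax': 4402, 'ymax': 3227}},
 {'score': 0.584,
  'label': 'Mask',
  'box': {'xmin': 2449, 'ymin': 823, 'xmax': 3256, 'ymax': 1414}}]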

You can also manually replicate the results of the pipeline if you'd like:

>>> image_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")

>>> with torch.no_grad():
...     inputs = image_processor(images=image, return_tensors="pt")
...     outputs = model(**inputs)
...     target_sizes = torch.tensor([image.size[::-1]])
...     results = image_processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0]

>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
...     box = [round(i, 2) for i in box.tolist()]
...     print(
...         f"Detected {model.config.id2label[label.item()]} with confidence "
...         f"{round(score.item(), 3)} at location {box}"
...     )
Detected Coverall with confidence 0.566 at location [1215.32, 147.38, 4401.81, 3227.08]
Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.9]

Let's plot the result:

>>> draw = ImageDraw.Draw(image)

>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
...     box = [round(i, 2) for i in box.tolist()]
...     x, y, x2, y2 = tuple(box)
...     draw.rectangle((x, y, x2, y2), outline="red", width=1)
...     draw.text((x, y), model.config.id2label[label.item()], fill="white")

>>> image
Object detection result on a new image