Yeon-Su Lee
metadata
language: ko
license: apache-2.0
tags:
  - summarization
  - legal
  - korean
datasets:
  - ai-hub
model_name: gemma-2b-it-sum-ko-legal
base_model:
  - google/gemma-2-2b-it

Gemma-2B-it-sum-ko-legal

๋ชจ๋ธ ์„ค๋ช…

Gemma-2B-it-sum-ko-legal์€ AI ํ—ˆ๋ธŒ์˜ ๋ฒ•๋ฅ ์•ˆ ๊ฒ€ํ†  ๋ณด๊ณ ์„œ ์š”์•ฝ ๋ฐ์ดํ„ฐ์…‹์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•™์Šต๋œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ ๋ฒ•๋ฅ  ๋ฌธ์„œ, ๋ฒ•๋ฅ ์•ˆ ๊ฒ€ํ†  ๋ณด๊ณ ์„œ์™€ ๊ฐ™์€ ํ•œ๊ตญ์–ด ๋ฌธ์„œ๋ฅผ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ์š”์•ฝํ•˜๋Š” ๋ฐ ํŠนํ™”๋˜์–ด ์žˆ์œผ๋ฉฐ, Hugging Face์˜ ์‚ฌ์ „ ํ•™์Šต๋œ Gemma 2B ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฏธ์„ธ ์กฐ์ •๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๊ธด ๋ฒ•๋ฅ  ๋ฌธ์„œ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ณ  ํ•ต์‹ฌ ๋‚ด์šฉ์„ ์ž๋™์œผ๋กœ ์ถ”์ถœํ•˜์—ฌ ๋ฒ•๋ฅ  ์ „๋ฌธ๊ฐ€๋“ค์ด ๋” ๋น ๋ฅด๊ณ  ํšจ์œจ์ ์œผ๋กœ ๋ฌธ์„œ๋ฅผ ๊ฒ€ํ† ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•์Šต๋‹ˆ๋‹ค.

  • ์ง€์› ์–ธ์–ด: ํ•œ๊ตญ์–ด
  • ํŠน์ง•: ๋ฒ•๋ฅ  ๋ฌธ์„œ ์š”์•ฝ์— ์ตœ์ ํ™”

๋ชจ๋ธ ํ•™์Šต ๊ณผ์ •

๋ฐ์ดํ„ฐ์…‹

์ด ๋ชจ๋ธ์€ AI ํ—ˆ๋ธŒ์˜ ๋ฒ•๋ฅ ์•ˆ ๊ฒ€ํ†  ๋ณด๊ณ ์„œ ์š”์•ฝ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ฐ์ดํ„ฐ์…‹์€ ๋ฒ•๋ฅ  ๋ฌธ์„œ์˜ ๊ตฌ์กฐ์™€ ๋‚ด์šฉ์„ ์ดํ•ดํ•˜๊ณ  ์š”์•ฝํ•˜๋Š” ๋ฐ ์ ํ•ฉํ•œ ๋ฐ์ดํ„ฐ๋กœ, ์—ฌ๋Ÿฌ ๋ฒ•๋ฅ  ์ฃผ์ œ๋ฅผ ํฌ๊ด„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Training Method

The model was fine-tuned from the pretrained Gemma 2B model available on Hugging Face and optimized through additional training that reflects the particular characteristics of legal documents. Training used FP16 mixed precision, and the main hyperparameters are as follows:

  • Batch size: 16
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Training epochs: 3
  • Hardware: NVIDIA A100 GPU
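The hyperparameters above can be collected into a small configuration object. This is a minimal sketch, not the training script used for this model: the class name `FineTuneConfig`, its methods, the output directory, and the example dataset size are hypothetical. It also assumes that the batch size of 16 is a per-device value on a single GPU with no gradient accumulation, which the model card does not state. The keyword names produced by `to_training_kwargs` match arguments of `transformers.TrainingArguments`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed in the model card."""

    batch_size: int = 16          # assumed to be per-device on one GPU
    learning_rate: float = 5e-5
    optimizer: str = "adamw_torch"  # AdamW, as named by transformers' `optim` option
    num_epochs: int = 3
    fp16: bool = True             # FP16 mixed precision

    def to_training_kwargs(self, output_dir: str) -> dict:
        """Map the fields to keyword arguments accepted by transformers.TrainingArguments."""
        return {
            "output_dir": output_dir,
            "per_device_train_batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
            "optim": self.optimizer,
            "num_train_epochs": self.num_epochs,
            "fp16": self.fp16,
        }

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps over all epochs, assuming no gradient accumulation."""
        return math.ceil(num_examples / self.batch_size) * self.num_epochs


config = FineTuneConfig()
kwargs = config.to_training_kwargs(output_dir="./gemma-sum-ko-legal")  # hypothetical path
print(kwargs)
print(config.total_steps(num_examples=1000))  # 1000 is an illustrative dataset size
```

On a machine with a CUDA GPU, the dictionary could be passed as `TrainingArguments(**kwargs)`. Enabling `fp16` without a suitable GPU is rejected by the library, so the sketch stops short of constructing the object.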

Code Example

Use the code below to load the model and summarize a Korean legal document.

from transformers import pipeline

# Load the model and tokenizer ("your-username" is a placeholder for the repository namespace)
model_id = "your-username/gemma-2b-it-sum-ko-legal"
pipe_finetuned = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=model_id,
    max_new_tokens=512,
)

# Text to summarize. The input stays in Korean because the model targets Korean documents.
# English gloss: "Korean bill review reports are often very complex and long.
# Summarizing them to grasp the key information quickly is important."
paragraph = """
    한국의 법률안 검토 보고서 내용은 매우 복잡하고 긴 경우가 많습니다.
    이러한 문서를 요약하여 주요 정보를 빠르게 파악하는 것이 중요합니다.
"""

# Request a summary
summary = pipe_finetuned(paragraph, do_sample=True, temperature=0.2, top_k=50, top_p=0.95)
print(summary[0]["generated_text"])
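
By default, a `text-generation` pipeline returns the prompt followed by the generated continuation, so the printed text above would likely begin with the input paragraph. The sketch below shows one way to keep only the generated part. The helper `extract_summary` is hypothetical and not part of this model's code, and the strings are stand-ins rather than real model output.

```python
def extract_summary(generated_text: str, prompt: str) -> str:
    """Return only the newly generated continuation, without the echoed prompt."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


# Stand-in for summary[0]["generated_text"]; this is NOT real model output.
prompt = "Summarize: the bill amends article 3."
fake_output = prompt + "\nThe bill changes article 3."

print(extract_summary(fake_output, prompt))
```

An alternative is to pass `return_full_text=False` in the pipeline call, which asks the pipeline to return only the newly generated text.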