---
language: ko
license: apache-2.0
tags:
- summarization
- legal
- korean
datasets:
- ai-hub
model_name: gemma-2b-it-sum-ko-legal
base_model:
- google/gemma-2-2b-it
---

# Gemma-2B-it-sum-ko-legal

## Model Description

**Gemma-2B-it-sum-ko-legal** is a model trained on AI Hub's **bill review report summarization dataset**. It is specialized for concisely summarizing Korean documents such as legal texts and bill review reports, and was fine-tuned from the pretrained **Gemma 2B** instruction-tuned model (`google/gemma-2-2b-it`). By processing long legal documents and automatically extracting their key points, it helps legal professionals review documents faster and more efficiently.

- **Supported language**: Korean
- **Key feature**: Optimized for summarizing legal documents

## Training Process

### Dataset

The model was trained on **AI Hub's bill review report summarization dataset**. The dataset covers a range of legal topics and is well suited to learning the structure and content of legal documents for summarization.
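The exact schema of the AI Hub dataset is not described in this card, so the following is only a minimal sketch of how report/summary pairs could be turned into instruction-style training examples; the field names `report` and `summary` are hypothetical.

```python
# Minimal sketch: turn one (hypothetical) dataset record into a prompt/target pair.
def build_example(record: dict) -> dict:
    prompt = (
        "다음 법률안 검토 보고서를 요약해 주세요:\n\n"  # "Please summarize the following bill review report:"
        f"{record['report']}\n\n요약:"                  # "Summary:"
    )
    return {"prompt": prompt, "target": record["summary"]}

# Example usage with a single made-up record
record = {
    "report": "법률안 검토 보고서 본문 ...",           # report body (placeholder)
    "summary": "보고서 핵심 내용을 요약한 문장 ...",   # reference summary (placeholder)
}
print(build_example(record)["prompt"])
```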

### Training Method

๋ชจ๋ธ์€ Hugging Face์˜ **Gemma 2B** ์‚ฌ์ „ ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฏธ์„ธ ์กฐ์ •๋˜์—ˆ์œผ๋ฉฐ, ๋ฒ•๋ฅ  ๋ฌธ์„œ์˜ ํŠน์ˆ˜์„ฑ์„ ๋ฐ˜์˜ํ•œ ์ถ”๊ฐ€ ํ•™์Šต์„ ํ†ตํ•ด ์ตœ์ ํ™”๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ ํ•™์Šต์—๋Š” **FP16 ํ˜ผํ•ฉ ์ •๋ฐ€๋„ ํ•™์Šต**์ด ์‚ฌ์šฉ๋˜์—ˆ์œผ๋ฉฐ, ์ฃผ์š” ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค:

- **Batch size**: 16
- **Learning rate**: 5e-5
- **Optimizer**: AdamW
- **Epochs**: 3
- **Hardware**: NVIDIA A100 GPU

## Code Example

The code below loads the model and summarizes a Korean legal document.

```python
from transformers import pipeline

# ๋ชจ๋ธ ๋ฐ ํ† ํฌ๋‚˜์ด์ € ๋กœ๋“œ
pipe_finetuned = pipeline("text-generation", model="your-username/gemma-2b-it-sum-ko-legal", tokenizer="your-username/gemma-2b-it-sum-ko-legal", max_new_tokens=512)

# ์š”์•ฝํ•  ํ…์ŠคํŠธ ์ž…๋ ฅ
paragraph = """
    ํ•œ๊ตญ์˜ ๋ฒ•๋ฅ ์•ˆ ๊ฒ€ํ†  ๋ณด๊ณ ์„œ ๋‚ด์šฉ์€ ๋งค์šฐ ๋ณต์žกํ•˜๊ณ  ๊ธด ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค.
    ์ด๋Ÿฌํ•œ ๋ฌธ์„œ๋ฅผ ์š”์•ฝํ•˜์—ฌ ์ฃผ์š” ์ •๋ณด๋ฅผ ๋น ๋ฅด๊ฒŒ ํŒŒ์•…ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
"""

# ์š”์•ฝ ์š”์ฒญ
summary = pipe_finetuned(paragraph, do_sample=True, temperature=0.2, top_k=50, top_p=0.95)
print(summary[0]["generated_text"])
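
Note that `generated_text` contains the prompt followed by the model's continuation. Because the base model is instruction-tuned, it may also help to wrap the input in the Gemma chat template, depending on how the fine-tuning data was formatted. The sketch below shows one such variant with a hypothetical instruction prefix, using `return_full_text=False` so that only the newly generated summary is returned:

```python
# Sketch: prompt through the Gemma chat template (assumes chat-style prompting).
messages = [
    {"role": "user", "content": f"다음 문서를 요약해 주세요:\n\n{paragraph}"},  # "Please summarize the following document"
]
prompt = pipe_finetuned.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = pipe_finetuned(
    prompt,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    return_full_text=False,  # return only the generated summary, not the prompt
)
print(outputs[0]["generated_text"])
```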