---
language: 
  - ko
tags:
- generated_from_keras_callback
model-index:
- name: t5-base-korean-text-summary
  results: []
---

# t5-base-korean-text-summary

This model is a fine-tuned version of [paust/pko-t5-base](https://huggingface.co/paust/pko-t5-base), trained on the AIHUB "summary and report generation data" dataset. It produces short summaries of long Korean texts.

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk
nltk.download('punkt')  # sentence tokenizer used to post-process the generated summary

model_dir = "lcw99/t5-base-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

max_input_length = 512  # longer inputs are truncated by the tokenizer

text = """
주인공 강인구(하정우)는 ‘수리남에서 홍어가 많이 나는데 다 갖다버린다’는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며
“수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.

그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 “돈을 벌기 위해 수리남산 홍어를 구하러 간 설정은 개연성이 떨어진다”는 지적도 한다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 ‘협력자 K씨’도 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""

inputs = ["summarize: " + text]

inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]

print(predicted_title)
```
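
For quick experiments, the checkpoint can also be driven through the high-level `pipeline` API. The snippet below is a minimal sketch rather than part of the original card: the generation arguments simply mirror the example above, and `long_korean_text` is a placeholder for any long Korean passage.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="lcw99/t5-base-korean-text-summary")

long_korean_text = "요약할 긴 한국어 본문 ..."  # placeholder: any long Korean passage

# Generation arguments mirror the example above and are forwarded to generate().
result = summarizer("summarize: " + long_korean_text,
                    num_beams=8, do_sample=True, min_length=10, max_length=100)
print(result[0]["summary_text"])
```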


## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
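
This card was generated by the Keras callback, so the optimizer and schedule were not recorded. For orientation only, fine-tuning a pko-t5 checkpoint with Keras at float16 precision generally follows the shape sketched below; the toy data, the Adam optimizer, and every hyperparameter value here are illustrative assumptions, not the settings used for this model.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Matches the recorded training_precision of float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

tokenizer = AutoTokenizer.from_pretrained("paust/pko-t5-base")
model = TFAutoModelForSeq2SeqLM.from_pretrained("paust/pko-t5-base")

# Toy stand-in for the tokenized AIHUB summarization pairs.
documents = ["summarize: 요약할 긴 한국어 본문 예시입니다."]
summaries = ["짧은 요약 예시입니다."]

features = dict(tokenizer(documents, max_length=512, truncation=True,
                          padding=True, return_tensors="np"))
features["labels"] = tokenizer(summaries, max_length=100, truncation=True,
                               padding=True, return_tensors="np")["input_ids"]

train_dataset = tf.data.Dataset.from_tensor_slices(features).batch(1)

# With `labels` in the inputs, the model computes its own seq2seq loss,
# so compile() only needs an optimizer; Adam and 5e-5 are assumptions.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5))
model.fit(train_dataset, epochs=1)
```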

### Training results



### Framework versions

- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1