lcw99 committed 50a7a23 • Parent(s): 7565d5a

Update README.md

Files changed (1)
  1. README.md +36 -8
README.md CHANGED
@@ -6,18 +6,46 @@ model-index:
   results: []
 ---
 
- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->
-
 # t5-large-korean-text-summary
 
- This model was trained from scratch on an unknown dataset.
- It achieves the following results on the evaluation set:
-
- ## Model description
-
- More information needed
+ This model is a fine-tuned version of [paust/pko-t5-large](https://huggingface.co/paust/pko-t5-large), trained on the AIHUB "summary and report generation" dataset. It produces short summaries of long Korean text.
+
+ ## Usage
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ import nltk
+ nltk.download('punkt')  # sentence tokenizer used to trim the output below
+
+ # Load the fine-tuned model and tokenizer from the Hugging Face Hub.
+ model_dir = "lcw99/t5-large-korean-text-summary"
+ tokenizer = AutoTokenizer.from_pretrained(model_dir)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
+
+ max_input_length = 512
+
+ # Sample Korean news text to summarize.
+ text = """
+ 주인공 강인구(하정우)는 '수리남에서 홍어가 많이 나는데 다 갖다버린다'는 친구
+ 박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
+ 국립수산과학원 측은 "실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다"며
+ "수리남 연안에도 홍어가 많이 서식할 것"이라고 설명했다.
+
+ 그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
+ 일각에선 "돈을 벌기 위해 수리남산 홍어를 구하러 간 설정은 개연성이 떨어진다"는 지적도 한다.
+ 드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
+ 실제 조봉행 체포 작전에 협조했던 '협력자 K씨'도 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
+ """
+
+ # T5-style models expect a task prefix; this model uses "summarize: ".
+ inputs = ["summarize: " + text]
+
+ inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
+ output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
+ decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
+ # Keep only the first sentence of the decoded summary.
+ predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]
+
+ print(predicted_title)
+ ```
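For quick experiments, the same checkpoint can also be run through the high-level `pipeline` API. This is a minimal sketch, not part of the committed card: it assumes the generic summarization pipeline works with this checkpoint, and it adds the `"summarize: "` prefix manually in case the fine-tuned config does not carry a task-specific prefix.

```python
from transformers import pipeline

# Minimal sketch (assumption): the generic summarization pipeline
# over the same Hub checkpoint used in the snippet above.
summarizer = pipeline("summarization", model="lcw99/t5-large-korean-text-summary")

# Placeholder input; any long Korean article works, e.g. the sample above.
text = "주인공 강인구(하정우)는 수리남에서 홍어가 많이 난다는 친구의 얘기를 듣고 수리남으로 간다."

# The "summarize: " prefix is added manually, mirroring the snippet above;
# generation settings are passed straight through to model.generate().
result = summarizer("summarize: " + text, num_beams=8, min_length=10, max_length=100)
print(result[0]["summary_text"])
```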
 
 ## Intended uses & limitations
 
@@ -33,7 +61,7 @@ More information needed
 
 The following hyperparameters were used during training:
  - optimizer: None
- - training_precision: float32
+ - training_precision: float16
 
 ### Training results
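The card records the training precision as float16 but not the optimizer. For readers who want to set up something comparable, here is a minimal, hypothetical sketch of preparing the base checkpoint under a reduced-precision Keras policy; the optimizer, learning rate, and omitted dataset pipeline are placeholders, not values recorded in this card.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Assumption: reduced-precision compute, matching the card's
# "training_precision: float16" entry.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

base = "paust/pko-t5-large"
tokenizer = AutoTokenizer.from_pretrained(base)
# from_pt=True converts the PyTorch base weights to TensorFlow.
model = TFAutoModelForSeq2SeqLM.from_pretrained(base, from_pt=True)

# Placeholder optimizer; the optimizer actually used is not recorded above.
# Transformers TF models can compute their seq2seq loss internally, so no
# explicit loss is passed; tokenized data and model.fit(...) would follow.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4))
```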