eenzeenee committed
Commit: e5cbd9e
Parent: 4652695

Update README.md

Files changed (1)
README.md (+3 -2)
README.md CHANGED
@@ -26,6 +26,7 @@ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-base-korean-summarization')
 tokenizer = AutoTokenizer.from_pretrained('eenzeenee/t5-base-korean-summarization')
 
+prefix = "summarize: "
 sample = """
 μ•ˆλ…•ν•˜μ„Έμš”? 우리 (2ν•™λ…„)/(이 ν•™λ…„) μΉœκ΅¬λ“€ 우리 μΉœκ΅¬λ“€ 학ꡐ에 κ°€μ„œ μ§„μ§œ (2ν•™λ…„)/(이 ν•™λ…„) 이 되고 μ‹Άμ—ˆλŠ”λ° 학ꡐ에 λͺ» κ°€κ³  μžˆμ–΄μ„œ λ‹΅λ‹΅ν•˜μ£ ?
 κ·Έλž˜λ„ 우리 μΉœκ΅¬λ“€μ˜ μ•ˆμ „κ³Ό 건강이 μ΅œμš°μ„ μ΄λ‹ˆκΉŒμš” μ˜€λŠ˜λΆ€ν„° μ„ μƒλ‹˜μ΄λž‘ 맀일 맀일 κ΅­μ–΄ 여행을 λ– λ‚˜λ³΄λ„λ‘ ν•΄μš”.
@@ -43,10 +44,10 @@ sample = """
 μ–΄λ–»κ²Œ μ—¬λŸ¬κ°€μ§€ λ°©λ²•μœΌλ‘œ μ½μ„κΉŒ 우리 곡뢀해 보도둝 ν•΄μš”. 였늘의 μ‹œ λ‚˜μ™€λΌ μ§œμž”/! μ‹œκ°€ λ‚˜μ™”μŠ΅λ‹ˆλ‹€ μ‹œμ˜ 제λͺ©μ΄ λ­”κ°€μš”? λ‹€νˆ° λ‚ μ΄μ—μš” λ‹€νˆ° λ‚ .
 λˆ„κ΅¬λž‘ λ‹€ν‰œλ‚˜ λ™μƒμ΄λž‘ λ‹€ν‰œλ‚˜ μ–Έλ‹ˆλž‘ μΉœκ΅¬λž‘? λˆ„κ΅¬λž‘ λ‹€ν‰œλŠ”μ§€ μ„ μƒλ‹˜μ΄ μ‹œλ₯Ό 읽어 쀄 ν…Œλ‹ˆκΉŒ ν•œλ²ˆ 생각을 해보도둝 ν•΄μš”."""
 
-inputs = [args.prefix + sample]
+inputs = [prefix + sample]
 
 
-inputs = tokenizer(inputs, max_length=args.max_input_length, truncation=True, return_tensors="pt")
+inputs = tokenizer(inputs, max_length=512, truncation=True, return_tensors="pt")
 output = model.generate(**inputs, num_beams=3, do_sample=True, min_length=10, max_length=64)
 decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
 result = nltk.sent_tokenize(decoded_output.strip())[0]
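
For reference, below is a minimal, self-contained sketch of the snippet as it reads after this commit. The hunks above do not show the `nltk` import or the one-time `punkt` download that `nltk.sent_tokenize` requires, so those are added here; the sample string is abbreviated and stands in for the full classroom transcript in the README.

```python
import nltk
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Sentence-tokenizer data needed by nltk.sent_tokenize below
# (recent NLTK releases may additionally require 'punkt_tab').
nltk.download('punkt')

model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-base-korean-summarization')
tokenizer = AutoTokenizer.from_pretrained('eenzeenee/t5-base-korean-summarization')

prefix = "summarize: "
# Abbreviated stand-in for the full transcript used in the README.
sample = "μ•ˆλ…•ν•˜μ„Έμš”? 우리 (2ν•™λ…„)/(이 ν•™λ…„) μΉœκ΅¬λ“€ 우리 μΉœκ΅¬λ“€ 학ꡐ에 κ°€μ„œ ..."

# Truncate the prefixed input to 512 tokens and generate a short summary.
inputs = tokenizer([prefix + sample], max_length=512, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=3, do_sample=True, min_length=10, max_length=64)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]

# Keep only the first generated sentence as the final summary.
result = nltk.sent_tokenize(decoded_output.strip())[0]
print(result)
```

The change itself replaces the undefined `args.prefix` and `args.max_input_length` with a literal `prefix = "summarize: "` and a hard-coded `max_length=512`, so the README snippet runs as-is instead of depending on an argument parser that was never shown.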