Young Ho Shin committed on
Commit
28e960f
•
1 Parent(s): 099d0f1

Small edits to article

Files changed (1)
  1. article.md +11 -12
article.md CHANGED
@@ -1,20 +1,19 @@
  ### Technical description
 
- Text summarization is one of the more interesting tasks in natural language processing,
- and even though it is also quite useful in everyday life,
- I could not easily find a project that summarizes Korean documents, so I made a simple attempt here.
 
- The basic idea is extractive summarization, which builds a summary by selecting the N most important sentences in a document
- and there are various techniques depending on the criterion used to select the sentences.
  The commonly used TextRank algorithm is a graph-based technique that uses the relationships between sentences to find the most central ones.
 
  This project uses sentence embeddings to represent the meaning of each sentence as a vector,
- uses clustering to find sentences with similar meanings,
- and builds the summary by selecting one key sentence from each of the resulting clusters.
 
- To obtain the sentence embeddings I first tried the [Sentence-BERT](https://www.sbert.net/) model, but because the existing model is BERT-based,
- its performance on Korean document summarization was poor.
- So I use a Sentence-BERT model backed by [KoBERT](https://github.com/SKTBrain/KoBERT), which is trained for Korean embeddings.
 
  The Python [Summarizer](https://github.com/dmmiller612/bert-extractive-summarizer) package is used to compute the sentence embeddings and generate the summary,
  and the [Spacy](https://spacy.io/) library is used for pre- and post-processing such as sentence boundary detection.
@@ -22,7 +21,7 @@
 
  The project still has much room for improvement.
  No satisfactory Korean document summarization dataset was available, so fine-tuning was not possible,
- and sometimes the most central sentence of a text is missing from the summary, or a sentence that is hard to understand without context is included.
  The language model is quite large, so performance on CPU is also unsatisfactory,
  and it would be worth experimenting with a smaller model.
- Also, adding web scraping code that extracts the article body from sites other than Naver News would make this easier to use.
 
  ### Technical description
 
+ Text summarization is one of the important and interesting tasks in natural language processing.
+ Even so, I could not easily find an open-source project that summarizes Korean documents, so I made a simple attempt here.
 
+ The basic idea is extractive summarization, which builds a summary by selecting the N most important sentences in a document,
+ and here there are various techniques depending on the criterion used to select the sentences.
  The commonly used TextRank algorithm is a graph-based technique that uses the relationships between sentences to find the most central ones.
 
  This project uses sentence embeddings to represent the meaning of each sentence as a vector,
+ uses clustering to group sentences with similar meanings together,
+ and builds the summary by selecting one key sentence per cluster.
 
+ To obtain the sentence embeddings I first tried the [Sentence-BERT](https://www.sbert.net/) model, but because the existing model uses a BERT-based multilingual model,
+ it was not well suited to the Korean document summarization task.
+ So I ended up using a Sentence-BERT model backed by [KoBERT](https://github.com/SKTBrain/KoBERT), which is trained for Korean embeddings.
 
  The Python [Summarizer](https://github.com/dmmiller612/bert-extractive-summarizer) package is used to compute the sentence embeddings and generate the summary,
  and the [Spacy](https://spacy.io/) library is used for pre- and post-processing such as sentence boundary detection.
 
 
  The project still has much room for improvement.
  No satisfactory Korean document summarization dataset was available, so fine-tuning was not possible,
+ and sometimes the most central sentence of a text is missing from the summary, or a sentence that is hard to understand without its surrounding context is included.
  The language model is quite large, so performance on CPU is also unsatisfactory,
  and it would be worth experimenting with a smaller model.
+ Also, to make this more convenient to use, intelligently extracting the article body from sites other than Naver News without separate web scraping code would be an interesting challenge.
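
The embed, cluster, select pipeline that the article describes can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's code: `embed` is a toy bag-of-words stand-in for the Sentence-BERT/KoBERT embeddings, `kmeans` is a hand-written k-means with a deterministic start, and the names `embed`, `kmeans`, and `summarize` are hypothetical. The Summarizer package mentioned in the article is not used here.

```python
import math
import re
from collections import Counter


def embed(sentence, vocab):
    """Toy stand-in for a sentence-embedding model (assumption, not KoBERT):
    an L2-normalised bag-of-words vector over the given vocabulary."""
    counts = Counter(re.findall(r"\w+", sentence.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; centroids start at evenly spaced input vectors."""
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def summarize(sentences, n_sentences):
    """Pick one representative sentence per cluster, in document order."""
    k = min(n_sentences, len(sentences))
    if k <= 0:
        return []
    vocab = sorted({w for s in sentences for w in re.findall(r"\w+", s.lower())})
    vectors = [embed(s, vocab) for s in sentences]
    centroids = kmeans(vectors, k)
    chosen = []
    for centroid in centroids:
        # Nearest sentence to this centroid that has not been picked yet.
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: sq_dist(vectors[i], centroid)))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "The model embeds every sentence as a vector.",
        "Each sentence vector captures the meaning of the sentence.",
        "Clustering groups similar vectors together.",
        "Similar vectors end up in the same cluster.",
        "One sentence per cluster goes into the summary.",
        "The summary keeps the original sentence order.",
    ]
    for line in summarize(doc, 2):
        print(line)
```

Swapping `embed` for a real sentence-embedding model would leave the rest of the sketch unchanged, since the clustering and selection steps only see vectors. Returning the picks in document order is a design choice made here so the summary reads naturally.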