ν•œκ΅­μΈ 이름 인식 λͺ¨λΈ

kor-bert fine-tuning λͺ¨λΈ

자주 μ•ˆμ“°λŠ” ν•œκΈ€μ΄λ¦„ κΈ°μ€€μœΌλ‘œ
생성기λ₯Ό λ§Œλ“€μ–΄μ„œ, 16만개의 ν•œκΈ€ 이름을 생성 ν›„ ν•™μŠ΅ν•œ λͺ¨λΈμž…λ‹ˆλ‹€.
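The generator itself is not published with this model. As a rough illustration only, a synthetic-data generator of this kind could look like the sketch below. The surname and syllable pools (`SURNAMES`, `GIVEN_SYLLABLES`), the sentence `TEMPLATES`, and the character-level `B-PER`/`I-PER` tags are all hypothetical assumptions; the real generator, its name lists, and its label format may differ, and character tags would still need aligning to WordPiece tokens before fine-tuning.

```python
import random
from typing import List, Tuple

# Hypothetical pools: the author's actual "rarely used name" lists are not published.
SURNAMES = ["임", "김", "이", "박", "최", "남궁"]
GIVEN_SYLLABLES = ["준", "영", "서", "윤", "하", "빈", "솔", "결"]
# Hypothetical sentence templates; "{}" marks where the name is inserted.
TEMPLATES = ["안녕하세요. {}입니다.", "{} 씨가 전화했어요."]


def make_name(rng: random.Random) -> str:
    """Combine a surname with a two-syllable given name."""
    surname = rng.choice(SURNAMES)
    given = "".join(rng.choice(GIVEN_SYLLABLES) for _ in range(2))
    return surname + given


def make_example(rng: random.Random) -> Tuple[str, List[str]]:
    """Return a sentence and per-character BIO tags for the inserted name."""
    name = make_name(rng)
    template = rng.choice(TEMPLATES)
    start = template.index("{}")  # the name begins where the placeholder was
    text = template.format(name)
    tags = ["O"] * len(text)
    tags[start] = "B-PER"
    for i in range(start + 1, start + len(name)):
        tags[i] = "I-PER"
    return text, tags


def generate(n: int, seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Generate n labelled sentences reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


if __name__ == "__main__":
    for text, tags in generate(3):
        print(text, tags)
```

Seeding a private `random.Random` instance keeps the dataset reproducible without touching the global random state.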

ex) 안녕하세요. 임준영입니다. -> 안녕하세요. ***입니다. ("Hello. I'm 임준영." -> "Hello. I'm ***.")


```python
from transformers import BertTokenizerFast, BertForTokenClassification
from transformers import pipeline

model_name = 'joon09/kor-naver-ner-name'
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)
ner = pipeline("ner", model=model, tokenizer=tokenizer)

# "Hello. I'm 임준영."
# `grouped_entities=True` is deprecated; it maps to aggregation_strategy='simple'
# (overriding any other strategy passed alongside it), so request 'simple' directly.
result = ner('안녕하세요. 임준영입니다.', aggregation_strategy='simple')
print(result)

# Example output:
# [{'entity_group': 'PER',
#   'score': 0.99999785,
#   'word': '임',
#   'start': 7,
#   'end': 8},
#  {'entity_group': 'PER',
#   'score': 0.82035744,
#   'word': '##준영',
#   'start': 8,
#   'end': 10}]
```
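The pipeline returns character offsets (`start`, `end`), and the example output splits one name into two adjacent `PER` groups (`임` and `##준영`). A minimal sketch of the `***` masking shown in the example, assuming grouped output with an `entity_group` key; `merge_spans` and `mask_names` are hypothetical helpers, not part of this model or of `transformers`.

```python
from typing import Dict, List, Tuple


def merge_spans(text: str, entities: List[Dict]) -> List[Tuple[int, int, str]]:
    """Merge overlapping or touching PER spans into whole-name spans."""
    spans = sorted(
        (e["start"], e["end"]) for e in entities if e.get("entity_group") == "PER"
    )
    merged: List[List[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous span, e.g. '임' (7-8) + '##준영' (8-10)
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e, text[s:e]) for s, e in merged]


def mask_names(text: str, entities: List[Dict], mask: str = "*") -> str:
    """Replace every character of each detected name with `mask`."""
    chars = list(text)
    for start, end, _ in merge_spans(text, entities):
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


if __name__ == "__main__":
    text = "안녕하세요. 임준영입니다."
    # Grouped entities as reported in the example output above
    entities = [
        {"entity_group": "PER", "score": 0.99999785, "word": "임", "start": 7, "end": 8},
        {"entity_group": "PER", "score": 0.82035744, "word": "##준영", "start": 8, "end": 10},
    ]
    print(merge_spans(text, entities))  # [(7, 10, '임준영')]
    print(mask_names(text, entities))   # 안녕하세요. ***입니다.
```

In practice the pipeline's grouped output for a sentence would be passed as `entities`. Masking one character per character keeps the string length unchanged, so offsets of any other entities in the same text stay valid.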