# KPF BERT

## How to Use
### Step 1. Installation

Requires Python > 3.6.

```bash
pip3 install "torch>=1.4.0"
pip3 install "transformers>=4.9.2"
```
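To confirm the environment is set up correctly, a quick version check such as the following can be run (a minimal sketch; the version floors are the ones listed above):

```python
# Minimal environment check: confirm the installed versions meet the
# requirements above (torch >= 1.4.0, transformers >= 4.9.2).
import torch
import transformers

print(torch.__version__)
print(transformers.__version__)
```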
### Step 2. Load Tokenizer and Model

```python
from transformers import BertModel, BertTokenizer

model_name_or_path = "LOCAL_MODEL_PATH"  # directory containing the BERT binary files
model = BertModel.from_pretrained(model_name_or_path, add_pooling_layer=False)
tokenizer = BertTokenizer.from_pretrained(model_name_or_path)
```
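Once loaded, the configuration can be inspected to sanity-check the checkpoint (a small sketch; the printed values depend on the downloaded binary):

```python
# Sanity-check the loaded checkpoint; the exact values depend on the binary,
# e.g. hidden_size is 768 for a BERT-base-sized model.
print(model.config.hidden_size)
print(model.config.num_hidden_layers)
print(tokenizer.vocab_size)
```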
### Step 3. Tokenizer

```python
>>> text = "언론진흥재단 BERT 모델을 공개합니다."
>>> tokenizer.tokenize(text)
['언론', '##진흥', '##재단', 'BE', '##RT', '모델', '##을', '공개', '##합니다', '.']
>>> encoded_input = tokenizer(text)
>>> encoded_input
{'input_ids': [2, 7392, 24220, 16227, 28024, 21924, 7522, 4620, 7247, 15801, 518, 3],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```
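Beyond the single-sentence case, the tokenizer can batch-encode with padding and decode ids back to text. A brief sketch (the second sentence is a hypothetical example, not from the model card):

```python
# Batch-encode two sentences with padding; returns PyTorch tensors.
texts = ["언론진흥재단 BERT 모델을 공개합니다.", "두 번째 예시 문장입니다."]  # second sentence is hypothetical
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)  # (2, longest_sequence_in_batch)

# Round-trip: decode ids back to text, dropping [CLS]/[SEP]/[PAD].
print(tokenizer.decode(batch["input_ids"][0], skip_special_tokens=True))
```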
### Step 4. Model Inference

```python
>>> import torch
>>> model.eval()
>>> pt_encoded_input = tokenizer(text, return_tensors="pt")
>>> model(**pt_encoded_input, return_dict=False)
(tensor([[[-4.1391e-01,  7.3169e-01,  1.1777e+00,  ...,  1.2273e+00, -4.1275e-01,  2.4145e-03],
          [ 1.6289e+00, -1.9552e-01,  1.6454e+00,  ...,  2.5763e-01,  1.7823e-01, -7.6751e-01],
          [ 7.4709e-01, -4.1524e-01,  3.0054e-01,  ...,  1.1636e+00, -2.3667e-01, -1.0005e+00],
          ...,
          [-7.9207e-01, -2.9005e-01,  1.7217e+00,  ...,  1.5060e+00, -2.3975e+00, -4.3733e-01],
          [-4.1402e-01,  7.3164e-01,  1.1777e+00,  ...,  1.2273e+00, -4.1289e-01,  2.3552e-03],
          [-4.1386e-01,  7.3167e-01,  1.1776e+00,  ...,  1.2273e+00, -4.1259e-01,  2.5745e-03]]],
        grad_fn=<NativeLayerNormBackward>), None)
```
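The second element of the returned tuple is `None` because the model was loaded with `add_pooling_layer=False`. One common convention (not prescribed by this card) is to take the final-layer `[CLS]` vector as a sentence representation, continuing the session above:

```python
import torch

# Hedged sketch: use the last-hidden-state vector at the [CLS] position
# (index 0) as a sentence embedding; gradients are disabled for inference.
with torch.no_grad():
    outputs = model(**pt_encoded_input)           # return_dict=True by default
    cls_vector = outputs.last_hidden_state[:, 0]  # shape: (batch, hidden)

print(cls_vector.shape)
```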
์ด 5๊ฐ์ ๋ชจ๋ธ์ ๋ํด์ ํ๊ฐ ์์ ์ํ
- kpfBERT base (https://github.com/KPFBERT/kpfbert)
- KLUE BERT base (https://huggingface.co/klue/bert-base)
- ETRI BERT base (KorBERT, https://aiopen.etri.re.kr/service_dataset.php)
- KoBERT (https://github.com/SKTBrain/KoBERT)
- BERT base multilingual cased (https://huggingface.co/bert-base-multilingual-cased)
Sequence classification benchmark comparison (10/22/2021):

| Category | NSMC | KLUE-NLI | KLUE-STS |
|---|---|---|---|
| Dataset description and size | Movie review sentiment analysis; train: 150,000 sentences, eval: 50,000 sentences | Natural language inference; train: 24,998 sentences, eval: 3,000 sentences (dev set) | Semantic textual similarity; train: 11,668 sentences, eval: 519 sentences (dev set) |
| Metric | accuracy | accuracy | Pearson correlation |
| KPF BERT | 91.29% | 87.67% | 92.95% |
| KLUE BERT | 90.62% | 81.33% | 91.14% |
| KorBERT Tokenizer | 90.46% | 80.56% | 89.85% |
| KoBERT | 89.92% | 79.53% | 86.17% |
| BERT base multilingual | 87.33% | 73.30% | 85.66% |
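For reference, a classification head can be fine-tuned with the standard `BertForSequenceClassification` wrapper. Below is a minimal sketch for an NSMC-style sentiment task; the example data is hypothetical, and no claim is made that this matches the training setup behind the numbers above:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model_name_or_path = "LOCAL_MODEL_PATH"  # same checkpoint directory as in Step 2
tokenizer = BertTokenizer.from_pretrained(model_name_or_path)
model = BertForSequenceClassification.from_pretrained(model_name_or_path, num_labels=2)

# Toy batch standing in for real NSMC data (hypothetical examples).
texts = ["정말 재미있는 영화였습니다.", "시간이 아까운 영화였습니다."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one illustrative backward pass; optimizer step omitted
```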
Question answering benchmark comparison (10/22/2021):

| Category | KorQuAD v1 | KLUE-MRC |
|---|---|---|
| Dataset description and size | Machine reading comprehension; train: 60,406 examples, eval: 5,774 examples (dev set) | Machine reading comprehension; train: 17,554 examples, eval: 5,841 examples (dev set) |
| Metric | Exact Match / F1 | Exact Match / ROUGE-W |
| KPF BERT | 86.42% / 94.95% | 69.51% / 75.84% |
| KLUE BERT | 83.84% / 93.23% | 61.91% / 68.38% |
| KorBERT Tokenizer | 20.11% / 82.00% | 30.56% / 58.59% |
| KoBERT | 16.85% / 71.36% | 28.56% / 42.06% |
| BERT base multilingual | 68.10% / 90.02% | 44.58% / 55.92% |
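Extractive QA in the KorQuAD style predicts an answer span inside a context passage. A hedged sketch with `BertForQuestionAnswering` follows; the QA head here is freshly initialized, so real use requires fine-tuning first, and the question/context pair is hypothetical:

```python
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

model_name_or_path = "LOCAL_MODEL_PATH"
tokenizer = BertTokenizer.from_pretrained(model_name_or_path)
qa_model = BertForQuestionAnswering.from_pretrained(model_name_or_path)

question = "BERT 모델을 공개한 곳은 어디인가?"  # hypothetical example
context = "언론진흥재단이 뉴스 기사 말뭉치로 학습한 BERT 모델을 공개합니다."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = qa_model(**inputs)

# The answer span is read off the argmax of the start/end logits.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax() + 1
print(tokenizer.decode(inputs["input_ids"][0, start:end]))
```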
## KPF BERT Use Cases

KPFBERTSUM (https://github.com/KPFBERT/kpfbertsum)
- KpfBertSum is a Korean summarization model that implements extractive summarization of Korean sentences, based on PreSumm, a text summarization paper and model that builds on a pretrained BERT.
- Because it uses kpfBERT, which was trained on the extensive news article corpus built by the Korea Press Foundation, it is particularly specialized for summarizing news articles.
YouTube explainer "What is BERT": https://youtu.be/Pj6563CAnKs