KPF BERT

Usage

Step 1. Installation

Requires Python > 3.6.

pip3 install "torch>=1.4.0"
pip3 install "transformers>=4.9.2"

Step 2. Load Tokenizer, Model

from transformers import BertModel, BertTokenizer

model_name_or_path = "LOCAL_MODEL_PATH"  # directory containing the BERT model binary

model = BertModel.from_pretrained(model_name_or_path, add_pooling_layer=False)
tokenizer = BertTokenizer.from_pretrained(model_name_or_path)

Step 3. Tokenizer

>>> text = "์–ธ๋ก ์ง„ํฅ์žฌ๋‹จ BERT ๋ชจ๋ธ์„ ๊ณต๊ฐœํ•ฉ๋‹ˆ๋‹ค."
>>> tokenizer.tokenize(text)
['์–ธ๋ก ', '##์ง„ํฅ', '##์žฌ๋‹จ', 'BE', '##RT', '๋ชจ๋ธ', '##์„', '๊ณต๊ฐœ', '##ํ•ฉ๋‹ˆ๋‹ค', '.']
>>> encoded_input = tokenizer(text)
>>> encoded_input
{'input_ids': [2, 7392, 24220, 16227, 28024, 21924, 7522, 4620, 7247, 15801, 518, 3],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
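
The `##` prefix marks WordPiece continuation pieces, and the first and last ids in `input_ids` (2 and 3 here) are the special tokens wrapping the sentence. As a rough sketch of how continuation pieces rejoin into words (not the tokenizer's own detokenizer, which also handles punctuation spacing):

```python
def join_wordpieces(tokens):
    """Rejoin WordPiece tokens: a '##' prefix means the piece continues the previous word."""
    text = ""
    for tok in tokens:
        if tok.startswith("##"):
            text += tok[2:]                      # continuation: glue onto the previous piece
        else:
            text += (" " if text else "") + tok  # new word: separate with a space
    return text

tokens = ['์–ธ๋ก ', '##์ง„ํฅ', '##์žฌ๋‹จ', 'BE', '##RT', '๋ชจ๋ธ', '##์„', '๊ณต๊ฐœ', '##ํ•ฉ๋‹ˆ๋‹ค']
print(join_wordpieces(tokens))  # ์–ธ๋ก ์ง„ํฅ์žฌ๋‹จ BERT ๋ชจ๋ธ์„ ๊ณต๊ฐœํ•ฉ๋‹ˆ๋‹ค
```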

Step 4. Model Inference

>>> import torch
>>> model.eval()
>>> pt_encoded_input = tokenizer(text, return_tensors="pt")
>>> model(**pt_encoded_input, return_dict=False)
(tensor([[[-4.1391e-01,  7.3169e-01,  1.1777e+00,  ...,  1.2273e+00, -4.1275e-01,  2.4145e-03],
          [ 1.6289e+00, -1.9552e-01,  1.6454e+00,  ...,  2.5763e-01, 1.7823e-01, -7.6751e-01],
          [ 7.4709e-01, -4.1524e-01,  3.0054e-01,  ...,  1.1636e+00, -2.3667e-01, -1.0005e+00],
          ...,
          [-7.9207e-01, -2.9005e-01,  1.7217e+00,  ...,  1.5060e+00, -2.3975e+00, -4.3733e-01],
          [-4.1402e-01,  7.3164e-01,  1.1777e+00,  ...,  1.2273e+00, -4.1289e-01,  2.3552e-03],
          [-4.1386e-01,  7.3167e-01,  1.1776e+00,  ...,  1.2273e+00, -4.1259e-01,  2.5745e-03]]],
          grad_fn=<NativeLayerNormBackward>), None)
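
With `return_dict=False` the model returns a tuple `(last_hidden_state, pooler_output)`; the second element is `None` here because the model was loaded with `add_pooling_layer=False`. A common way to turn `last_hidden_state` into a fixed-size sentence vector is to average the token vectors at positions where `attention_mask` is 1. A minimal pure-Python sketch of that pooling (plain lists instead of tensors, to keep it self-contained):

```python
def masked_mean_pool(hidden, mask):
    """Average token vectors (hidden: seq_len x dim) over positions where mask is 1."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden, mask):
        if m:  # skip padding positions
            count += 1
            for i in range(dim):
                total[i] += vec[i]
    return [t / count for t in total]

# Toy example: three tokens of dimension 2, the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [1, 1, 0]
print(masked_mean_pool(hidden, mask))  # [2.0, 3.0]
```

With real model output the same operation is done on tensors, e.g. masking `last_hidden_state` with the attention mask before summing and dividing.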

์ด 5๊ฐœ์˜ ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ ์ž‘์—… ์ˆ˜ํ–‰

Sequence classification performance comparison (10/22/2021):

| Model | NSMC | KLUE-NLI | KLUE-STS |
|---|---|---|---|
| Dataset | Movie review sentiment analysis; train: 150,000 sentences, eval: 50,000 sentences | Natural language inference; train: 24,998 sentences, eval: 3,000 sentences (dev set) | Semantic textual similarity; train: 11,668 sentences, eval: 519 sentences (dev set) |
| Metric | accuracy | accuracy | Pearson correlation |
| KPF BERT | 91.29% | 87.67% | 92.95% |
| KLUE BERT | 90.62% | 81.33% | 91.14% |
| KorBERT Tokenizer | 90.46% | 80.56% | 89.85% |
| KoBERT | 89.92% | 79.53% | 86.17% |
| BERT base multilingual | 87.33% | 73.30% | 85.66% |
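
The KLUE-STS column is scored with the Pearson correlation between predicted and gold similarity scores. A simplified stand-in for that metric (not the official KLUE evaluation script):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0 (perfectly linear predictions)
```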

Question answering performance comparison (10/22/2021):

| Model | KorQuAD v1 | KLUE-MRC |
|---|---|---|
| Dataset | Machine reading comprehension; train: 60,406 examples, eval: 5,774 examples (dev set) | Machine reading comprehension; train: 17,554 examples, eval: 5,841 examples (dev set) |
| Metric | Exact Match / F1 | Exact Match / ROUGE-W |
| KPF BERT | 86.42% / 94.95% | 69.51% / 75.84% |
| KLUE BERT | 83.84% / 93.23% | 61.91% / 68.38% |
| KorBERT Tokenizer | 20.11% / 82.00% | 30.56% / 58.59% |
| KoBERT | 16.85% / 71.36% | 28.56% / 42.06% |
| BERT base multilingual | 68.10% / 90.02% | 44.58% / 55.92% |
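
Exact Match counts a prediction only if it equals the gold answer string exactly, while F1 scores token-level overlap, so partially correct spans still receive credit. A simplified sketch of the two metrics (whitespace tokenization only; the official KorQuAD/SQuAD scripts apply additional answer normalization):

```python
from collections import Counter

def exact_match(pred, gold):
    """1 if the prediction equals the gold answer exactly (after trimming), else 0."""
    return int(pred.strip() == gold.strip())

def token_f1(pred, gold):
    """Token-overlap F1: harmonic mean of precision and recall over shared tokens."""
    p_tokens, g_tokens = pred.split(), gold.split()
    common = sum((Counter(p_tokens) & Counter(g_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(p_tokens)
    recall = common / len(g_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("์–ธ๋ก ์ง„ํฅ์žฌ๋‹จ", "์–ธ๋ก ์ง„ํฅ์žฌ๋‹จ"))                  # 1
print(token_f1("์–ธ๋ก ์ง„ํฅ์žฌ๋‹จ BERT", "์–ธ๋ก ์ง„ํฅ์žฌ๋‹จ BERT ๋ชจ๋ธ"))  # ~0.8 (2 of 3 gold tokens matched)
```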

KPF BERT Use Cases

  • KPFBERTSUM (https://github.com/KPFBERT/kpfbertsum)

    • KpfBertSum is a Korean summarization model that implements extractive summarization of Korean text, following PRESUMM, a text-summarization paper and model built on pretrained BERT.
    • It is specialized for news-article summarization, built on kpfBERT, which was trained on the large news corpus compiled by the Korea Press Foundation.
  • YouTube explainer 'What is BERT': https://youtu.be/Pj6563CAnKs

Model size: 108M parameters (tensor type F32, Safetensors format)