File size: 2,739 Bytes
890e0dc e92823b 890e0dc e92823b |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 |
---
license: apache-2.0
language:
- ko
pipeline_tag: text-classification
---
# formal_classifier
formal classifier or honorific classifier
## νκ΅μ΄ μ‘΄λλ§ λ°λ§ λΆλ₯κΈ°
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model = AutoModelForSequenceClassification.from_pretrained("j5ng/kcbert-formal-classifier")
tokenizer = AutoTokenizer.from_pretrained('j5ng/kcbert-formal-classifier')
formal_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer)
print(formal_classifier("μ λ²μ κ΅μλκ»μ μλ£ κ°μ Έμ€λΌνλλ° κΈ°μ΅λ?"))
# [{'label': 'LABEL_0', 'score': 0.9999139308929443}]
```
***
### λ°μ΄ν° μ
μΆμ²
#### μ€λ§μΌκ²μ΄νΈ λ§ν¬ λ°μ΄ν° μ
(korean SmileStyle Dataset)
: https://github.com/smilegate-ai/korean_smile_style_dataset
#### AI νλΈ κ°μ± λν λ§λμΉ
: https://www.aihub.or.kr/
#### λ°μ΄ν°μ
λ€μ΄λ‘λ(AIνλΈλ μ§μ λ€μ΄λ‘λλ§ κ°λ₯)
```bash
wget https://raw.githubusercontent.com/smilegate-ai/korean_smile_style_dataset/main/smilestyle_dataset.tsv
```
### κ°λ° νκ²½
```bash
Python3.9
```
```bash
torch==1.13.1
transformers==4.26.0
pandas==1.5.3
emoji==2.2.0
soynlp==0.0.493
datasets==2.10.1
pandas==1.5.3
```
#### μ¬μ© λͺ¨λΈ
beomi/kcbert-base
- GitHub : https://github.com/Beomi/KcBERT
- HuggingFace : https://huggingface.co/beomi/kcbert-base
***
### μμ
|sentence|label|
|------|---|
|곡λΆλ₯Ό μ΄μ¬ν ν΄λ μ΄μ¬ν ν λ§νΌ μ±μ μ΄ μ λμ€μ§ μμ|0|
|μλ€μκ² λ³΄λ΄λ λ¬Έμλ₯Ό ν΅ν΄ κ΄κ³κ° ν볡λκΈΈ λ°λκ²μ|1|
|μ°Έ μ΄μ¬ν μ¬μ 보λμ΄ μμΌμλ€μ|1|
|λλ μ€μ μ’μν¨ μ΄λ² λ¬λΆν° μκ΅ κ° λ―|0|
|λ³ΈλΆμ₯λμ΄ λ΄κ° ν μ μλ μ
무λ₯Ό κ³μ μ£Όμ
μ νλ€μ΄|0|
### λΆν¬
|label|train|test|
|------|---|---|
|0|133,430|34,908|
|1|112,828|29,839|
***
κ²°κ³Ό
```
μ λ²μ κ΅μλκ»μ μλ£ κ°μ Έμ€λΌνμ
¨λλ° κΈ°μ΅λμΈμ? : μ‘΄λλ§μ
λλ€. ( νλ₯ 99.19% )
μ λ²μ κ΅μλκ»μ μλ£ κ°μ Έμ€λΌνλλ° κΈ°μ΅λ? : λ°λ§μ
λλ€. ( νλ₯ 92.86% )
```
***
## μΈμ©
```bash
@misc{SmilegateAI2022KoreanSmileStyleDataset,
title = {SmileStyle: Parallel Style-variant Corpus for Korean Multi-turn Chat Text Dataset},
author = {Seonghyun Kim},
year = {2022},
howpublished = {\url{https://github.com/smilegate-ai/korean_smile_style_dataset}},
}
```
```bash
@inproceedings{lee2020kcbert,
title={KcBERT: Korean Comments BERT},
author={Lee, Junbum},
booktitle={Proceedings of the 32nd Annual Conference on Human and Cognitive Language Technology},
pages={437--440},
year={2020}
}
```
|