---
language:
- en
- ko
license: apache-2.0
datasets: AI-Hub
metrics:
- accuracy
pipeline_tag: text-classification
---
# 1. Introduction

## 1.1 Examples

![examples](https://github.com/BurningFalls/algorithm-study/assets/30232837/596e5010-53b6-4598-8dd3-4ef7fc65e60e)

## 1.2 F1-score

![bert_accuracy](https://github.com/BurningFalls/algorithm-study/assets/30232837/58830340-aebe-4dc2-85fa-313138ac3020)

---

# 2. Requirements
```python
# Environment used by the author
python==3.11.3
tensorflow==2.12.0
transformers==4.29.2

# Probable minimum versions (not verified)
python>=3.6
tensorflow>=2.0
transformers>=4.0
```
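
To compare a local setup against the versions above, the installed packages can be queried at run time. The snippet below is a minimal sketch that uses only the standard library (`importlib.metadata`, which requires Python 3.8 or newer); the helper name `installed_version` is hypothetical and not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the interpreter version and the two libraries this model card relies on.
print("python", sys.version.split()[0])
for name in ("tensorflow", "transformers"):
    print(name, installed_version(name) or "not installed")
```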

---

# 3. Load
```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import TextClassificationPipeline

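BERT_PATH = "burningfalls/my-fine-tuned-bert"

def load_bert():
    # Download (or load from cache) the fine-tuned tokenizer and TensorFlow model
    loaded_tokenizer = AutoTokenizer.from_pretrained(BERT_PATH)
    loaded_model = TFAutoModelForSequenceClassification.from_pretrained(BERT_PATH)

    # top_k=1 keeps only the highest-scoring label for each input
    text_classifier = TextClassificationPipeline(
        tokenizer=loaded_tokenizer,
        model=loaded_model,
        framework='tf',
        top_k=1
    )
    return text_classifier
```

---

# 4. Usage
```python
import re
import sentiments  # the sentiments.py module listed in section 5

# Build the pipeline once with load_bert() from section 3 and reuse it
text_classifier = load_bert()

def predict_sentiment(text):
    result = text_classifier(text)[0]
    # Keep only the digits of the predicted label to get the class index
    feel_idx = int(re.sub(r'[^0-9]', '', result[0]['label']))
    feel = sentiments.Feel[feel_idx]["label"]

    return feel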
```
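
The label-to-name lookup in `predict_sentiment` can be tried without downloading the model. The sketch below assumes the pipeline emits labels such as `LABEL_2` (the Transformers default when a model config defines no custom `id2label`; this card does not state the actual label format). `FEEL_SAMPLE` is a hypothetical three-entry stand-in for `sentiments.Feel`, and `label_to_feel` is an illustrative helper, not part of this repository.

```python
import re

# Hypothetical stand-in for sentiments.Feel, shortened to three entries.
FEEL_SAMPLE = [
    {"label": "poor, underprivileged", "index": 0},
    {"label": "grateful", "index": 1},
    {"label": "worried", "index": 2},
]


def label_to_feel(raw_label, table):
    """Map a pipeline label such as 'LABEL_2' to the matching entry's label."""
    digits = re.sub(r"[^0-9]", "", raw_label)
    if not digits:
        raise ValueError(f"no class index found in {raw_label!r}")
    return table[int(digits)]["label"]


# Fake pipeline output, shaped the way the usage code indexes it:
# a list holding one list of {"label", "score"} dicts.
fake_output = [[{"label": "LABEL_2", "score": 0.91}]]
print(label_to_feel(fake_output[0][0]["label"], FEEL_SAMPLE))
```

Raising on a label without digits makes a changed label format fail loudly instead of surfacing as an obscure `int('')` error.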

---

# 5. sentiments.py
```python
Feel = [
    {"label": "가난한, 불우한", "index": 0},  # poor, underprivileged
    {"label": "감사하는", "index": 1},  # grateful
    {"label": "걱정스러운", "index": 2},  # worried
    {"label": "고립된", "index": 3},  # isolated
    {"label": "괴로워하는", "index": 4},  # distressed
    {"label": "구역질 나는", "index": 5},  # nauseated
    {"label": "기쁨", "index": 6},  # joy
    {"label": "낙담한", "index": 7},  # discouraged
    {"label": "남의 시선을 의식하는", "index": 8},  # self-conscious
    {"label": "노여워하는", "index": 9},  # indignant
    {"label": "눈물이 나는", "index": 10},  # tearful
    {"label": "느긋", "index": 11},  # relaxed
    {"label": "당혹스러운", "index": 12},  # perplexed
    {"label": "당황", "index": 13},  # embarrassment
    {"label": "두려운", "index": 14},  # afraid
    {"label": "마비된", "index": 15},  # numb
    {"label": "만족스러운", "index": 16},  # satisfied
    {"label": "방어적인", "index": 17},  # defensive
    {"label": "배신당한", "index": 18},  # betrayed
    {"label": "버려진", "index": 19},  # abandoned
    {"label": "부끄러운", "index": 20},  # ashamed
    {"label": "분노", "index": 21},  # anger
    {"label": "불안", "index": 22},  # anxiety
    {"label": "비통한", "index": 23},  # grief-stricken
    {"label": "상처", "index": 24},  # hurt
    {"label": "성가신", "index": 25},  # annoyed
    {"label": "스트레스 받는", "index": 26},  # stressed
    {"label": "슬픔", "index": 27},  # sadness
    {"label": "신뢰하는", "index": 28},  # trusting
    {"label": "신이 난", "index": 29},  # excited
    {"label": "실망한", "index": 30},  # disappointed
    {"label": "악의적인", "index": 31},  # malicious
    {"label": "안달하는", "index": 32},  # fretful
    {"label": "안도", "index": 33},  # relief
    {"label": "억울한", "index": 34},  # feeling wronged
    {"label": "열등감", "index": 35},  # sense of inferiority
    {"label": "염세적인", "index": 36},  # pessimistic
    {"label": "외로운", "index": 37},  # lonely
    {"label": "우울한", "index": 38},  # depressed
    {"label": "자신하는", "index": 39},  # confident
    {"label": "조심스러운", "index": 40},  # cautious
    {"label": "좌절한", "index": 41},  # frustrated
    {"label": "죄책감의", "index": 42},  # guilty
    {"label": "질투하는", "index": 43},  # jealous
    {"label": "짜증내는", "index": 44},  # irritated
    {"label": "초조한", "index": 45},  # nervous
    {"label": "충격 받은", "index": 46},  # shocked
    {"label": "취약한", "index": 47},  # vulnerable
    {"label": "툴툴대는", "index": 48},  # grumbling
    {"label": "편안한", "index": 49},  # comfortable
    {"label": "한심한", "index": 50},  # pathetic
    {"label": "혐오스러운", "index": 51},  # disgusted
    {"label": "혼란스러운", "index": 52},  # confused
    {"label": "환멸을 느끼는", "index": 53},  # disillusioned
    {"label": "회의적인", "index": 54},  # skeptical
    {"label": "후회되는", "index": 55},  # regretful
    {"label": "흥분", "index": 56},  # excitement
    {"label": "희생된", "index": 57},  # victimized
]
```
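
Because `predict_sentiment` indexes `Feel` by list position, each entry's `"index"` field has to match its position in the list. A small check like the following can catch an accidental reordering or a missing entry; the function name `check_feel_table` and the sample data are hypothetical and only illustrate the idea.

```python
def check_feel_table(table):
    """Return the positions whose "index" field disagrees with the list position."""
    return [pos for pos, entry in enumerate(table) if entry["index"] != pos]


# Two tiny hypothetical tables: one consistent, one with a wrong index.
good = [{"label": "a", "index": 0}, {"label": "b", "index": 1}]
bad = [{"label": "a", "index": 0}, {"label": "b", "index": 2}]

print(check_feel_table(good))  # no mismatches expected
print(check_feel_table(bad))   # position 1 is mismatched
```

With the real module, `check_feel_table(sentiments.Feel)` should return an empty list.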

---

# 6. Reference

* BERT: [klue/bert-base](https://huggingface.co/klue/bert-base)

* Dataset: [AI-Hub Emotional Dialogue Corpus (감성 대화 말뭉치)](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=86)