# Korean Sentiment Analysis with BERT

This project performs sentiment analysis on Korean text using a pre-trained BERT model, fine-tuned on a sentiment analysis dataset to classify text into positive and negative categories.
## Fine-tuning
The model is fine-tuned using a custom dataset with the following configuration:
- Number of Labels: 2 (positive and negative)
- Training Epochs: 1
- Batch Size: 20
- Optimizer: AdamW with weight decay
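AdamW differs from plain Adam with L2 regularization in that the weight-decay term is applied directly to the weights rather than folded into the gradient. A minimal sketch of a single AdamW step for one scalar parameter, in plain Python (the hyperparameter values are illustrative defaults, not this project's exact settings):

```python
import math

def adamw_step(w, grad, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    The weight-decay term is decoupled: it is subtracted directly
    from the weight instead of being added to the gradient.
    """
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Single step from w=1.0 with gradient 0.5: the weight moves toward zero.
w, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
```

In practice this bookkeeping is handled by `torch.optim.AdamW` or the Transformers `Trainer`; the sketch only makes the decoupled-decay arithmetic explicit.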
## Dataset
The dataset used for fine-tuning the model consists of Korean text samples labeled with sentiment categories. The reviews in the dataset are scraped from the Google Play Store for the Kakao app. The dataset is split into three parts:
- Training Set: Used to train the model.
- Validation Set: Used to evaluate the model during training and tune hyperparameters.
- Test Set: Used to evaluate the final performance of the model.
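A three-way split like the one above can be sketched in plain Python as follows. The 80/10/10 ratio and the fixed seed are illustrative assumptions; the card does not state the actual split sizes used for this model:

```python
import random

def train_val_test_split(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle labeled samples and split them into train/val/test lists.

    The fractions are assumptions for illustration, not the project's
    documented configuration.
    """
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Toy (text, label) pairs standing in for the scraped Kakao reviews.
reviews = [(f"review {i}", i % 2) for i in range(100)]
train, val, test = train_val_test_split(reviews)
```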
## Data Preparation
The text data is tokenized with `BertTokenizerFast`, using truncation and padding to ensure uniform input lengths.
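The effect of truncation and padding can be illustrated with a toy whitespace tokenizer. This is only a stand-in for the real pipeline: the project uses `BertTokenizerFast` and BERT's WordPiece vocabulary, and the `max_length` of 8 below is an arbitrary illustrative value:

```python
def pad_and_truncate(texts, max_length=8, pad_id=0):
    """Mimic the shape-normalizing effect of truncation and padding.

    A toy whitespace tokenizer assigns each distinct word an integer id;
    sequences longer than max_length are cut, shorter ones are padded
    with pad_id, so every row ends up the same length.
    """
    vocab = {}
    batch = []
    for text in texts:
        ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in text.split()]
        ids = ids[:max_length]                     # truncation
        ids += [pad_id] * (max_length - len(ids))  # padding
        batch.append(ids)
    return batch

batch = pad_and_truncate(["카카오 앱 정말 좋아요", "별로예요"])
# Every row now has the same length, ready to stack into a tensor.
```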
## Evaluation
The model is evaluated on the train, validation, and test sets using accuracy, F1 score, precision, and recall as metrics. Below are the results of the evaluation:
### Evaluation Results
| Set   | Loss     | Accuracy | F1       | Precision | Recall   |
|-------|----------|----------|----------|-----------|----------|
| Train | 0.097011 | 0.967398 | 0.967397 | 0.967405  | 0.967398 |
| Val   | 0.162700 | 0.945322 | 0.945321 | 0.945328  | 0.945322 |
| Test  | 0.145638 | 0.948864 | 0.948864 | 0.948864  | 0.948864 |
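For binary labels, these metrics reduce to simple counts of true positives, false positives, and false negatives. A minimal sketch in plain Python (the near-identical values across columns in the table suggest the reported scores are weighted averages over both classes; the sketch below computes the standard positive-class versions):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy predictions against toy gold labels.
acc, prec, rec, f1 = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In the actual evaluation these values would come from a metrics library such as `sklearn.metrics` or `evaluate` rather than hand-rolled counts.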
License: Apache-2.0