# Korean Sentiment Analysis with BERT

This project performs sentiment analysis on Korean text using a pre-trained BERT model, fine-tuned on a sentiment analysis dataset to classify text into positive and negative categories.
## Fine-tuning
The model is fine-tuned using a custom dataset with the following configuration:
- Number of Labels: 2 (positive and negative)
- Training Epochs: 1
- Batch Size: 20
- Optimizer: AdamW with weight decay
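AdamW differs from plain Adam with L2 regularization in that the weight-decay term is applied directly to the weights rather than folded into the gradient. A minimal sketch of a single AdamW step for one scalar parameter, in plain Python (the hyperparameter values are illustrative defaults, not this project's exact settings):

```python
import math

def adamw_step(w, grad, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    The weight-decay term is decoupled: it is subtracted directly
    from the weight instead of being added to the gradient.
    """
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Single step from w=1.0 with gradient 0.5: the weight moves toward zero.
w, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
```

In practice this bookkeeping is handled by `torch.optim.AdamW` or the Transformers `Trainer`; the sketch only makes the decoupled-decay arithmetic explicit.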
## Dataset
The dataset used for fine-tuning the model consists of Korean text samples labeled with sentiment categories. The reviews in the dataset are scraped from the Google Play Store for the Kakao app. The dataset is split into three parts:
- Training Set: Used to train the model.
- Validation Set: Used to evaluate the model during training and tune hyperparameters.
- Test Set: Used to evaluate the final performance of the model.
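A three-way split like the one above can be sketched in plain Python as follows. The 80/10/10 ratio and the fixed seed are illustrative assumptions; the card does not state the actual split sizes used for this model:

```python
import random

def train_val_test_split(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle labeled samples and split them into train/val/test lists.

    The fractions are assumptions for illustration, not the project's
    documented configuration.
    """
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Toy (text, label) pairs standing in for the scraped Kakao reviews.
reviews = [(f"review {i}", i % 2) for i in range(100)]
train, val, test = train_val_test_split(reviews)
```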
## Data Preparation
The text data is tokenized with `BertTokenizerFast`, using truncation and padding to ensure uniform input lengths.
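The effect of truncation and padding can be illustrated with a toy whitespace tokenizer. This is only a stand-in for the real pipeline: the project uses `BertTokenizerFast` and BERT's WordPiece vocabulary, and the `max_length` of 8 below is an arbitrary illustrative value:

```python
def pad_and_truncate(texts, max_length=8, pad_id=0):
    """Mimic the shape-normalizing effect of truncation and padding.

    A toy whitespace tokenizer assigns each distinct word an integer id;
    sequences longer than max_length are cut, shorter ones are padded
    with pad_id, so every row ends up the same length.
    """
    vocab = {}
    batch = []
    for text in texts:
        ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in text.split()]
        ids = ids[:max_length]                     # truncation
        ids += [pad_id] * (max_length - len(ids))  # padding
        batch.append(ids)
    return batch

batch = pad_and_truncate(["카카오 앱 정말 좋아요", "별로예요"])
# Every row now has the same length, ready to stack into a tensor.
```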
## Evaluation
The model is evaluated on the train, validation, and test sets using accuracy, F1 score, precision, and recall as metrics. Below are the results of the evaluation:
### Evaluation Results
| Set   | Loss     | Accuracy | F1       | Precision | Recall   |
|-------|----------|----------|----------|-----------|----------|
| Train | 0.097011 | 0.967398 | 0.967397 | 0.967405  | 0.967398 |
| Val   | 0.162700 | 0.945322 | 0.945321 | 0.945328  | 0.945322 |
| Test  | 0.145638 | 0.948864 | 0.948864 | 0.948864  | 0.948864 |
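For binary labels, these metrics reduce to simple counts of true positives, false positives, and false negatives. A minimal sketch in plain Python (the near-identical values across columns in the table suggest the reported scores are weighted averages over both classes; the sketch below computes the standard positive-class versions):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy predictions against toy gold labels.
acc, prec, rec, f1 = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In the actual evaluation these values would come from a metrics library such as `sklearn.metrics` or `evaluate` rather than hand-rolled counts.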
License: Apache-2.0