
distilbert-base-indonesian-finetuned-PRDECT-ID

This model is a fine-tuned version of cahya/distilbert-base-indonesian on [the PRDECT-ID dataset](https://www.kaggle.com/datasets/jocelyndumlao/prdect-id-indonesian-emotion-classification), a compilation of Indonesian product reviews labeled with emotion and sentiment. The reviews were gathered from Tokopedia, one of Indonesia's largest e-commerce platforms.

Training and evaluation data

I split my dataframe df into training, validation, and test sets (train_df, val_df, test_df) using the train_test_split function from sklearn.model_selection. I first held out 20% of the data, then divided that held-out portion equally between the validation and test sets (10% of the original data each). Both splits were stratified on the label column (stratify=df['label']), so each subset preserves the class distribution of the original dataset.
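The two-stage stratified split described above can be sketched as follows (a minimal illustration with toy data; the column names "text" and "label" and the random seed are assumptions, not taken from the actual notebook):

```python
# Hypothetical sketch of the two-stage stratified split; toy data,
# column names, and random_state are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "text": [f"review {i}" for i in range(100)],
    "label": [i % 2 for i in range(100)],
})

# Stage 1: hold out 20% of the data, stratified on the label.
train_df, holdout_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

# Stage 2: divide the held-out 20% equally into validation and test
# sets (10% of the original data each), again stratified.
val_df, test_df = train_test_split(
    holdout_df, test_size=0.5, stratify=holdout_df["label"], random_state=42
)

print(len(train_df), len(val_df), len(test_df))  # 80 10 10
```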

Training hyperparameters

The following hyperparameters were used during training:

  • num_train_epochs: 5
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • warmup_steps: 500
  • weight_decay: 0.01
  • logging_dir: ./logs
  • logging_steps: 10
  • eval_strategy: epoch
  • save_strategy: epoch
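The hyperparameters above map onto transformers.TrainingArguments roughly as follows (a sketch only; the output directory name is an assumption, as the card does not state it):

```python
from transformers import TrainingArguments

# Sketch of the training configuration listed above.
training_args = TrainingArguments(
    output_dir="./results",            # assumed; not stated in the card
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    eval_strategy="epoch",             # per-epoch evaluation (Transformers >= 4.41)
    save_strategy="epoch",
)
```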

Training and Evaluation Results

The following table summarizes the training and validation loss over the epochs:

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.000100      | 0.000062        |
| 2     | 0.000000      | 0.000038        |
| 3     | 0.000000      | 0.000025        |
| 4     | 0.000000      | 0.000017        |
| 5     | 0.000000      | 0.000014        |

Train output:

  • global_step: 235
  • training_loss: 3.9409913424219185e-05
  • train_runtime: 44.6774
  • train_samples_per_second: 83.04
  • train_steps_per_second: 5.26
  • total_flos: 122954683514880.0
  • train_loss: 3.9409913424219185e-05
  • epoch: 5.0

Evaluation:

  • eval_loss: 1.3968576240586117e-05
  • eval_runtime: 0.3321
  • eval_samples_per_second: 270.973
  • eval_steps_per_second: 18.065
  • epoch: 5.0

Perplexity: 1.0000139686738017
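The perplexity figure above appears to be the exponential of the final eval_loss, which can be verified directly:

```python
import math

# eval_loss reported above
eval_loss = 1.3968576240586117e-05

# Perplexity is exp(loss); for a loss this close to zero it is ~1.
perplexity = math.exp(eval_loss)
print(perplexity)  # ≈ 1.0000139686738017
```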

The validation loss decreases steadily across all five epochs, indicating that the model fits the task well and generalizes to the held-out validation set.

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1