NLP-reviews
This model is a fine-tuned version of bert-base-uncased on the Sentiment Labelled Sentences Data Set.
Model description
Given a sentence, this model returns the probability that its sentiment is positive or negative, along with the probability that it is a review from amazon.com, imdb.com, or yelp.com.
It is a multi-label classification model that predicts both the sentiment of a text and the source website it most likely came from.
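As a rough illustration of what multi-label output means here, the sketch below applies an independent sigmoid to each raw score (logit), so the five probabilities are not forced to sum to 1. It assumes the usual multi-label setup; the label names, their order, and the example logits are made up for illustration and are not taken from the model's configuration.

```python
import math

# Hypothetical label order; the real order is defined by the model's config.
LABELS = ["positive", "negative", "amazon", "imdb", "yelp"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1), numerically stable."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits):
    """Score each label independently, as is typical for multi-label heads."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


# Made-up logits for a single sentence, for illustration only.
example_logits = [2.0, -2.0, -1.0, 0.0, 1.5]
probs = decode(example_logits)

for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```

Because each label is scored on its own, a sentence can receive a high probability for a sentiment and for a website at the same time, which a single softmax over all five labels would not allow.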
Training and evaluation data
The data comes from the Sentiment Labelled Sentences Data Set.
Each entry has a sentiment score: 1 for positive or 0 for negative.
Each sentence comes from one of three websites:
- amazon.com
- imdb.com
- yelp.com
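One possible way to combine a sentence's sentiment score and source website into a single multi-label target is a multi-hot vector. The sketch below is an assumption about how such a target could look; the `encode_target` helper and the column order are hypothetical, and the training repository may encode labels differently.

```python
SOURCES = ["amazon.com", "imdb.com", "yelp.com"]


def encode_target(sentiment: int, source: str):
    """Build a multi-hot target: [positive, negative, amazon, imdb, yelp]."""
    if sentiment not in (0, 1):
        raise ValueError("sentiment must be 1 (positive) or 0 (negative)")
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    sentiment_part = [float(sentiment == 1), float(sentiment == 0)]
    source_part = [float(source == s) for s in SOURCES]
    return sentiment_part + source_part


# A positive sentence taken from imdb.com.
target = encode_target(1, "imdb.com")
print(target)
```

Every target built this way has exactly two active entries: one for the sentiment and one for the website.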
Each website contributes 500 positive and 500 negative sentences. They were selected randomly from a larger dataset of reviews and chosen for having a clearly positive or negative connotation.
The data was split 90-10 into training and test sets for model training and evaluation.
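The arithmetic behind the split can be sketched as follows. Three websites with 1000 sentences each give 3000 sentences, so a 90-10 split leaves 2700 for training and 300 for testing. The shuffling approach, the reuse of seed 42, and the `train_test_split` helper are assumptions for illustration; this is not the code from the training repository, and it does not stratify by label.

```python
import math
import random

N_SOURCES = 3
PER_SOURCE = 500 + 500  # 500 positive + 500 negative sentences per website
TOTAL = N_SOURCES * PER_SOURCE


def train_test_split(items, test_fraction=0.1, seed=42):
    """Shuffle a copy of the items and cut off the test portion."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Split row indices rather than the sentences themselves.
train_idx, test_idx = train_test_split(range(TOTAL))

# With a training batch size of 8, the last batch of each epoch is partial.
steps_per_epoch = math.ceil(len(train_idx) / 8)

print(len(train_idx), len(test_idx), steps_per_epoch)
```

With a training batch size of 8, 2700 training sentences correspond to 338 optimizer steps per epoch, which matches the step counts reported in the training results.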
The code used to train the model is at https://github.com/josephtkim/huggingface-sentiment-analysis.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
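To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup steps, since none are listed above, and takes the total of 3380 steps from the training results table; the `linear_lr` helper is hypothetical and only mirrors the usual linear-decay rule.

```python
BASE_LR = 5e-05
TOTAL_STEPS = 3380  # final step count reported in the training results


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, the midpoint, and the end of training.
for step in (0, 1690, 3380):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 5e-05, is halved by the end of epoch 5, and reaches zero at the final step.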
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 1.0 | 338 | 0.2270 |
0.2235 | 2.0 | 676 | 0.2737 |
0.0644 | 3.0 | 1014 | 0.3171 |
0.0644 | 4.0 | 1352 | 0.3511 |
0.0193 | 5.0 | 1690 | 0.3726 |
0.0119 | 6.0 | 2028 | 0.3638 |
0.0119 | 7.0 | 2366 | 0.3337 |
0.0043 | 8.0 | 2704 | 0.3424 |
0.0019 | 9.0 | 3042 | 0.3387 |
0.0019 | 10.0 | 3380 | 0.3467 |
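When choosing which epoch's weights to keep, the validation losses above can be compared directly. The sketch below copies the values from the table and picks the epoch with the lowest validation loss; selecting by validation loss is one common criterion, not necessarily the one used for the published weights.

```python
# (epoch, validation loss) pairs copied from the training results table.
results = [
    (1, 0.2270), (2, 0.2737), (3, 0.3171), (4, 0.3511), (5, 0.3726),
    (6, 0.3638), (7, 0.3337), (8, 0.3424), (9, 0.3387), (10, 0.3467),
]

# Pick the epoch whose validation loss is smallest.
best_epoch, best_loss = min(results, key=lambda row: row[1])
final_epoch, final_loss = results[-1]

print(f"best: epoch {best_epoch} ({best_loss:.4f})")
print(f"final: epoch {final_epoch} ({final_loss:.4f})")
```

In this table the validation loss is lowest after the first epoch while the training loss keeps falling, a pattern that often indicates overfitting in later epochs.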
Framework versions
- Transformers 4.29.1
- PyTorch 2.0.0+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3