---
library_name: transformers
license: apache-2.0
base_model: roberta-base
tags:
- generated_from_trainer
model-index:
- name: RoBERTa_Sentiment_Analysis
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Tweets Hate Speech Detection
      type: tweets-hate-speech-detection/tweets_hate_speech_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9613
    - name: Precision
      type: precision
      value: 0.9626
    - name: Recall
      type: recall
      value: 0.9613
    - name: F1
      type: f1
      value: 0.9619
language:
- en
pipeline_tag: text-classification
datasets:
- tweets-hate-speech-detection/tweets_hate_speech_detection
metrics:
- accuracy
- precision
- recall
- f1
---
# RoBERTa_Sentiment_Analysis
This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [Twitter Sentiment Analysis](https://www.kaggle.com/datasets/arkhoshghalb/twitter-sentiment-analysis-hatred-speech) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0994
- Accuracy: 0.9613
- Precision: 0.9626
- Recall: 0.9613
- F1: 0.9619
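Precision and recall matching accuracy suggests these scores are weighted averages over the two classes (under weighted averaging, recall equals accuracy). A minimal sketch of computing such scores, assuming scikit-learn is available (the toy labels are illustrative only):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels for illustration (0 = neutral, 1 = hate speech).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
# average="weighted" weights each class's score by its support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```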
## Model description
Fine-tuning was performed on a pretrained RoBERTa model. The training code is available [here](https://github.com/atharva-m/Fine-tuning-RoBERTa-for-Sentiment-Analysis).
## Intended uses & limitations
The model classifies tweets as either neutral or hate speech.
The 'test.csv' file of Twitter Sentiment Analysis is unlabelled and was not used. Contributions to the [code](https://github.com/atharva-m/Fine-tuning-RoBERTa-for-Sentiment-Analysis) that use this split for evaluation are welcome!
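A usage sketch with the `transformers` pipeline API; the Hub repo id below is an assumption, so substitute the actual path of this model:

```python
from transformers import pipeline


def load_classifier(model_id: str = "atharva-m/RoBERTa_Sentiment_Analysis"):
    # model_id is an assumed Hub path; replace it with the real repo id.
    # Downloads the model weights on first call.
    return pipeline("text-classification", model=model_id)


# Example:
# clf = load_classifier()
# print(clf("some tweet text"))
```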
## Training and evaluation data
The 'train.csv' file of Twitter Sentiment Analysis was split into training and evaluation sets (80/20).
Fine-tuning was carried out on a Google Colab T4 GPU.
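The 80/20 split can be sketched with scikit-learn's `train_test_split`; the column contents below are stand-ins, since the real rows come from 'train.csv':

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the 'train.csv' rows: tweet text plus a 0/1 label.
tweets = [f"tweet {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# Stratify so both splits keep the original class proportions.
train_texts, eval_texts, train_labels, eval_labels = train_test_split(
    tweets, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(train_texts), len(eval_texts))  # 80 20
```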
## Training procedure
RobertaTokenizerFast is used to tokenize the preprocessed data.
A pretrained RobertaForSequenceClassification is used as the classification model.
Hyperparameters are defined in TrainingArguments, and a Trainer instance is used to train the model.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 50
- eval_batch_size: 50
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
- weight_decay: 1e-07
- report_to: tensorboard
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.1276 | 1.0 | 512 | 0.1116 |
| 0.1097 | 2.0 | 1024 | 0.0994 |
| 0.0662 | 3.0 | 1536 | 0.1165 |
| 0.0542 | 4.0 | 2048 | 0.1447 |
| 0.019 | 5.0 | 2560 | 0.1630 |
### Evaluation results
| Metric | Value |
|:---------:|:------------------:|
| Accuracy | 0.9613639918661036 |
| Precision | 0.9626825763068382 |
| Recall | 0.9613639918661036 |
| F1-score | 0.9619595110644236 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1