---
library_name: transformers
license: apache-2.0
base_model: roberta-base
tags:
- generated_from_trainer
model-index:
- name: RoBERTa_Sentiment_Analysis
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Tweets Hate Speech Detection
      type: tweets-hate-speech-detection/tweets_hate_speech_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9613
    - name: Precision
      type: precision
      value: 0.9626
    - name: Recall
      type: recall
      value: 0.9613
    - name: F1
      type: f1
      value: 0.9619
language:
- en
pipeline_tag: text-classification
datasets:
- tweets-hate-speech-detection/tweets_hate_speech_detection
metrics:
- accuracy
- precision
- recall
- f1
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# RoBERTa_Sentiment_Analysis

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [Twitter Sentiment Analysis](https://www.kaggle.com/datasets/arkhoshghalb/twitter-sentiment-analysis-hatred-speech) dataset.

It achieves the following results on the evaluation set:
- Loss: 0.0994
- Accuracy: 0.9613
- Precision: 0.9626
- Recall: 0.9613
- F1: 0.9619

## Model description

Fine-tuning was performed on a pretrained RoBERTa model. The code is available [here](https://github.com/atharva-m/Fine-tuning-RoBERTa-for-Sentiment-Analysis).

## Intended uses & limitations

The model classifies tweets as either neutral or hate speech.

`test.csv` of Twitter Sentiment Analysis is an unused, unlabelled split. Contributions in [code](https://github.com/atharva-m/Fine-tuning-RoBERTa-for-Sentiment-Analysis) to utilize it for evaluation are welcome!
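
A minimal inference sketch: the `model_path` default below is a placeholder for the checkpoint directory or Hub repo id of this fine-tuned model, and the label-name mapping is an assumption following the dataset's convention (0 = neutral, 1 = hate speech):

```python
from transformers import pipeline

# Assumed label convention of the dataset: 0 = neutral, 1 = hate speech.
LABELS = {"LABEL_0": "neutral", "LABEL_1": "hate speech"}

def classify(text: str, model_path: str = "RoBERTa_Sentiment_Analysis"):
    """Classify a tweet. model_path is a placeholder for the local
    checkpoint directory or Hub repo id of this fine-tuned model."""
    clf = pipeline("text-classification", model=model_path)
    pred = clf(text)[0]
    return LABELS.get(pred["label"], pred["label"]), pred["score"]
```

`pipeline` handles tokenization and scoring; the returned score is the model's confidence in the predicted class.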

## Training and evaluation data

`train.csv` of Twitter Sentiment Analysis is split 80-20 into training and evaluation sets.

Fine-tuning was carried out on a Google Colab T4 GPU.
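
A sketch of the 80-20 split described above. The stand-in rows and `random_state` are illustrative; in practice the frame would come from `pd.read_csv("train.csv")`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in rows; in practice: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "tweet": ["great day!", "nice weather", "so happy", "lovely people", "awful"],
    "label": [0, 0, 0, 0, 1],
})

# 80-20 split into training and evaluation sets.
train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), len(eval_df))  # 4 1
```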

## Training procedure

RobertaTokenizerFast is used to tokenize the preprocessed data.

A pretrained RobertaForSequenceClassification model is used as the classifier.

Hyperparameters are defined in TrainingArguments, and Trainer is used to train the model.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 50
- eval_batch_size: 50
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
- weight_decay: 1e-07
- report_to: tensorboard

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.1276        | 1.0   | 512  | 0.1116          |
| 0.1097        | 2.0   | 1024 | 0.0994          |
| 0.0662        | 3.0   | 1536 | 0.1165          |
| 0.0542        | 4.0   | 2048 | 0.1447          |
| 0.019         | 5.0   | 2560 | 0.1630          |

### Evaluation results

| Metric    | Value              |
|:---------:|:------------------:|
| Accuracy  | 0.9613639918661036 |
| Precision | 0.9626825763068382 |
| Recall    | 0.9613639918661036 |
| F1-score  | 0.9619595110644236 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1