Update README.md
README.md
CHANGED
@@ -9,57 +9,26 @@ datasets:
license: mit
---

-
-- [linkedin.com/in/arpanghoshal](https://www.linkedin.com/in/arpanghoshal)


-
-
-Dataset labelled 58000 Reddit comments with 28 emotions
-
-- admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral
-
-
-## What is RoBERTa
-
-RoBERTa builds on BERT’s language masking strategy and modifies key hyperparameters in BERT, including removing BERT’s next-sentence pretraining objective, and training with much larger mini-batches and learning rates. RoBERTa was also trained on an order of magnitude more data than BERT, for a longer amount of time. This allows RoBERTa representations to generalize even better to downstream tasks compared to BERT.
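The masked-language-modelling objective described above is easy to see in action. A minimal sketch, assuming the public `roberta-base` checkpoint (not part of this repo):

```python
from transformers import pipeline

# RoBERTa's pretraining task: recover the token hidden behind <mask>.
fill = pipeline("fill-mask", model="roberta-base")

for candidate in fill("The movie was absolutely <mask>.", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```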
-
-
-## Hyperparameters
-
-| Parameter | |
-| ----------------- | :---: |
-| Learning rate | 5e-5 |
-| Epochs | 10 |
-| Max Seq Length | 50 |
-| Batch size | 16 |
-| Warmup Proportion | 0.1 |
-| Epsilon | 1e-8 |
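For context, the table maps onto a TF fine-tuning setup roughly as follows. This is a sketch, not the author's training script: `num_examples` is a hypothetical placeholder (the README does not state the train-split size), the `roberta-base` starting checkpoint is an assumption, and the single-label loss follows the pipeline output shown later.

```python
import tensorflow as tf
from transformers import TFRobertaForSequenceClassification, create_optimizer

num_examples = 43_410                  # hypothetical train-split size
batch_size = 16                        # Batch size
epochs = 10                            # Epochs
total_steps = (num_examples // batch_size) * epochs

# Warmup Proportion 0.1 -> linear warmup over the first 10% of steps;
# Learning rate 5e-5 and Epsilon 1e-8 feed AdamW directly.
optimizer, _ = create_optimizer(
    init_lr=5e-5,
    num_train_steps=total_steps,
    num_warmup_steps=int(0.1 * total_steps),
    adam_epsilon=1e-8,
)

# Max Seq Length 50 is applied at tokenization time (truncation/padding).
model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=28)
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```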
-
-
-## Results
-
-Best Result of `Macro F1` - 49.30%
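Macro F1 is the unweighted mean of the per-class F1 scores, so each of the 28 emotions counts equally however rare it is; with classes this imbalanced, that makes 49.30% harder to reach than an accuracy figure would suggest. A toy illustration with scikit-learn:

```python
from sklearn.metrics import f1_score

# Toy labels for illustration only; macro-averaging weights all classes equally.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 0]
print(f1_score(y_true, y_pred, average="macro"))
```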
-
## Usage

```python

from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, pipeline

-tokenizer = RobertaTokenizerFast.from_pretrained("
-model = TFRobertaForSequenceClassification.from_pretrained("

emotion = pipeline('sentiment-analysis',
-                   model='

-emotion_labels = emotion("
print(emotion_labels)

-
-Output
-
-```
-[{'label': 'gratitude', 'score': 0.9964383244514465}]
```
license: mit
---

+## What is the GoEmotions Dataset?

+The dataset comprises 58,000 Reddit comments labeled with 28 emotions.

+- admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral
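The underlying corpus is also available on the Hugging Face Hub. A minimal sketch, assuming the `go_emotions` dataset id (the README itself does not name it):

```python
from datasets import load_dataset

ds = load_dataset("go_emotions")          # train / validation / test splits
sample = ds["train"][0]
print(sample["text"])                     # the raw Reddit comment
label_names = ds["train"].features["labels"].feature.names
print([label_names[i] for i in sample["labels"]])  # note: comments can carry several labels
```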
## Usage

```python

from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, pipeline

+tokenizer = RobertaTokenizerFast.from_pretrained("cappuch/EmoRoBERTa_Retrain")
+model = TFRobertaForSequenceClassification.from_pretrained("cappuch/EmoRoBERTa_Retrain")

emotion = pipeline('sentiment-analysis',
+                   model='cappuch/EmoRoBERTa_Retrain')

+emotion_labels = emotion("Hello!")
print(emotion_labels)

+#[{'label': 'neutral', 'score': 0.9964383244514465}]
```
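To see scores for all 28 emotions rather than just the top label, the text-classification pipeline accepts `top_k` at call time (`top_k=None` returns every class in recent transformers releases; older ones used `return_all_scores=True`). Reusing the `emotion` pipeline built above:

```python
# Scores for every emotion, not just the argmax label.
all_scores = emotion("Thanks, this is really helpful!", top_k=None)
print(all_scores)
```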