# HebEMO - Emotion Recognition Model for Modern Hebrew
<img align="right" src="https://github.com/avichaychriqui/HeBERT/blob/main/data/heBERT_logo.png?raw=true" width="250">

HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique COVID-19-related dataset that we collected and annotated.

HebEMO achieved a weighted-average F1-score of 0.96 for polarity classification. Emotion detection reached F1-scores of 0.78-0.97 for all emotions except *surprise*, which the model failed to capture (F1 = 0.41). These results exceed the best previously reported performance, even compared to the English language.

## Emotion UGC Data Description
Our UGC data consists of comments posted on news articles collected from three major Israeli news sites between January 2020 and August 2020. The data totals ~150 MB, comprising over 7 million words and 350K sentences.
About 2,000 of these sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and [eight emotions](https://en.wikipedia.org/wiki/Robert_Plutchik#Plutchik's_wheel_of_emotions): anger, disgust, anticipation, fear, joy, sadness, surprise, and trust.
The table below shows the share of sentences in which each emotion appeared.

|           | anger | disgust | anticipation | fear | joy  | sadness | surprise | trust | sentiment |
|----------:|------:|--------:|-------------:|-----:|-----:|--------:|---------:|------:|----------:|
| **ratio** | 0.78  | 0.83    | 0.58         | 0.45 | 0.12 | 0.59    | 0.17     | 0.11  | 0.25      |
## Performance

### Emotion Recognition

| emotion      | F1-score | precision | recall |
|--------------|----------|-----------|--------|
| anger        | 0.96     | 0.99      | 0.93   |
| disgust      | 0.97     | 0.98      | 0.96   |
| anticipation | 0.82     | 0.80      | 0.87   |
| fear         | 0.79     | 0.88      | 0.72   |
| joy          | 0.90     | 0.97      | 0.84   |
| sadness      | 0.90     | 0.86      | 0.94   |
| surprise     | 0.40     | 0.44      | 0.37   |
| trust        | 0.83     | 0.86      | 0.80   |

*The metrics above are for the positive class (i.e., the emotion is reflected in the text).*
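As an illustrative sanity check, the reported F1-scores can be reproduced (up to rounding of the two-decimal figures) from precision and recall, since F1 is their harmonic mean:

```
# F1 is the harmonic mean of precision and recall: F1 = 2*P*R / (P + R).
# Values are the rounded (precision, recall, F1) figures from the table above.
scores = {
    "anger":        (0.99, 0.93, 0.96),
    "disgust":      (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear":         (0.88, 0.72, 0.79),
    "joy":          (0.97, 0.84, 0.90),
    "sadness":      (0.86, 0.94, 0.90),
    "surprise":     (0.44, 0.37, 0.40),
    "trust":        (0.86, 0.80, 0.83),
}
for emotion, (p, r, f1) in scores.items():
    computed = 2 * p * r / (p + r)
    assert abs(computed - f1) < 0.02, emotion  # small gaps come from rounding
```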
### Sentiment (Polarity) Analysis

|              | precision | recall | f1-score |
|--------------|-----------|--------|----------|
| neutral      | 0.83      | 0.56   | 0.67     |
| positive     | 0.96      | 0.92   | 0.94     |
| negative     | 0.97      | 0.99   | 0.98     |
| accuracy     |           |        | 0.97     |
| macro avg    | 0.92      | 0.82   | 0.86     |
| weighted avg | 0.96      | 0.97   | 0.96     |

*The sentiment (polarity) analysis model is also available on AWS! For more information, visit the [AWS sample repository](https://github.com/aws-samples/aws-lambda-docker-serverless-inference/tree/main/hebert-sentiment-analysis-inference-docker-lambda).*
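The macro-average row is simply the unweighted mean of the three per-class scores; an illustrative check using the rounded table values:

```
# Macro average = unweighted mean over the three classes
# (neutral, positive, negative), using the rounded values from the table.
precision = [0.83, 0.96, 0.97]
recall    = [0.56, 0.92, 0.99]
f1        = [0.67, 0.94, 0.98]

macro = lambda xs: round(sum(xs) / len(xs), 2)
print(macro(precision), macro(recall), macro(f1))  # 0.92 0.82 0.86
```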
## How to use

### Emotion Recognition Model

An online demo is available on [Hugging Face Spaces](https://huggingface.co/spaces/avichr/HebEMO_demo) and as a [Colab notebook](https://colab.research.google.com/drive/1Jw3gOWjwVMcZslu-ttXoNeD17lms1-ff?usp=sharing).
```
# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *

HebEMO_model = HebEMO()

# Analyze a file of sentences; returns an analyzed pandas.DataFrame
HebEMO_model.hebemo(input_path='data/text_example.txt')

# Analyze a single sentence and plot its emotion scores
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)  # "Life is beautiful and happy"
```
<img src="https://github.com/avichaychriqui/HeBERT/blob/main/data/hebEMO1.png?raw=true" width="300" height="300" />
### Sentiment Classification Model (Polarity Only)

```
from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")  # same as the 'avichr/heBERT' tokenizer
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

# How to use:
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')  # "I'm debating what to eat for lunch"
>>> [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>   {'label': 'positive', 'score': 0.0014792329166084528},
>>>   {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')  # "This coffee is tasty"
>>> [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>   {'label': 'positive', 'score': 0.9994067549705505},
>>>   {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')  # "I don't like the world"
>>> [[{'label': 'neutral', 'score': 9.214012970915064e-05},
>>>   {'label': 'positive', 'score': 8.876807987689972e-05},
>>>   {'label': 'negative', 'score': 0.9998190999031067}]]
```
The tool was developed by the Coller Semitic Languages AI Lab at the Coller School of Management, Tel Aviv University.