---
license: apache-2.0
tags:
- setfit
- sentence-transformers
- text-classification
pipeline_tag: text-classification
---

## General description of the model

Unlike a classical sentiment classifier, this model measures the sentiment expressed towards a particular entity on a particular, pre-determined topic.


```python
model = ...  # model loading omitted

text = "I pity Facebook for their lack of commitment against global warming, I like Google for its support of increased education"
# Depending on the entity (Google or Facebook) and the topic (education or
# climate change), the same text carries two different sentiments.

# Predict the sentiment towards Facebook (entity) on Climate Change (topic)
sentiment, probability = model.predict(text, topic="climate change", entity="Facebook")
# sentiment = "negative"

# Predict the sentiment towards Google (entity) on Education (topic)
sentiment, probability = model.predict(text, topic="education", entity="Google")
# sentiment = "positive"

# Predict the sentiment towards Google (entity) on Climate Change (topic)
sentiment, probability = model.predict(text, topic="climate change", entity="Google")
# sentiment = "neutral" / "not_found"

# Predict the sentiment towards Facebook (entity) on Education (topic)
sentiment, probability = model.predict(text, topic="education", entity="Facebook")
# sentiment = "neutral" / "not_found"

```
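The `predict` signature above is a thin wrapper around an underlying SetFit text classifier, which takes a single string as input. One plausible way such a wrapper could combine the text, topic, and entity into one input is sketched below; the template and function name are assumptions for illustration, not the model's documented behaviour:

```python
def build_input(text, topic, entity):
    """Hypothetical template packing (text, topic, entity) into a single
    classifier input string; the actual template used by this model may differ."""
    return f"entity: {entity} | topic: {topic} | text: {text}"

example = build_input(
    "I pity Facebook for their lack of commitment against global warming",
    topic="climate change",
    entity="Facebook",
)
# example == "entity: Facebook | topic: climate change | text: I pity Facebook ..."
```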
## Training
This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for sentiment classification. 
The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

The training data can be downloaded from [here](https://docs.google.com/spreadsheets/d/1BVDardwVs04ZWmc5_Eg62Lyr_w_OuXysQwhne8ErkoA/edit?usp=sharing).
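Step 1 above relies on generating sentence pairs from the few labeled examples: two sentences with the same label form a positive pair (pulled together by the contrastive loss), two with different labels form a negative pair (pushed apart). A minimal, self-contained sketch of that pair generation, with hypothetical example data (the SetFit library handles this internally):

```python
from itertools import combinations

def contrastive_pairs(examples):
    """Build (sentence_a, sentence_b, similarity) triples from labeled texts.

    Same-label pairs get similarity 1.0; different-label pairs get 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs

# Tiny illustrative dataset (texts and labels are hypothetical)
examples = [
    ("great support for education", "positive"),
    ("strong climate commitment", "positive"),
    ("no commitment against global warming", "negative"),
]

pairs = contrastive_pairs(examples)
# 3 examples -> 3 pairs: one positive pair and two negative pairs
```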

## Usage and Inference
For an overview of the inference pipeline, please refer to this [Colab notebook](https://colab.research.google.com/drive/1GgEGrhQZfA1pbcB9Zl0VtV7L5wXdh6vj?usp=sharing).

## Model Performance
The performance of the model on our internal test set:
 * Accuracy: 0.68
 * Balanced accuracy: 0.45
 * MCC: 0.37
 * F1: 0.49
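For reference, balanced accuracy averages the per-class recalls (so it penalizes a model that only does well on the majority class), while MCC summarizes the whole confusion matrix in a single value between -1 and 1. A pure-Python sketch of both metrics on toy binary labels (illustrative only, not the model's actual predictions):

```python
import math

def confusion(y_true, y_pred):
    """Binary confusion counts: (TP, TN, FP, FN) with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def balanced_accuracy(y_true, y_pred):
    """Mean of positive-class recall and negative-class recall."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when undefined."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels (illustrative only)
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(balanced_accuracy(y_true, y_pred), 2))  # 0.75
print(round(mcc(y_true, y_pred), 2))  # 0.58
```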

## Potential weakness of the model

* As the model has been trained on short texts, it is difficult to predict how it will behave on long texts.
* Although the model is robust to typos and can handle synonyms, the entities and topics should be stated as explicitly as possible.
* The model may have difficulty detecting very abstract or complex topics; fine-tuning the model can mitigate this.
* The model may have difficulty capturing elements that are very specific to a given context.

## BibTeX entry and citation info

```bibtex
@misc{sentiment_entity_topic,
  author   = {HasiMichael and Solofo and Bruce and Sitwala},
  title    = {Sentiment Classification toward Entity and Topics},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  year     = {2023},
  month    = {4},
  version  = {0}
}
```