---
license: apache-2.0
tags:
- setfit
- sentence-transformers
- text-classification
pipeline_tag: text-classification
---

## General description of the model

Unlike a classical sentiment classifier, this model measures the sentiment expressed towards a particular entity on a particular predetermined topic. The same text can therefore yield different sentiments depending on which entity and topic are queried.

```python
# Placeholder: load the model as shown in the Colab notebook linked under "Usage and Inference".
model = ...

text = "I pity Facebook for their lack of commitment against global warming, I like Google for its support of increased education"

# The sentiment depends on both the entity (Facebook or Google)
# and the topic (climate change or education) being queried.

# Predict the sentiment towards Facebook (entity) on climate change (topic)
sentiment, probability = model.predict(text, topic="climate change", entity="Facebook")
# sentiment = "negative"

# Predict the sentiment towards Google (entity) on education (topic)
sentiment, probability = model.predict(text, topic="education", entity="Google")
# sentiment = "positive"

# Predict the sentiment towards Google (entity) on climate change (topic)
sentiment, probability = model.predict(text, topic="climate change", entity="Google")
# sentiment = "neutral" / "not_found"

# Predict the sentiment towards Facebook (entity) on education (topic)
sentiment, probability = model.predict(text, topic="education", entity="Facebook")
# sentiment = "neutral" / "not_found"
```

## Training

This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for sentiment classification.
The model was trained with an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

The training data can be downloaded from [this spreadsheet](https://docs.google.com/spreadsheets/d/1BVDardwVs04ZWmc5_Eg62Lyr_w_OuXysQwhne8ErkoA/edit?usp=sharing).
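
To make step 1 concrete: contrastive fine-tuning needs pairs of texts marked as similar or dissimilar, and with few-shot data these pairs can be derived from the class labels. The sketch below is a simplified illustration in plain Python, not the `setfit` library's own pair sampler; the function name `make_contrastive_pairs` and the example sentences are hypothetical and not taken from the training data.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=0):
    """Build (text_a, text_b, similarity) triples from a few labeled examples.

    Two texts with the same label get similarity 1.0, otherwise 0.0.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        similarity = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], similarity))
    # Shuffle deterministically so the pair order is reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs


# Hypothetical few-shot examples, for illustration only.
texts = [
    "Google does a great job supporting education",
    "I like how Google funds schools",
    "Facebook ignores global warming",
    "Facebook shows no commitment to the climate",
]
labels = ["positive", "positive", "negative", "negative"]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, similarity in pairs:
    print(similarity, "|", text_a, "|", text_b)
```

With four examples this yields six pairs, two of which are marked as similar. Pairs of this kind are what a similarity-based loss consumes when the Sentence Transformer is fine-tuned; the classification head in step 2 is then trained on the resulting embeddings.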

## Usage and Inference

For an overview of the full inference pipeline, please refer to this [Colab notebook](https://colab.research.google.com/drive/1GgEGrhQZfA1pbcB9Zl0VtV7L5wXdh6vj?usp=sharing).

## Model Performance

The model achieves the following scores on our internal test set:

* Accuracy: 0.68
* Balanced accuracy: 0.45
* MCC (Matthews correlation coefficient): 0.37
* F1: 0.49
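
Accuracy and balanced accuracy can diverge sharply when classes are imbalanced, which may explain part of the gap between 0.68 and 0.45 above. The sketch below computes the four metric types in plain Python on made-up labels; the function name `classification_metrics` and the toy data are hypothetical, and macro-averaged F1 is an assumption, since the averaging behind the reported F1 is not stated.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Accuracy, balanced accuracy, macro F1 and multiclass MCC."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    correct = sum(hits.values())

    # Balanced accuracy: mean recall over the classes present in y_true.
    recalls = [hits[l] / true_counts[l] for l in labels if true_counts[l]]
    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (n_true + n_pred).
    f1s = []
    for l in labels:
        denom = true_counts[l] + pred_counts[l]
        f1s.append(2 * hits[l] / denom if denom else 0.0)
    # Multiclass Matthews correlation coefficient.
    cov_tp = correct * n - sum(true_counts[l] * pred_counts[l] for l in labels)
    cov_tt = n * n - sum(true_counts[l] ** 2 for l in labels)
    cov_pp = n * n - sum(pred_counts[l] ** 2 for l in labels)
    mcc = cov_tp / sqrt(cov_tt * cov_pp) if cov_tt and cov_pp else 0.0

    return {
        "accuracy": correct / n,
        "balanced_accuracy": sum(recalls) / len(recalls),
        "macro_f1": sum(f1s) / len(f1s),
        "mcc": mcc,
    }


# Toy, imbalanced labels (hypothetical, not the internal test set).
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 8 + ["positive", "neutral"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On this toy data accuracy is 0.9 while balanced accuracy is about 0.67, because the single `negative` example is missed entirely and each class counts equally in the balanced score.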

## Potential weaknesses of the model

* The model was trained on short texts, so its behavior on long texts is difficult to predict.
* Although the model is robust to typos and can handle synonyms, entities and topics must be stated as explicitly as possible.
* The model may have difficulty detecting very abstract or complex topics; fine-tuning the model can address this.
* The model may have difficulty capturing elements that are very specific to a given context.

## BibTeX entry and citation info

```bibtex
@misc{sentiment_entity_topic_2023,
  author   = {HasiMichael and Solofo and Bruce and Sitwala},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title    = {Sentiment Classification toward Entity and Topics},
  year     = {2023},
  month    = {4},
  version  = {0}
}
```