PaloBERT for Sentiment Analysis
A Greek RoBERTa-based model (PaloBERT: an updated version of palobert-base-greek-uncased-v1) fine-tuned for sentiment analysis.
Training data
The model is pre-trained on a corpus of 458,293 documents collected from Greek social media (Twitter, Instagram, Facebook and YouTube). A RoBERTa tokenizer trained from scratch on the same corpus is also included. Fine-tuning uses a dataset of ~60,000 documents, also collected from Greek social media.
The corpus as well as the annotated dataset have been provided by Palo LTD.
Requirements
pip install transformers
pip install torch
Pre-processing details
In order to use this model, the text needs to be pre-processed as follows:
- remove all Greek diacritics
- convert to lowercase
- remove all punctuation
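import re
import unicodedata

def preprocess(text, default_replace=""):
    text = text.lower()
    # Decompose accented letters (NFD), then drop the combining acute accent
    text = unicodedata.normalize('NFD', text).translate({ord('\N{COMBINING ACUTE ACCENT}'): None})
    # Replace anything that is not a word character or whitespace, e.g. punctuation
    text = re.sub(r'[^\w\s]', default_replace, text)
    return text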
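As a quick check, the function can be run on a sentence that still carries accents, capitals and punctuation. The sketch below repeats the function so that it runs on its own; the sample sentence is an illustrative assumption and is not taken from the training data.

```python
import re
import unicodedata

def preprocess(text, default_replace=""):
    # Same steps as the function above: lowercase, strip accents, strip punctuation
    text = text.lower()
    text = unicodedata.normalize('NFD', text).translate({ord('\N{COMBINING ACUTE ACCENT}'): None})
    text = re.sub(r'[^\w\s]', default_replace, text)
    return text

# Illustrative input (assumed): accented, capitalized, with punctuation
raw = 'Οι εξετάσεις ήταν πολύ καλές!'
clean = preprocess(raw)
print(clean)
```

The cleaned string is what should be passed to the tokenizer.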
Load Model
import torch
from transformers import AutoTokenizer, AutoModel

# Load the PaloBERT tokenizer and pre-trained language model
tokenizer = AutoTokenizer.from_pretrained("pchatz/palobert-base-greek-social-media-v2")
language_model = AutoModel.from_pretrained("pchatz/palobert-base-greek-social-media-v2")
Refer to the GitHub code for details on the ModelClass architecture.
# Load the fine-tuned model as SentimentClassifier_v2
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
Once loaded, the sentiment analysis model can be applied to text that has been pre-processed as described above:
# Example
class_names = {0: 'neutral', 1: 'positive', 2: 'negative'}
text = 'οι εξετασεις ηταν πολυ καλες'
encoding = tokenizer(text, return_tensors='pt')
input_ids = encoding['input_ids']
attention_mask = encoding['attention_mask']
output = model(input_ids, attention_mask)
_, prediction = torch.max(output, dim=1)
print(f'sentiment : {class_names[prediction.item()]}')  # positive
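The example above reports only the top class. If per-class confidence is also of interest, a softmax can be applied to the model output. The sketch below is a minimal pure-Python illustration; it assumes the model returns raw logits with one score per class (this is not confirmed here), and the `example_logits` values and the `describe` helper are hypothetical.

```python
import math

# Label mapping taken from the example above
class_names = {0: 'neutral', 1: 'positive', 2: 'negative'}

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe(logits):
    """Return the predicted label and a per-class probability dict (hypothetical helper)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], {class_names[i]: p for i, p in enumerate(probs)}

# Hypothetical logits; with the model above they might come from output[0].tolist()
example_logits = [0.2, 2.5, -1.0]
label, probabilities = describe(example_logits)
print(label, probabilities)
```

With a real model, the same result could be obtained on tensors with `torch.softmax(output, dim=1)`.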
Evaluation
For detailed results, refer to the thesis 'Ανάλυση συναισθήματος κειμένου στα Ελληνικά με χρήση Δικτύων Μετασχηματιστών' ("Text Sentiment Analysis in Greek Using Transformer Networks"), version p2.
Authors
Pavlina Chatziantoniou, Georgios Alexandridis and Athanasios Voulodimos
BibTeX entry and citation info
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623
@Article{info12080331,
AUTHOR = {Alexandridis, Georgios and Varlamis, Iraklis and Korovesis, Konstantinos and Caridakis, George and Tsantilas, Panagiotis},
TITLE = {A Survey on Sentiment Analysis and Opinion Mining in Greek Social Media},
JOURNAL = {Information},
VOLUME = {12},
YEAR = {2021},
NUMBER = {8},
ARTICLE-NUMBER = {331},
URL = {https://www.mdpi.com/2078-2489/12/8/331},
ISSN = {2078-2489},
DOI = {10.3390/info12080331}
}