---
tags:
- generated_from_keras_callback
model-index:
- name: XLM-T-Sent-Politics
  results: []
---

# XLM-T-Sent-Politics

This is an "extension" of the multilingual `twitter-xlm-roberta-base-sentiment` model ([model](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment), [original paper](https://arxiv.org/abs/2104.12250)) with a focus on sentiment in politicians' tweets. The original sentiment fine-tuning was done on eight languages (Ar, En, Fr, De, Hi, It, Sp, Pt); further training was then done on tweets from Members of Parliament in the UK (English), Spain (Spanish) and Greece (Greek).

- Reference Paper: [Politics, Sentiment and Virality: A Large-Scale Multilingual Twitter Analysis in Greece, Spain and United Kingdom](https://arxiv.org/pdf/2202.00396.pdf)
- Git Repo: [https://github.com/cardiffnlp/politics-and-virality-twitter](https://github.com/cardiffnlp/politics-and-virality-twitter)

## Full classification example

```python
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax


def preprocess(text):
    # Mask user mentions and URLs, following the preprocessing
    # used by the base twitter-xlm-roberta-base-sentiment model
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


MODEL = "cardiffnlp/xlm-twitter-politics-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Good night 😊"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# # TF
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)
# text = "Good night 😊"
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

# Print label ids and scores in ascending order of confidence
# (label ids follow the base sentiment model: 0 negative, 1 neutral, 2 positive)
ranking = np.argsort(scores)
for i in range(scores.shape[0]):
    s = scores[ranking[i]]
    print(ranking[i], s)
```

Output:

```
0 0.0048229103
1 0.03117284
2 0.9640044
```
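
For quick experimentation, the same checkpoint can also be loaded through the high-level `pipeline` API instead of the manual tokenize/softmax steps above. A minimal sketch, assuming the checkpoint name used in the example (the exact label string returned depends on the model's config):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint with the sentiment-analysis pipeline
sentiment_task = pipeline("sentiment-analysis",
                          model="cardiffnlp/xlm-twitter-politics-sentiment",
                          tokenizer="cardiffnlp/xlm-twitter-politics-sentiment")

# Returns a list of {'label': ..., 'score': ...} dicts;
# the label name mapping comes from the model config
print(sentiment_task("Good night 😊"))
```

Note that the pipeline applies no tweet-specific preprocessing, so for tweets containing user mentions or URLs you may want to run `preprocess` (defined above) on the text first.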