---
tags:
- generated_from_keras_callback
model-index:
- name: XLM-T-Sent-Politics
  results: []
---

# XLM-T-Sent-Politics

This is an "extension" of the multilingual `twitter-xlm-roberta-base-sentiment` model ([model](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment), [original paper](https://arxiv.org/abs/2104.12250)), with a focus on sentiment in politicians' tweets. The original sentiment fine-tuning covered eight languages (Arabic, English, French, German, Hindi, Italian, Spanish, Portuguese); this model was further trained on tweets from Members of Parliament in the UK (English), Spain (Spanish) and Greece (Greek).

- Reference Paper: [Politics, Sentiment and Virality: A Large-Scale Multilingual Twitter Analysis in Greece, Spain and United Kingdom](https://arxiv.org/pdf/2202.00396.pdf). 
- Git Repo: [https://github.com/cardiffnlp/politics-and-virality-twitter](https://github.com/cardiffnlp/politics-and-virality-twitter).
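
## Quick example with `pipeline`

For quick experimentation, a minimal sketch using the high-level `pipeline` API (note: the label names returned depend on this checkpoint's config, so they may appear as generic `LABEL_0`/`LABEL_1`/`LABEL_2` rather than named sentiments):

```python
from transformers import pipeline

# Load the model through the high-level pipeline API
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/xlm-twitter-politics-sentiment",
)

# Returns a list of {'label': ..., 'score': ...} dicts;
# label names depend on the model's id2label config
print(sentiment("Good night 😊"))
```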


## Full classification example

```python
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

MODEL = "cardiffnlp/xlm-twitter-politics-sentiment"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def preprocess(text):
    # Replace user mentions and URLs with placeholders, following the
    # preprocessing used in the base twitter-xlm-roberta-base-sentiment card
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# PyTorch
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Good night 😊"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# TensorFlow (alternative)
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

# Print the scores in ascending order
ranking = np.argsort(scores)
for i in range(scores.shape[0]):
    s = scores[ranking[i]]
    print(i, s)
```

Output: 

```
0 0.0048229103
1 0.03117284
2 0.9640044
```
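
The printed indices follow the model's internal label order. To see which index maps to which sentiment (the base `twitter-xlm-roberta-base-sentiment` model uses negative/neutral/positive, though this checkpoint's config may expose generic names), inspect `id2label`:

```python
from transformers import AutoConfig

# Inspect the index-to-label mapping stored in the model config
config = AutoConfig.from_pretrained("cardiffnlp/xlm-twitter-politics-sentiment")
print(config.id2label)
```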