---
language:
- en
- es
- ja
- el
widget:
- text: It is great to see athletes promoting awareness for climate change.
datasets:
- cardiffnlp/tweet_topic_multi
- cardiffnlp/tweet_topic_multilingual
license: mit
metrics:
- f1
pipeline_tag: text-classification
---

# tweet-topic-large-multilingual

This model is based on the [cardiffnlp/twitter-xlm-roberta-large-2022](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-large-2022) language model and is fine-tuned for multi-label topic classification in English, Spanish, Japanese, and Greek. The model is trained on the [TweetTopic](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi) and [X-Topic](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multilingual) datasets (see the main [EMNLP 2024 reference paper](https://arxiv.org/abs/2410.03075)).

Labels:

| 0: arts_&_culture           | 5: fashion_&_style  | 10: learning_&_educational | 15: science_&_technology |
|-----------------------------|---------------------|----------------------------|--------------------------|
| 1: business_&_entrepreneurs | 6: film_tv_&_video  | 11: music                  | 16: sports               |
| 2: celebrity_&_pop_culture  | 7: fitness_&_health | 12: news_&_social_concern  | 17: travel_&_adventure   |
| 3: diaries_&_daily_life     | 8: food_&_dining    | 13: other_hobbies          | 18: youth_&_student_life |
| 4: family                   | 9: gaming           | 14: relationships          |                          |

## Full classification example

```python
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import expit

MODEL = "cardiffnlp/tweet-topic-large-multilingual"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PyTorch
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label

text = "It is great to see athletes promoting awareness for climate change."
tokens = tokenizer(text, return_tensors='pt')
output = model(**tokens)

# Apply a sigmoid to the raw logits, then threshold at 0.5 (multi-label)
scores = output[0][0].detach().numpy()
scores = expit(scores)
predictions = (scores >= 0.5) * 1

# TensorFlow (equivalent, commented out)
#tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = tf_model.config.id2label
#text = "It is great to see athletes promoting awareness for climate change."
#tokens = tokenizer(text, return_tensors='tf')
#output = tf_model(**tokens)
#scores = output[0][0]
#scores = expit(scores)
#predictions = (scores >= 0.5) * 1

# Map predicted indices to class names
for i in range(len(predictions)):
    if predictions[i]:
        print(class_mapping[i])
```

Output:

```
news_&_social_concern
sports
```

## Results on X-Topic

|              | English | Spanish | Japanese | Greek |
|--------------|---------|---------|----------|-------|
| **Macro-F1** | 60.2    | 52.9    | 57.3     | 50.3  |
| **Micro-F1** | 66.3    | 67.0    | 61.4     | 73.0  |

## BibTeX entry and citation info

TBA
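
## Pipeline example (sketch)

As a lighter-weight alternative, the same multi-label prediction can be sketched with the `transformers` pipeline API. This is a sketch rather than part of the original card: the `top_k=None` and `function_to_apply="sigmoid"` arguments are assumptions made here to reproduce the independent per-label sigmoid scores and the 0.5 threshold used in the full example above.

```python
from transformers import pipeline

MODEL = "cardiffnlp/tweet-topic-large-multilingual"

# Sketch: wrap the same checkpoint in a text-classification pipeline
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

text = "It is great to see athletes promoting awareness for climate change."

# top_k=None returns a score for every label; function_to_apply="sigmoid"
# keeps the label scores independent, matching the multi-label setup
# (assumed necessary if the checkpoint config does not already request sigmoid)
scores = pipe(text, top_k=None, function_to_apply="sigmoid")

# Keep labels that clear the same 0.5 threshold as the full example
print([s["label"] for s in scores if s["score"] >= 0.5])
```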