File size: 2,805 Bytes
6f77ecf e03cf7b ba2c5b9 e03cf7b ba2c5b9 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf 307927c 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf d01caba ba2c5b9 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf e03cf7b 6f77ecf |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
# tweet-topic-19-multi
This is a RoBERTa-base model trained on ~90m tweets until the end of 2019 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2019-90m)) and finetuned for multi-label topic classification on a corpus of 11,267 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi).
The original RoBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m) and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English.
- Reference Papers: [TimeLMs paper](https://arxiv.org/abs/2202.03829), [TweetTopic](https://arxiv.org/abs/2209.09824).
- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).
<b>Labels</b>:
| <span style="font-weight:normal">0: arts_&_culture</span> | <span style="font-weight:normal">5: fashion_&_style</span> | <span style="font-weight:normal">10: learning_&_educational</span> | <span style="font-weight:normal">15: science_&_technology</span> |
|-----------------------------|---------------------|----------------------------|--------------------------|
| 1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports |
| 2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure |
| 3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life |
| 4: family | 9: gaming | 14: relationships | |
## Full classification example
```python
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import expit
MODEL = f"cardiffnlp/tweet-topic-19-multi"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label
text = "It is great to see athletes promoting awareness for climate change."
tokens = tokenizer(text, return_tensors='pt')
output = model(**tokens)
scores = output[0][0].detach().numpy()
scores = expit(scores)
predictions = (scores >= 0.5) * 1
# TF
#tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = tf_model.config.id2label
#text = "It is great to see athletes promoting awareness for climate change."
#tokens = tokenizer(text, return_tensors='tf')
#output = tf_model(**tokens)
#scores = output[0][0]
#scores = expit(scores)
#predictions = (scores >= 0.5) * 1
# Map to classes
for i in range(len(predictions)):
if predictions[i]:
print(class_mapping[i])
```
Output:
```
news_&_social_concern
sports
``` |