antypasd's picture
Update README.md
ba2c5b9
# tweet-topic-19-multi
This is a RoBERTa-base model trained on ~90m tweets until the end of 2019 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2019-90m)) and finetuned for multi-label topic classification on a corpus of 11,267 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi).
The original RoBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m) and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English.
- Reference Papers: [TimeLMs paper](https://arxiv.org/abs/2202.03829), [TweetTopic](https://arxiv.org/abs/2209.09824).
- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).
<b>Labels</b>:
| <span style="font-weight:normal">0: arts_&_culture</span> | <span style="font-weight:normal">5: fashion_&_style</span> | <span style="font-weight:normal">10: learning_&_educational</span> | <span style="font-weight:normal">15: science_&_technology</span> |
|-----------------------------|---------------------|----------------------------|--------------------------|
| 1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports |
| 2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure |
| 3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life |
| 4: family | 9: gaming | 14: relationships | |
## Full classification example
```python
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import expit
MODEL = f"cardiffnlp/tweet-topic-19-multi"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label
text = "It is great to see athletes promoting awareness for climate change."
tokens = tokenizer(text, return_tensors='pt')
output = model(**tokens)
scores = output[0][0].detach().numpy()
scores = expit(scores)
predictions = (scores >= 0.5) * 1
# TF
#tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = tf_model.config.id2label
#text = "It is great to see athletes promoting awareness for climate change."
#tokens = tokenizer(text, return_tensors='tf')
#output = tf_model(**tokens)
#scores = output[0][0]
#scores = expit(scores)
#predictions = (scores >= 0.5) * 1
# Map to classes
for i in range(len(predictions)):
if predictions[i]:
print(class_mapping[i])
```
Output:
```
news_&_social_concern
sports
```