tweet-topic-19-single

This is a RoBERTa-base model trained on ~90M tweets posted up to the end of 2019 (see here) and fine-tuned for single-label topic classification on a corpus of 6,997 tweets. The original RoBERTa-base model can be found here, and the original reference paper is TweetEval. The model is intended for English text.

Labels:

  • 0 -> arts_&_culture;
  • 1 -> business_&_entrepreneurs;
  • 2 -> pop_culture;
  • 3 -> daily_life;
  • 4 -> sports_&_gaming;
  • 5 -> science_&_technology
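
The mapping above can also be written as a plain Python dictionary, which is handy for turning a predicted class index into a topic name without consulting the model config. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `label_for` are illustrative and not part of the model's API. When the model is loaded, the same mapping is available as `model.config.id2label`.

```python
# Topic labels as listed in this model card (class index -> topic name).
ID2LABEL = {
    0: "arts_&_culture",
    1: "business_&_entrepreneurs",
    2: "pop_culture",
    3: "daily_life",
    4: "sports_&_gaming",
    5: "science_&_technology",
}

# Reverse lookup (topic name -> class index), derived from the table above.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_for(class_id: int) -> str:
    """Return the topic name for a class index; raise on an unknown index."""
    if class_id not in ID2LABEL:
        raise ValueError(f"unknown class index: {class_id}")
    return ID2LABEL[class_id]


print(label_for(1))
```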

Full classification example

# TFAutoModelForSequenceClassification is only needed for the TensorFlow variant below
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

MODEL = "cardiffnlp/tweet-topic-19-single"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PyTorch
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label  # class index -> topic name

text = "Tesla stock is on the rise!"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

# output[0] holds the logits; [0] selects the single example in the batch
scores = output[0][0].detach().numpy()
# Convert the logits to probabilities
scores = softmax(scores)

# TensorFlow (uncomment to use instead of the PyTorch block above)
#model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = model.config.id2label
#text = "Tesla stock is on the rise!"
#encoded_input = tokenizer(text, return_tensors='tf')
#output = model(**encoded_input)
#scores = output[0][0].numpy()
#scores = softmax(scores)


# Rank the classes from highest to lowest probability
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    label = class_mapping[ranking[i]]
    score = scores[ranking[i]]
    print(f"{i+1}) {label} {np.round(float(score), 4)}")

Output:

1) business_&_entrepreneurs 0.8575
2) science_&_technology 0.0604
3) pop_culture 0.0295
4) daily_life 0.0217
5) sports_&_gaming 0.0154
6) arts_&_culture 0.0154
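
The post-processing step in the example (softmax over the logits, then sorting by score) can be illustrated without downloading the model. The sketch below uses only the Python standard library. The logit values are made up for illustration and are not real model output, and `softmax` and `rank_topics` are hypothetical helper names rather than part of any library used above.

```python
import math

# Topic names in class-index order, as listed in this model card.
LABELS = [
    "arts_&_culture",
    "business_&_entrepreneurs",
    "pop_culture",
    "daily_life",
    "sports_&_gaming",
    "science_&_technology",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_topics(logits, labels=LABELS):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for one tweet (assumption: NOT real model output).
example_logits = [-1.2, 2.9, -0.5, -0.8, -1.1, 0.3]

for rank, (label, prob) in enumerate(rank_topics(example_logits), start=1):
    print(f"{rank}) {label} {prob:.4f}")
```

Subtracting the maximum logit before exponentiating does not change the result but avoids overflow for large logits, which is the same safeguard library implementations of softmax typically apply.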