|
# tweet-topic-21-single |
|
|
|
This is a roBERTa-base model trained on ~124M tweets from January 2018 to December 2021, and fine-tuned for single-label topic classification on tweets. The base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m), and the original reference is the [TweetEval](https://github.com/cardiffnlp/tweeteval) benchmark. This model is suitable for English.
|
|
|
- Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829). |
|
- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms). |
|
|
|
<b>Labels</b>:

- 0 -> arts_&_culture
- 1 -> business_&_entrepreneurs
- 2 -> pop_culture
- 3 -> daily_life
- 4 -> sports_&_gaming
- 5 -> science_&_technology
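
For a quick top-label prediction without handling logits manually, the generic `transformers` pipeline API can also be used (a minimal sketch: the `text-classification` pipeline is standard `transformers` functionality rather than something specific to this card, and the printed output is illustrative):

```python
from transformers import pipeline

# The text-classification pipeline handles tokenization, inference, and
# softmax internally, returning the top label and its probability.
classifier = pipeline("text-classification", model="antypasd/tweet-topic-21-single")

print(classifier("Tesla stock is on the rise!"))
# Illustrative output: [{'label': 'business_&_entrepreneurs', 'score': 0.84}]
```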
|
|
|
|
|
## Full classification example |
|
|
|
```python
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

MODEL = "antypasd/tweet-topic-21-single"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT (PyTorch) model
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label

text = "Tesla stock is on the rise!"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

# Convert the logits to probabilities
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# Print labels ranked from most to least probable
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = class_mapping[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```
|
|
|
Output: |
|
|
|
```
1) business_&_entrepreneurs 0.8361
2) science_&_technology 0.0904
3) pop_culture 0.0288
4) daily_life 0.0178
5) arts_&_culture 0.0137
6) sports_&_gaming 0.0133
```
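
To score several tweets in one forward pass, the objects from the example above (`tokenizer`, `model`, `class_mapping`) can be reused with a padded batch. This is a minimal sketch, assuming `torch` is installed (it is the backend already used above):

```python
import torch

texts = [
    "Tesla stock is on the rise!",
    "What a goal in the last minute of the match!",
]

# Pad the batch so all sequences share one length, then run a single forward pass.
encoded = tokenizer(texts, return_tensors='pt', padding=True)
with torch.no_grad():
    logits = model(**encoded).logits

# argmax over the class dimension gives the top label for each tweet.
for tweet, idx in zip(texts, logits.argmax(dim=-1).tolist()):
    print(f"{tweet} -> {class_mapping[idx]}")
```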