This is a RoBERTa-base model pretrained on 154M tweets posted up to the end of December 2022 and fine-tuned for sensitive content detection (multilabel classification) on the X-Sensitive dataset. The original Twitter-based RoBERTa model can be found here.
"id2label": {
"0": "conflictual",
"1": "profanity",
"2": "sex",
"3": "drugs",
"4": "selfharm",
"5": "spam",
"6": "not-sensitive"
}
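To illustrate how the label mapping is used, here is a minimal sketch that turns one logit per class index into a per-label probability. It assumes, as is usual for multilabel classification heads, that each logit goes through an independent sigmoid; the helper names (`ID2LABEL`, `decode_logits`) and the example logits are hypothetical and are not taken from the model or its code.

```python
import math

# Label mapping copied from the model configuration above.
ID2LABEL = {
    0: "conflictual",
    1: "profanity",
    2: "sex",
    3: "drugs",
    4: "selfharm",
    5: "spam",
    6: "not-sensitive",
}


def sigmoid(x: float) -> float:
    """Logistic function: maps a real-valued logit to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_logits(logits):
    """Map one logit per class index to {label: independent probability}."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    return {ID2LABEL[i]: sigmoid(z) for i, z in enumerate(logits)}


# Hypothetical logits, invented for illustration only (not real model output).
example_logits = [-2.5, 4.5, -5.7, -5.4, -5.6, -4.9, -4.6]
probs = decode_logits(example_logits)
for label, p in probs.items():
    print(f"{label}: {p:.4f}")
```

Because each label is scored independently, the probabilities need not sum to one, and a text can receive several labels at once.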
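from transformers import pipeline

# Load the fine-tuned multilabel classifier from the Hugging Face Hub
pipe = pipeline(model='cardiffnlp/twitter-roberta-base-sensitive-multilabel')

# Classify a single example text
text = "Call me today to earn some money mofos!"
pipe(text)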
Output:
[[{'label': 'conflictual', 'score': 0.07463070750236511},
{'label': 'profanity', 'score': 0.9888035655021667},
{'label': 'sex', 'score': 0.0032050721347332},
{'label': 'drugs', 'score': 0.004522938746958971},
{'label': 'selfharm', 'score': 0.0036733713932335377},
{'label': 'spam', 'score': 0.007278479170054197},
{'label': 'not-sensitive', 'score': 0.00972921121865511}]]
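Since the scores above do not sum to one, they appear to be independent per-label scores, so a final set of labels is typically obtained by thresholding each one. The sketch below does this for the output shown above; the `select_labels` helper and the 0.5 threshold are assumptions chosen for illustration, not settings documented for this model.

```python
# Pipeline output copied from the example above (one inner list per input text).
output = [[
    {'label': 'conflictual', 'score': 0.07463070750236511},
    {'label': 'profanity', 'score': 0.9888035655021667},
    {'label': 'sex', 'score': 0.0032050721347332},
    {'label': 'drugs', 'score': 0.004522938746958971},
    {'label': 'selfharm', 'score': 0.0036733713932335377},
    {'label': 'spam', 'score': 0.007278479170054197},
    {'label': 'not-sensitive', 'score': 0.00972921121865511},
]]


def select_labels(scores, threshold=0.5):
    """Return every label whose score meets the threshold (assumed 0.5)."""
    return [item['label'] for item in scores if item['score'] >= threshold]


predicted = select_labels(output[0])
print(predicted)  # ['profanity']
```

A lower threshold trades precision for recall; any value used in practice would be better tuned on held-out data than fixed at 0.5.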
@article{antypas2024sensitive,
title={Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation},
author={Antypas, Dimosthenis and Sen, Indira and Perez-Almendros, Carla and Camacho-Collados, Jose and Barbieri, Francesco},
journal={arXiv preprint arXiv:2411.19832},
year={2024}
}