X-Box Compulsion & Toxicity Index Classifier

Bayesian temporal phenotyping + 12-head text classification pipeline for detecting compulsive social media usage patterns and computing the Toxicity Index (TI) for political Twitter/X accounts.

Architecture

Temporal Model: Calibrated logistic regression on 5 compulsion signatures (burstiness, time-of-day entropy, Hawkes self-excitation, night intensity, weekend ratio).

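Text Classification: 12 heads (6 off-the-shelf RoBERTa models and 6 custom-trained SetFit models). Binary flags derived from a subset of these heads feed the per-tweet Toxicity Index.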

Toxicity Index: TI = mean of 8 binary negative-behavior flags per tweet, bounded [0,1]. TI=0 means a clean informational tweet; TI=1 means every negative flag is active.

Validation

Compulsion Model (n=32, independent ground truth):

  • Spearman r = 0.912 (permutation p=0.001, bootstrap 95% CI [0.845, 0.965])
  • AUC = 0.933 (permutation p=0.003, bootstrap 95% CI [0.928, 1.000])
  • Repeated 5-fold cross-validation (20 repeats): AUC = 0.953 ± 0.076
  • Brier score: 0.101
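The discrimination and calibration metrics above can be illustrated with a minimal standard-library sketch. It is not the project's validation code: the function names, the toy labels and probabilities, and the choice of 999 permutations are all assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def permutation_p_value(labels, scores, n_perm=999, seed=0):
    """One-sided permutation p-value for the observed AUC.

    Labels are shuffled against fixed scores; n_perm and seed are
    illustrative defaults, not values taken from the study.
    """
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if roc_auc(shuffled, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 1 = compulsive according to the ground truth.
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print(roc_auc(toy_labels, toy_probs))
print(brier_score(toy_labels, toy_probs))
print(permutation_p_value(toy_labels, toy_probs))
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations sets the resolution of any reported p-value.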

Text Classification Label Reliability (test-retest, n=75):

  • Ragebait: Pearson r=0.889, Cohen kappa=0.479
  • Tribal signal: Pearson r=0.862, Cohen kappa=0.730
  • Performative outrage: Pearson r=0.777, Cohen kappa=0.525
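For readers unfamiliar with the agreement statistic, here is a minimal sketch of Cohen's kappa for two labeling passes over the same tweets. The function name and the toy labels are hypothetical, and the sketch assumes kappa is computed on the discrete labels from the two passes, which the figures above do not state.

```python
from collections import Counter


def cohen_kappa(run_a, run_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(run_a)
    # Observed agreement: share of items given the same label in both runs.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Expected agreement if the two runs labeled independently with the
    # same marginal label frequencies.
    count_a, count_b = Counter(run_a), Counter(run_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy test-retest labels (hypothetical): 1 = flag present, 0 = absent.
first_pass = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
second_pass = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
print(cohen_kappa(first_pass, second_pass))
```

In the toy data, raw agreement is 0.80 but kappa is about 0.58, because more than half of that agreement would be expected by chance given the label frequencies.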

Per-Class Performance (12 Classification Heads)

Off-the-Shelf (RoBERTa-base, ~125M params each: five CardiffNLP Twitter-RoBERTa models plus the s-nlp toxicity classifier)

| Head | Model ID | Classes | Training Data |
| --- | --- | --- | --- |
| Sentiment | cardiffnlp/twitter-roberta-base-sentiment-latest | negative, neutral, positive | TweetEval benchmark |
| Emotion | cardiffnlp/twitter-roberta-base-emotion | anger, joy, optimism, sadness | TweetEval |
| Offensive | cardiffnlp/twitter-roberta-base-offensive | not-offensive, offensive | TweetEval |
| Irony | cardiffnlp/twitter-roberta-base-irony | non-irony, irony | TweetEval |
| Hate | cardiffnlp/twitter-roberta-base-hate-multiclass-latest | not-hate + 6 subtypes | 13 hate-speech datasets |
| Toxicity | s-nlp/roberta_toxicity_classifier | neutral, toxic | 3 Jigsaw competitions (AUC 0.98) |

CardiffNLP models are pre-trained on 124M tweets. See the TweetEval benchmark (Barbieri et al., 2020) for per-class F1/P/R on the standard evaluation sets.

Custom-Trained (SetFit, all-mpnet-base-v2 backbone, ~109M params each)

Trained on 4,121 LLM-labeled tweets from 14 accounts (7 Democrat, 7 Republican). Evaluated on a 20% held-out test set.

| Head | F1 | Precision | Recall | Training Examples | Description |
| --- | --- | --- | --- | --- | --- |
| Ragebait | 0.800 | 0.82 | 0.78 | 300 (150+150) | Content designed to provoke outrage |
| Tribal signal | 0.825 | 0.84 | 0.81 | 400 (200+200) | Us-vs-them, in-group/out-group framing |
| Performative outrage | 0.850 | 0.87 | 0.83 | 400 (200+200) | Theatrical outrage vs genuine concern |
| Epistemic manipulation | 0.800 | 0.81 | 0.79 | 300 (150+150) | Cherry-picking, straw-manning, false equivalence |
| Engagement bait | 0.800 | 0.83 | 0.77 | 400 (200+200) | Polls, CTAs, rhetorical questions |
| Agency language | 0.838 | 0.85 | 0.83 | 400 (200+200) | Active/agentic (1) vs passive/victimhood (0) |

Toxicity Index Components

The per-tweet Toxicity Index is computed as:

TI = mean(flag_offensive, flag_toxic, flag_negative_sentiment,
          flag_anger, flag_irony, flag_ragebait, flag_tribal,
          flag_performative)

Each flag is binary (0 or 1), set by thresholding the output of the corresponding classifier. The account-level score, TI_senator, is the mean of TI over all tweets in that account's archive.
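The aggregation can be sketched in a few lines of Python. This is an illustration under assumptions, not the pipeline's code: the helper names `tweet_ti` and `account_ti` are hypothetical, and the flags are taken as already binarized because the per-classifier thresholds are not given here.

```python
from statistics import mean

# The eight flags that enter the Toxicity Index, in the order listed above.
FLAG_NAMES = (
    "flag_offensive", "flag_toxic", "flag_negative_sentiment", "flag_anger",
    "flag_irony", "flag_ragebait", "flag_tribal", "flag_performative",
)


def tweet_ti(flags):
    """Per-tweet TI: the mean of the eight binary flags, so it lies in [0, 1]."""
    values = [flags[name] for name in FLAG_NAMES]
    if any(v not in (0, 1) for v in values):
        raise ValueError("every flag must be 0 or 1")
    return sum(values) / len(values)


def account_ti(tweets):
    """Account-level TI: the mean of per-tweet TI over the account's archive."""
    return mean(tweet_ti(t) for t in tweets)


# Toy archive (hypothetical): a clean tweet, a tweet with two active flags,
# and a tweet with every flag active.
clean = dict.fromkeys(FLAG_NAMES, 0)
heated = dict(clean, flag_anger=1, flag_tribal=1)
maxed = dict.fromkeys(FLAG_NAMES, 1)

print(tweet_ti(heated))
print(account_ti([clean, heated, maxed]))
```

Because every flag carries equal weight, each active flag adds exactly 0.125 to a tweet's TI.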

Compulsion Signature Features

| Feature | Coefficient | Description |
| --- | --- | --- |
| Time-of-day entropy | +1.258 | Shannon entropy of hourly posting distribution (bits) |
| Hawkes n* | +0.922 | Self-excitation branching ratio |
| Burstiness B | +0.837 | Goh-Barabasi inter-event time parameter |
| Night intensity | +0.584 | Share of posts 00:00-05:59 UTC |
| Weekend ratio | +0.204 | Weekend/weekday posting rate ratio |
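As a rough illustration of how four of these features could be derived from posting timestamps, the sketch below uses only the standard library. Several choices are assumptions rather than the pipeline's documented behavior: hours are binned in UTC for the entropy as well as the night share, the weekend ratio compares posts per weekend day with posts per weekday, burstiness uses the population standard deviation, and timestamps are timezone-aware. Hawkes n* requires fitting a point-process model and is omitted. The scoring function takes the intercept as a caller-supplied, hypothetical parameter because it is not listed above, and it assumes features are on the scale used when the coefficients were fitted.

```python
import math
from datetime import datetime, timezone
from statistics import mean, pstdev

UTC = timezone.utc

# Coefficients from the table above; the dictionary keys are illustrative names.
COEFFICIENTS = {
    "entropy": 1.258, "hawkes_n": 0.922, "burstiness": 0.837,
    "night": 0.584, "weekend": 0.204,
}


def burstiness(times):
    """Goh-Barabasi B = (sigma - mu) / (sigma + mu) over inter-event times."""
    ts = sorted(times)
    if len(ts) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    mu, sigma = mean(gaps), pstdev(gaps)
    if mu + sigma == 0:
        return 0.0  # all posts share one timestamp; B is undefined
    return (sigma - mu) / (sigma + mu)


def hour_entropy_bits(times):
    """Shannon entropy (bits) of the distribution of posts over 24 UTC hours."""
    counts = [0] * 24
    for t in times:
        counts[t.astimezone(UTC).hour] += 1
    n = len(times)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def night_intensity(times):
    """Share of posts made between 00:00 and 05:59 UTC."""
    return sum(t.astimezone(UTC).hour < 6 for t in times) / len(times)


def weekend_ratio(times):
    """Posts per weekend day divided by posts per weekday (assumed definition)."""
    weekend = sum(t.astimezone(UTC).weekday() >= 5 for t in times)
    weekday = len(times) - weekend
    if weekday == 0:
        return math.inf
    return (weekend / 2) / (weekday / 5)


def compulsion_probability(features, intercept):
    """Logistic score from the listed coefficients and a supplied intercept."""
    z = intercept + sum(COEFFICIENTS[k] * features[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


# Toy account (hypothetical): four posts spaced exactly six hours apart.
posts = [datetime(2024, 1, 1, h, tzinfo=UTC) for h in (0, 6, 12, 18)]
print(burstiness(posts), hour_entropy_bits(posts), night_intensity(posts))
```

Perfectly regular posting gives B = -1, while heavy-tailed gaps push B toward +1. The entropy ranges from 0 bits (all posts in one hour) to log2(24), about 4.58 bits, for posting spread evenly around the clock.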

Theoretical Framework

Inspired by Recovery Viability Theory (Kepner, White, & O'Neill, 2026):

  • Logit-bounded state space for natural [0,1] constraints
  • Cusp catastrophe dynamics for sudden behavioral transitions
  • Critical slowing down as early warning signals

Files

  • bayesian_model_results.json - Fitted model parameters
  • calibrated_model_v2.json - V2 validation with independent ground truth
  • cohort_v2_results.csv - 32-account ground truth cohort
  • cohort_signatures.csv - Ground truth compulsion signatures
  • setfit_*/ - Trained SetFit classifier checkpoints (6 models)
  • xbox/ - Pipeline source code

Citation

O'Neill, J., Cabanillas, J., Brooks, J., et al. (2026). Detecting Compulsive Social Media Usage Patterns in US Congressional Accounts: A Bayesian Temporal Phenotyping Approach. Manuscript in preparation for International Journal of Drug Policy.

Ethics

This methodology cannot and should not be used for clinical diagnosis. The Toxicity Index and compulsion probability are research instruments, not clinical assessments.
