Opposition Detector

Binary classifier for parliamentary sentences: does a sentence express Opposition toward the European Union (label 1), or does it not (label 0, covering both Neutral and Support stances)?

This is the first stage of a two-step stance-detection cascade. A sentence is first passed through this Opposition detector; if it is classified as Non-Opposition, it is then passed to a separate Support detector (Support vs Neutral). Both stages use a 0.5 decision threshold.
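
The routing rule can be sketched in Python as below. The scorer callables are hypothetical stand-ins for the two fine-tuned detectors, each returning its positive-class probability. Treating a probability exactly equal to 0.5 as positive is an assumption, because the card only states a 0.5 threshold.

```python
from typing import Callable

# Hypothetical scorer type: maps a sentence to a detector's positive-class probability.
Scorer = Callable[[str], float]


def cascade_stance(sentence: str, p_opposition: Scorer, p_support: Scorer,
                   threshold: float = 0.5) -> str:
    """Two-stage cascade: Opposition detector first, then Support vs Neutral."""
    # Stage 1: the Opposition detector separates Opposition from Non-Opposition.
    if p_opposition(sentence) >= threshold:
        return "Opposition"
    # Stage 2: only Non-Opposition sentences are scored by the Support detector.
    return "Support" if p_support(sentence) >= threshold else "Neutral"
```

Taking the scorers as callables means the Support detector runs only for sentences that stage 1 has already classified as Non-Opposition.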

Fine-tuned from jhu-clsp/mmBERT-base on hand-annotated parliamentary speeches from AUS, CZE, DEU, DNK, ESP, GBR, NLD, and SWE.

Labels

  • 0: Non-Opposition (Neutral or Support)
  • 1: Opposition

Training data

  • Source: hand-annotated parliamentary sentences labelled Neutral, Support, or Opposition.
  • Binarised for this model as Opposition vs the other two classes.
  • File: Stance_Retrain_undersampled.csv (undersampled to address class imbalance).
  • Split: leakage-safe StratifiedGroupKFold (n_splits=10) grouped on country × speech_ID, so no speech appears in more than one fold. Realised allocation: 8 folds train / 1 fold val / 1 fold test (~80/10/10). The Opposition and Support detectors share the same underlying stance split, so the cascade is evaluated on consistent data.

Hyperparameters

  • Base model: jhu-clsp/mmBERT-base
  • Max sequence length: 320
  • Learning rate: 1.5e-05
  • Epochs: 3
  • Batch size: 16 (the training setup adds gradient accumulation only for large base models)
  • Warmup ratio: 0.2
  • Weight decay: 0.05
  • LR scheduler: cosine
  • Optimizer: AdamW (HF Trainer default)
  • Mixed precision: fp16
  • Early stopping patience: 2 (monitoring f1_positive on val)
  • Class weights: balanced (sklearn compute_class_weight)
  • Focal loss: disabled (plain weighted cross-entropy)
  • Random seed: 123
  • Model selection: best checkpoint by validation f1_positive (minority-class F1)
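
The settings above roughly correspond to the Hugging Face Trainer setup sketched below. It includes balanced class weights, plain weighted cross-entropy, an f1_positive metric, and early stopping on that metric. This is an illustrative reconstruction, not the authors' script. Per-epoch evaluation and saving, the WeightedTrainer subclass, the make_trainer helper and the output directory name are all assumptions. `eval_strategy` needs transformers 4.41 or later, and `fp16=True` needs a CUDA GPU.

```python
import numpy as np
import torch
from sklearn.metrics import f1_score
from sklearn.utils.class_weight import compute_class_weight

# Hyperparameters from the list above, keyed by TrainingArguments names.
# Assumption: evaluate and save once per epoch (not stated in the card).
TRAINING_KWARGS = dict(
    learning_rate=1.5e-5, num_train_epochs=3, per_device_train_batch_size=16,
    warmup_ratio=0.2, weight_decay=0.05, lr_scheduler_type="cosine",
    fp16=True, seed=123, eval_strategy="epoch", save_strategy="epoch",
    load_best_model_at_end=True, metric_for_best_model="f1_positive",
    greater_is_better=True,
)


def balanced_class_weights(labels):
    """sklearn 'balanced' weights: n_samples / (n_classes * count_per_class)."""
    w = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]),
                             y=np.asarray(labels))
    return torch.tensor(w, dtype=torch.float)


def weighted_ce(logits, labels, class_weights):
    """Plain class-weighted cross-entropy (focal loss disabled)."""
    loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights.to(logits.device))
    return loss_fn(logits, labels)


def compute_metrics(eval_pred):
    """Minority-class (Opposition, label 1) F1, reported as 'f1_positive'."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1_positive": f1_score(labels, preds, pos_label=1, zero_division=0)}


def make_trainer(model, tokenizer, train_ds, val_ds, class_weights,
                 output_dir="opposition-detector"):  # hypothetical helper
    # Imported here so the helpers above can be used without transformers.
    from transformers import (DataCollatorWithPadding, EarlyStoppingCallback,
                              Trainer, TrainingArguments)

    class WeightedTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            loss = weighted_ce(outputs.logits, labels, class_weights)
            return (loss, outputs) if return_outputs else loss

    return WeightedTrainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS),
        train_dataset=train_ds,
        eval_dataset=val_ds,
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )
```

Trainer adds the `eval_` prefix to `metric_for_best_model` automatically. As a result, early stopping and best-checkpoint selection both track the validation `f1_positive` returned by `compute_metrics`.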

Input format

Sentence-only input (no surrounding context window), truncated to a maximum of 320 tokens.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

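# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tok = AutoTokenizer.from_pretrained("LBenoit/opposition-detector-mmbert")
mdl = AutoModelForSequenceClassification.from_pretrained("LBenoit/opposition-detector-mmbert")
mdl.eval()  # inference mode (disables dropout)

text = "This directive from Brussels undermines our national sovereignty."
# Sentence-only input, truncated to the 320-token training length
enc = tok(text, truncation=True, max_length=320, return_tensors="pt")
with torch.no_grad():
    # Softmax over the two logits; column 1 is the Opposition class
    prob_opp = torch.softmax(mdl(**enc).logits, dim=-1)[0, 1].item()
print("P(Opposition) =", prob_opp)
print("Label:", "Opposition" if prob_opp >= 0.5 else "Non-Opposition")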

Intended use

Research on parliamentary stance toward the EU. Designed as the first stage of an Opposition → Support cascade for full 3-way stance classification (Neutral / Support / Opposition). Outputs reflect the training corpus and annotation scheme. The training data were undersampled and the loss is class-weighted, so raw predicted Opposition rates need not match real-world base rates. Downstream prevalence estimates should therefore ideally be calibrated against a base-rate-representative sample.
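
One common calibration method is adjusted classify-and-count. It is shown here as an illustration and is not necessarily the authors' intended procedure. Measure the detector's true-positive rate (TPR) and false-positive rate (FPR) on a representative hand-labelled sample. Then invert flag_rate = TPR·π + FPR·(1 − π) to recover the true Opposition share π. The function names below are hypothetical.

```python
def rates(y_true, y_pred):
    """True- and false-positive rates of the detector on a labelled sample."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def adjusted_prevalence(flag_rate, tpr, fpr):
    """Adjusted classify-and-count estimate of the true Opposition share."""
    if tpr <= fpr:
        raise ValueError("need tpr > fpr for the correction to be defined")
    # Solve flag_rate = tpr * pi + fpr * (1 - pi) for pi, clipped to [0, 1].
    return min(1.0, max(0.0, (flag_rate - fpr) / (tpr - fpr)))
```

For example, with TPR = 0.8 and FPR = 0.1, a raw flag rate of 30% on the full corpus corresponds to an estimated true Opposition share of about 28.6%.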

Limitations

  • Trained on parliamentary register; performance on social media, journalism, or other domains is not guaranteed.
  • Coverage limited to the eight countries listed above; generalisation to other parliaments is untested.
  • Sentence-level only; longer-range discourse context is not modelled.

Model files

  • Format: Safetensors
  • Model size: 0.3B params
  • Tensor type: F32