PopEuroBERT-210m

Binary Populism Classifier for German Bundestag Speeches

Table of Contents

  1. Overview
  2. Usage
  3. Training Data
  4. Training Procedure
  5. Evaluation
  6. Limitations
  7. Ethical Considerations
  8. License
  9. Citation

Overview

This model is a fine-tuned version of EuroBERT-210m on the PopBERT dataset (sentence-level annotated German Bundestag speeches) for populist rhetoric classification. It predicts whether a given speech excerpt contains populist language.

Key Features:

  • Trained on German Bundestag speeches annotated for populism at the sentence level.
  • Fine-tuned using 5-fold cross-validation.
  • Optimized with decision threshold tuning.

Usage

To use the model in Python:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


model_id = "przvl/PopEuroBERT-binary-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, trust_remote_code=True
)
model.eval()

# define text to be classified (note the trailing space in the first
# string, so the concatenated sentence stays intact)
text = (
    "Aber Ihnen fehlt eben der Mut, Ihnen fehlen die Visionen, um sich "
    "gegen die Konzerne und gegen die Lobbygruppen zur Wehr zu setzen."
)

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# get classification probability
logits = outputs.logits
probs = torch.softmax(logits, dim=-1)  # shape [1, 2]
populist_prob = probs[0, 1].item()     # probability of class 1 (populist)

# apply the tuned decision threshold of 0.56
threshold = 0.56
label = "Populist" if populist_prob > threshold else "Neutral"
print(f"Predicted class: {label} (Confidence: {populist_prob:.2f})")
# Predicted class: Populist (Confidence: 0.90)

Use a decision threshold of 0.56 for balanced precision and recall (see Evaluation below).
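
To score many sentences at once, the same model can be run on a padded batch. A minimal sketch, reusing the tokenizer and model from the snippet above; the two example sentences are invented placeholders, not taken from the dataset:

sentences = [
    "Die Regierung hat den Haushaltsentwurf vorgelegt.",
    "Die abgehobenen Eliten verraten das einfache Volk.",
]

inputs = tokenizer(
    sentences,
    padding=True,
    truncation=True,
    max_length=256,  # matches the max length used during training
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits

# populist probability per sentence
probs = torch.softmax(logits, dim=-1)[:, 1]
for sentence, p in zip(sentences, probs.tolist()):
    label = "Populist" if p > 0.56 else "Neutral"
    print(f"{label} ({p:.2f}): {sentence}")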

Training Data

  • Dataset: PopBERT
    • Sentence-level annotated German Bundestag speeches
    • Train/test split: 7,017 / 1,758 sentences
  • Preprocessing (sketched after this list):
    • Converted labels to binary format (populist = 1, neutral = 0).
    • Tokenized using the EuroBERT tokenizer with a max length of 256 tokens.
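
A minimal preprocessing sketch under stated assumptions: the raw data is loaded as a Hugging Face Dataset with a "text" column and a raw "populist" flag; both column names are hypothetical, and the actual PopBERT schema may differ:

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("przvl/PopEuroBERT-binary-210m")

# dummy rows standing in for the PopBERT sentences
raw = Dataset.from_dict({
    "text": ["Beispielsatz eins.", "Beispielsatz zwei."],
    "populist": [True, False],
})

def preprocess(batch):
    # truncate to the max length used during fine-tuning
    enc = tokenizer(batch["text"], truncation=True, max_length=256)
    # binary labels: populist = 1, neutral = 0
    enc["label"] = [int(flag) for flag in batch["populist"]]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)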

Training Procedure

  • Base Model: EuroBERT-210m
  • Fine-tuning Approach:
    • Used the Hugging Face Trainer API.
    • Applied 5-fold cross-validation.
    • Decision threshold tuning on aggregated predictions (see the sketch below).
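
A sketch of the threshold-tuning step, assuming the probabilities and gold labels aggregated across the five folds are available as arrays and that the threshold is chosen to maximize F1 (the exact selection criterion is not documented here); the arrays below are random stand-ins:

import numpy as np
from sklearn.metrics import f1_score

# random stand-ins for the aggregated cross-validation outputs
rng = np.random.default_rng(0)
oof_probs = rng.random(7017)                       # P(populist) per sentence
oof_labels = (rng.random(7017) < 0.5).astype(int)  # gold binary labels

# sweep candidate thresholds and keep the one with the best F1
thresholds = np.linspace(0.05, 0.95, 91)
f1s = [f1_score(oof_labels, (oof_probs > t).astype(int)) for t in thresholds]
best = thresholds[int(np.argmax(f1s))]
print(f"best threshold: {best:.2f} (F1 = {max(f1s):.3f})")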

Hyperparameters

Parameter              Value
---------------------  ------
Learning Rate          3e-05
Weight Decay           0.0
Gradient Accumulation  2
Warmup Ratio           0.1
Epochs                 2
Batch Size             16
Max Length             256
  • Mixed Precision (fp16): Used for efficiency on GPU (see the sketch below).
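
A sketch of how these settings map onto Hugging Face TrainingArguments; the output directory name is arbitrary, and per-fold model initialization, splitting, and evaluation are omitted:

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="popeurobert-210m",  # arbitrary
    learning_rate=3e-5,
    weight_decay=0.0,
    gradient_accumulation_steps=2,
    warmup_ratio=0.1,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    fp16=True,                      # mixed precision (requires a GPU)
)

trainer = Trainer(
    model=model,              # EuroBERT-210m with a fresh classification head
    args=args,
    train_dataset=tokenized,  # from the preprocessing sketch above
    tokenizer=tokenizer,      # enables padding-aware batching
)
# trainer.train()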

Evaluation

For transparency, we compare this model with its larger variant (PopEuroBERT-610m), both trained and evaluated on the same dataset and splits.

Test Set Performance (Threshold = 0.5)

Model        Accuracy  Precision  Recall  F1 Score  Loss
-----------  --------  ---------  ------  --------  ------
210M (this)  75.99%    73.78%     80.66%  77.07%    0.4959
610M         80.26%    78.42%     83.50%  80.89%    0.4631

Test Set Performance (Optimized Threshold)

Model        Threshold  Accuracy  Precision  Recall  F1 Score
-----------  ---------  --------  ---------  ------  --------
210M (this)  0.56       76.00%    76.00%     76.00%  76.00%
610M         0.43       79.81%    76.63%     85.78%  80.94%

While PopEuroBERT-210m performs well on the populism classification task, its larger variant shows stronger overall performance, especially in F1 score and recall.
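
The metrics above can be reproduced from per-sentence probabilities with scikit-learn; the arrays below are random stand-ins for the real test-set outputs:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# random stand-ins for the 1,758 test-set probabilities and gold labels
rng = np.random.default_rng(0)
test_probs = rng.random(1758)
test_labels = (rng.random(1758) < 0.5).astype(int)

preds = (test_probs > 0.56).astype(int)  # tuned decision threshold
print(f"accuracy : {accuracy_score(test_labels, preds):.4f}")
print(f"precision: {precision_score(test_labels, preds):.4f}")
print(f"recall   : {recall_score(test_labels, preds):.4f}")
print(f"f1       : {f1_score(test_labels, preds):.4f}")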

Limitations

  • Domain Specificity: This model was trained on Bundestag speeches and may not generalize to all political discourse.
  • Threshold Sensitivity: The decision threshold (0.56) was optimized for this dataset but may need adjustment for other corpora.
  • Potential Bias: The model inherits any biases present in the dataset's annotation of political speech.

Ethical Considerations

  • Not suitable for high-stakes decision-making. This model is meant for research purposes in political discourse analysis.
  • Bias & Context Dependence: Populism is a complex concept. Automated detection should not replace human interpretation.
  • Transparent Use: Users should document and validate model outputs in their research.

License

Released under the Apache 2.0 License.

Citation

If you use this model or its methodology, please cite:

  • The original EuroBERT paper:

    @misc{boizard2025eurobertscalingmultilingualencoders,
          title={EuroBERT: Scaling Multilingual Encoders for European Languages},
          author={Nicolas Boizard and Hippolyte Gisserot-Boukhlef and Duarte M. Alves and André Martins and Ayoub Hammal and Caio Corro and Céline Hudelot and Emmanuel Malherbe and Etienne Malaboeuf and Fanny Jourdan and Gabriel Hautreux and João Alves and Kevin El-Haddad and Manuel Faysse and Maxime Peyrard and Nuno M. Guerreiro and Patrick Fernandes and Ricardo Rei and Pierre Colombo},
          year={2025},
          eprint={2503.05500},
          archivePrefix={arXiv},
          primaryClass={cs.CL},
          url={https://arxiv.org/abs/2503.05500}
    }
    
  • The PopBERT dataset source:

    @article{Erhard_Hanke_Remer_Falenska_Heiberger_2025,
          title={PopBERT. Detecting Populism and Its Host Ideologies in the German Bundestag},
          volume={33},
          DOI={10.1017/pan.2024.12},
          number={1},
          journal={Political Analysis},
          author={Erhard, Lukas and Hanke, Sara and Remer, Uwe and Falenska, Agnieszka and Heiberger, Raphael Heiko},
          year={2025},
          pages={1–17}
    }
    