Belnap Corpus Controversy Classifier (DistilBERT)

What this is

A binary classifier that predicts whether a debate proposition is high-controversy (informed debaters strongly disagree) or non-high-controversy (medium or low disagreement). It is fine-tuned from distilbert-base-uncased on the 110-record barissozudogru/belnap-debate-corpus.

Honest sizing

This model was trained on 88 examples and evaluated on 22 (a stratified 80/20 split of the 110-record corpus). 110 records is small by ML standards: fine-tunes at this scale tend toward memorization rather than broad generalization. The results below are useful as a small-data baseline, not as a production controversy detector.

Metrics on held-out 22-example eval set

Metric	Model	Baseline (always-predict-high)
Accuracy	0.864	0.727
F1 (macro)	0.790	n/a
F1 (high)	0.914	n/a

Lift above baseline: +13.6 percentage points.
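
With only 22 eval examples, the accuracy estimate carries wide uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval for 19 correct out of 22 (the count behind 0.864). The helper name `wilson_interval` and the choice of the Wilson method are assumptions made here, not part of the original evaluation.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 19 of 22 eval propositions were classified correctly (accuracy 0.864).
low, high = wilson_interval(19, 22)
print(f"accuracy 95% interval: [{low:.3f}, {high:.3f}]")
```

Under these assumptions the lower bound falls below the 0.727 always-predict-high baseline, so the lift is better read as suggestive than as established.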

Confusion matrix

              predicted
              non-high  high
actual non-high    3      3      (recall 50%)
actual high        0     16      (recall 100%)

Per-class

Class	Precision	Recall	F1	Support
non-high (0)	1.00	0.50	0.67	6
high (1)	0.84	1.00	0.91	16
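
The headline and per-class numbers can be re-derived from the four confusion-matrix counts alone. The sketch below does that arithmetic in plain Python; the helper `precision_recall_f1` is a hypothetical name used only for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the confusion matrix, with "high" as the positive class.
tp, fn = 16, 0  # actual high: predicted high, predicted non-high
fp, tn = 3, 3   # actual non-high: predicted high, predicted non-high

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
baseline_accuracy = (tp + fn) / total  # always predicting "high"

p_high, r_high, f1_high = precision_recall_f1(tp, fp, fn)
# For "non-high" as the positive class the roles of the cells swap.
p_non_high, r_non_high, f1_non_high = precision_recall_f1(tn, fn, fp)
macro_f1 = (f1_high + f1_non_high) / 2

print(f"accuracy={accuracy:.3f} baseline={baseline_accuracy:.3f}")
print(f"high:     P={p_high:.2f} R={r_high:.2f} F1={f1_high:.2f}")
print(f"non-high: P={p_non_high:.2f} R={r_non_high:.2f} F1={f1_non_high:.2f}")
print(f"macro F1={macro_f1:.3f}")
```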

How to read those numbers

The classifier is asymmetric: on this eval set it never misses a high-controversy proposition (100% recall on high), but it recovers only half of the non-high propositions (50% recall) and labels the other three as high. Practically:

Good for: catching high-controversy propositions; none were missed, and a "high" prediction was right 16 times out of 19 (84% precision)
Less good for: picking out the non-high propositions; half of them come back labelled high
Read the prediction as: "is this likely high-controversy? yes/maybe" rather than "is this high or low? clean binary"

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned classifier and its tokenizer from the Hub.
model = AutoModelForSequenceClassification.from_pretrained("barissozudogru/belnap-controversy-classifier")
tokenizer = AutoTokenizer.from_pretrained("barissozudogru/belnap-controversy-classifier")

proposition = "Frontier AI models should be open-sourced despite misuse risks."
inputs = tokenizer(proposition, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]  # index 0 = non-high, index 1 = high

print(f"P(high-controversy) = {probs[1]:.3f}")
print(f"P(non-high)         = {probs[0]:.3f}")
print(f"label: {model.config.id2label[int(torch.argmax(logits))]}")

Training details

Aspect	Value
Base model	`distilbert-base-uncased` (66M params)
Training data	88 propositions (stratified 80% of `belnap-debate-corpus`)
Eval data	22 propositions (stratified 20%)
Epochs	5
Batch size	8
Learning rate	3e-5
Optimizer	AdamW (default for `Trainer`)
Weight decay	0.01
Warmup ratio	0.1
Loss	Cross-entropy with class weights (`non-high` weighted higher to mitigate imbalance)
Class weights	non-high: 2.44, high: 0.63
Seed	42
Hardware	Apple Silicon (MPS backend)
Best-model selection	Best macro-F1 across 5 epoch evals

Full training script: train.py in this repo.
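
The class weights in the table are consistent with the common "balanced" heuristic, weight_c = n_samples / (n_classes * count_c), if the 88 training examples split into 18 non-high and 70 high. Those per-class counts are inferred from the reported weights, not stated in this card, and train.py may compute the weights differently; the sketch below only checks the arithmetic.

```python
def balanced_class_weights(counts):
    """Balanced heuristic: weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# ASSUMPTION: per-class training counts inferred from the reported weights,
# not taken from the card or from train.py.
train_counts = {"non-high": 18, "high": 70}

weights = balanced_class_weights(train_counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```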

Limitations

Small training set (88 examples). Tends toward memorization; broader generalization is unverified. For a more rigorous evaluation, use embeddings with leave-one-out cross-validation over all 110 records.
Domain coverage matches the corpus: economics, bioethics, ethics, labor, education, technology policy, environment, free speech, animal ethics, political philosophy. Out-of-domain propositions (e.g., scientific consensus questions, legal-procedural debates) may behave differently.
Controversy labels in the source corpus are author judgments, not measured agreement rates. The classifier learns to predict those judgments, not ground-truth controversy.
English only.
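Single stratified split, not leave-one-out. The best checkpoint was also selected by macro-F1 on this same 22-example eval set, so the reported metrics may be optimistic.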
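
As a minimal sketch of what leave-one-out evaluation looks like, the code below holds out each example in turn, fits a nearest-centroid classifier on the rest, and scores the held-out prediction. The 2-D vectors stand in for real sentence embeddings and are made up, and the nearest-centroid classifier and all function names are assumptions chosen to keep the example dependency-free; none of this is taken from train.py.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(members)
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def leave_one_out_accuracy(xs, ys):
    """Hold out each example once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# HYPOTHETICAL toy "embeddings" (0 = non-high, 1 = high); real use would
# substitute one embedding vector per corpus proposition.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
ys = [0, 0, 0, 1, 1, 1]

loo_accuracy = leave_one_out_accuracy(xs, ys)
print(f"leave-one-out accuracy: {loo_accuracy:.3f}")
```

Every record serves as test data exactly once, so the estimate uses all 110 records instead of a single 22-example split.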

Intended use

Pairing with the Belnap paraconsistent debate Space to pre-filter propositions worth running through the full debate pipeline
A small-data baseline for anyone evaluating controversy-detection approaches on the same corpus
Teaching example showing 110-record fine-tuning trade-offs honestly
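
The pre-filtering use can be sketched as a simple threshold on P(high-controversy). The logits below are invented for illustration, the 0.5 threshold and the `prefilter` helper are assumptions, and the [non-high, high] ordering follows the usage snippet in this card.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prefilter(scored, threshold=0.5):
    """Keep propositions whose P(high-controversy) meets the threshold."""
    return [text for text, logits in scored if softmax(logits)[1] >= threshold]

# HYPOTHETICAL logits in [non-high, high] order; real values would come
# from the classifier's output for each proposition.
scored = [
    ("Proposition A", [-1.2, 2.3]),
    ("Proposition B", [0.8, -0.4]),
    ("Proposition C", [0.1, 0.3]),
]

kept = prefilter(scored)
print(kept)
```

Lowering the threshold sends more propositions to the full debate pipeline; raising it saves compute at the risk of skipping some high-controversy ones.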

Not for

Production-grade controversy scoring on arbitrary text
Legal, journalistic, or moderation decisions
Languages other than English

Citation

If you use this model, please cite the underlying corpus:

@misc{sozudogru2026belnapcorpus,
  author       = {Sozudogru, Baris},
  title        = {Belnap Real-Debate Corpus},
  year         = {2026},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/datasets/barissozudogru/belnap-debate-corpus},
}

And the Belnap-Dunn foundations:

Belnap, N. D. (1977). A Useful Four-Valued Logic. In Modern Uses of Multiple-Valued Logic.
Dunn, J. M. (1976). Intuitive Semantics for First-Degree Entailments and Coupled Trees. Philosophical Studies 29(3).

Belnap real-debate corpus — the training dataset
Belnap paraconsistent logic visualizer — interactive four-valued logic
ADK Agent Playground — the debate adjudication framework this is extracted from
