Narrative Classifier (RoBERTa-large, multi-label)

A multi-label text classifier that detects disinformation / propaganda narratives in news and social-media text. Given a piece of text, the model predicts which of 41 predefined narratives (spanning topics such as the war in Ukraine, migration, climate change, COVID-19 / vaccines, gender & LGBT+, anti-establishment / anti-EU / anti-NATO framings, etc.) are present.

The model was developed at the Polish-Japanese Academy of Information Technology (PJAIT / PJATK).

  • Architecture: RobertaNarrativeModel — a roberta-large encoder + a single linear classification head (narrative_head, 1024 → 41) applied to the <s> (CLS) token.
  • Task: multi-label classification (one input can carry several narratives at once).
  • Base model: FacebookAI/roberta-large
  • Parameters: ~0.4B · Precision: FP32 · Format: safetensors
  • Language: English

Note on the architecture. This repository uses a custom model class (RobertaNarrativeModel) whose weights are stored under the transformer.* and narrative_head.* prefixes. It therefore does not load directly with AutoModelForSequenceClassification. Use the self-contained loading code in the How to use section below.
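To make the prefix point concrete, here is a minimal sketch that groups tensor names by their top-level prefix. The helper name `count_top_level_prefixes` and the example key names are illustrative assumptions, not copied from the repository. With the real checkpoint you would pass the keys of the state dict loaded from model.safetensors.

```python
from collections import Counter


def count_top_level_prefixes(keys):
    """Count tensor names by the part before the first dot."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Hypothetical key names that follow the transformer.* / narrative_head.* layout.
example_keys = [
    "transformer.embeddings.word_embeddings.weight",
    "transformer.encoder.layer.0.attention.self.query.weight",
    "narrative_head.weight",
    "narrative_head.bias",
]

prefix_counts = count_top_level_prefixes(example_keys)
print(dict(prefix_counts))
```

A stock RobertaForSequenceClassification names its submodules roberta and classifier, so a checkpoint whose keys start with transformer and narrative_head would not match its parameter names. That mismatch is the reason for the custom class.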

Labels

The model outputs 41 labels. The full mapping is in narrative_labels.json / label_config.json.

All 41 narratives:
ID Narrative
0 Abortion is evil/immoral/dangerous
1 Alternative treatments are more effective than conventional ones
2 Climate change is a hoax
3 Collapse of Western civilization is imminent
4 Conflict is a staged event prepared by outside forces
5 Contraception is against nature/dangerous/immoral
6 Conventional medicine is ineffective and corrupt
7 Conventional medicine is wrong about the causes of diseases
8 Elites manipulate elections
9 Elites want to take over the world
10 European Union is authoritarian
11 Feminism is a tool to destroy the natural order and traditional values
12 Global elites deliberately cause pandemics and diseases
13 Global warming does not exist/is not a serious threat
14 Governments fail to take proper action on migration crisis
15 Homosexuals are a threat
16 Humanity is not responsible for global warming
17 LGBT+ is a tool to destroy the natural order and traditional values
18 LGBT+ people are mentally ill
19 LGBT+ people are privileged
20 Media deliberately spreads lies
21 Migrants are a burden on the economy
22 Migrants are dangerous
23 Migrants are destroying local culture and breaking up local communities
24 Migration is a conspiracy of global elites
25 Most European countries are puppets of the West
26 NATO is authoritarian/warmongering
27 Official information is a tool to deceive citizens
28 Other
29 Russia is strong and winning the war
30 Sex education is a threat to children
31 Solutions to reduce human impact on environment and climate are a conspiracy
32 State and international institutions only serve to oppress citizens.
33 The West and their allies are immoral/hostile/ineffective
34 The energy crisis is artificially created
35 Transgender people are a threat
36 Ukraine is an evil, aggressive and dangerous country
37 Ukrainian refugees are a danger/burden
38 Vaccines are dangerous/ineffective/immoral
39 Western elites want to destroy the natural order and traditional values
40 other
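IDs 28 ("Other") and 40 ("other") in the table differ only in capitalization. This card does not say whether they are meant to be distinct classes, so downstream code that keys on label names may want to detect such collisions. The following is a minimal sketch. The helper name `find_case_insensitive_duplicates` is hypothetical, and the dictionary is a small excerpt of the table rather than the full mapping.

```python
def find_case_insensitive_duplicates(id2narrative):
    """Return (first_id, later_id) pairs whose names match when case is ignored."""
    seen = {}
    duplicates = []
    for idx in sorted(id2narrative):
        key = id2narrative[idx].strip().lower()
        if key in seen:
            duplicates.append((seen[key], idx))
        else:
            seen[key] = idx
    return duplicates


# Excerpt of the label table; the real mapping has 41 entries.
excerpt = {
    27: "Official information is a tool to deceive citizens",
    28: "Other",
    29: "Russia is strong and winning the war",
    40: "other",
}

duplicates = find_case_insensitive_duplicates(excerpt)
print(duplicates)  # [(28, 40)]
```

Reporting on label IDs instead of label names avoids silently merging the two entries.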

How to use

import json
import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, RobertaModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "pjait/narrative_classifier"


class RobertaNarrativeModel(nn.Module):
    """roberta-large encoder + a linear head over the <s> (CLS) token."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.transformer = RobertaModel(config, add_pooling_layer=False)
        self.narrative_head = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]          # <s> token representation
        return self.narrative_head(cls)            # raw logits (multi-label)


# --- load config, labels and weights ---------------------------------------
config = AutoConfig.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

with open(hf_hub_download(REPO_ID, "narrative_labels.json")) as f:
    labels = json.load(f)
id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}
num_labels = labels["num_labels"]

model = RobertaNarrativeModel(config, num_labels)
state_dict = load_file(hf_hub_download(REPO_ID, "model.safetensors"))
model.load_state_dict(state_dict)
model.eval()

# --- inference --------------------------------------------------------------
text = "The vaccines were rushed and are far more dangerous than the virus itself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)
    probs = torch.sigmoid(logits)[0]              # multi-label -> sigmoid

THRESHOLD = 0.5
predicted = [(id2narrative[i], float(p)) for i, p in enumerate(probs) if p >= THRESHOLD]
print(sorted(predicted, key=lambda x: -x[1]))

THRESHOLD controls the precision/recall trade-off: raising it favors precision, lowering it favors recall. Tune it on your own validation data.
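One way to tune the threshold is a simple sweep over candidate values on held-out data. The sketch below is pure Python and picks the candidate with the highest micro F1. The function names, the toy labels and probabilities, and the choice of micro F1 as the selection criterion are all assumptions for illustration. In practice, `y_prob` would hold the sigmoid outputs of the model on your validation set.

```python
def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1 over all (example, label) pairs at one threshold."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = p >= threshold
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return (threshold, micro_f1) with the highest score; ties keep the earliest candidate."""
    scored = ((t, micro_f1(y_true, y_prob, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


# Toy validation data (illustrative only): 4 examples, 3 labels.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_prob = [
    [0.45, 0.10, 0.05],
    [0.22, 0.62, 0.02],
    [0.71, 0.38, 0.33],
    [0.05, 0.08, 0.01],
]

best = best_threshold(y_true, y_prob, candidates=[0.2, 0.35, 0.5, 0.65])
print(best)
```

A single global threshold is the simplest option. Because performance differs across narratives, per-label thresholds may work better, at the cost of needing enough validation examples for each label.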

Evaluation

Metrics from metrics.txt (evaluation split, epoch 3):

Metric            Value
Micro F1          0.494
Macro F1          0.185
Precision         0.700
Recall            0.382
Subset accuracy   0.787
Eval loss         0.023

The gap between micro and macro F1, together with high precision but lower recall, indicates the model is conservative and performs unevenly across narratives — likely better on well-represented narratives and weaker on rare ones. Treat predictions as a decision-support signal, not ground truth, and calibrate the threshold for your use case.
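The difference between the two averages can be seen on a small made-up example. In the sketch below, one frequent label is predicted well and two rare labels are always missed. The data and function names are illustrative assumptions and are unrelated to the actual evaluation set. Labels with no true or predicted positives are scored as 0 here, which is a convention; libraries differ on this point.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for multi-hot label matrices given as lists of rows."""
    num_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(num_labels)]  # per label: tp, fp, fn
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if p and t:
                counts[j][0] += 1
            elif p and not t:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1
    # Micro: pool counts over all labels, then compute one F1.
    micro = f1_from_counts(*(sum(c[k] for c in counts) for k in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / num_labels
    return micro, macro


# Label 0 is frequent and always found; labels 1 and 2 are rare and always missed.
y_true = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.333
```

Micro F1 is dominated by the frequent label, while macro F1 weights every label equally, so missed rare labels pull it down sharply. The toy predictions also contain no false positives, which mirrors the high-precision, lower-recall pattern described above.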

Intended use & limitations

Intended use. Research and analysis of disinformation/propaganda narratives in English-language media; content moderation triage; media-monitoring dashboards; academic studies of narrative spread.

Out of scope / cautions.

  • The model identifies whether text expresses or discusses a narrative; it does not establish truth, intent, or that the author endorses the narrative (quotation, debunking and reporting can trigger labels).
  • Trained on English; performance on other languages is not guaranteed.
  • Macro F1 is low — rare narratives are unreliable. Do not use for automated, consequential decisions about individuals without human review.
  • Sensitive topics (health, politics, gender, migration). Outputs can reflect biases in the training data. Human oversight is required for any deployment.

Training

  • Base model: FacebookAI/roberta-large fine-tuned for multi-label narrative classification.
  • Epochs: 3 (see training_args.bin for the full TrainingArguments).
  • Objective: multi-label classification (sigmoid + binary cross-entropy over 41 narratives).
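The objective in the last bullet can be written out for a single example. The sketch below is pure Python and uses the numerically stable form of sigmoid followed by binary cross-entropy. The function name and the toy logits are illustrative assumptions. The actual training code is not part of this card, so details such as class weighting are unknown. In PyTorch the corresponding loss is torch.nn.BCEWithLogitsLoss.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable rewrite of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Toy example with 3 labels: only the first narrative is present.
logits = [2.0, -1.0, 0.0]
targets = [1.0, 0.0, 0.0]

loss = bce_with_logits(logits, targets)
print(round(loss, 4))
```

Each label is treated as an independent yes/no decision, which is why several narratives can be active for the same input and why inference applies a sigmoid per label instead of a softmax across labels.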

Citation

If you use this model, please cite the Polish-Japanese Academy of Information Technology (PJAIT) and the author:

@misc{narrative_classifier_pjait,
  title  = {Narrative Classifier (RoBERTa-large, multi-label)},
  author = {Sosnowski, Witold},
  howpublished = {\url{https://huggingface.co/pjait/narrative_classifier}},
  note   = {Polish-Japanese Academy of Information Technology (PJAIT)},
  year   = {2025}
}