Narrative Classifier (RoBERTa-large, multi-label)

A multi-label text classifier that detects disinformation / propaganda narratives in news and social-media text. Given a piece of text, the model predicts which of 41 predefined narratives are present. The narratives span topics such as the war in Ukraine, migration, climate change, COVID-19 / vaccines, gender & LGBT+, and anti-establishment / anti-EU / anti-NATO framings.

The model was developed at the Polish-Japanese Academy of Information Technology (PJAIT / PJATK).

Architecture: RobertaNarrativeModel, a roberta-large encoder with a single linear classification head (narrative_head, 1024 → 41) applied to the <s> (CLS) token.
Task: multi-label classification (one input can carry several narratives at once).
Base model: FacebookAI/roberta-large
Parameters: ~0.4B · Precision: FP32 · Format: safetensors
Language: English

Note on the architecture. This repository uses a custom model class (RobertaNarrativeModel) whose weights are stored under the transformer.* and narrative_head.* prefixes. It therefore does not load directly with AutoModelForSequenceClassification. Use the self-contained loading code in the How to use section below.
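
To see the mismatch concretely, one can group the checkpoint's parameter names by their top-level prefix. The sketch below is illustrative only: top_level_prefixes is a hypothetical helper, and the example key names are inferred from the class definition in this card rather than read from the actual checkpoint.

```python
from collections import Counter


def top_level_prefixes(keys):
    """Count state-dict keys by their first dotted component."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Illustrative key names only (assumed from the class definition in this card,
# not read from the real model.safetensors file).
example_keys = [
    "transformer.embeddings.word_embeddings.weight",
    "transformer.encoder.layer.0.attention.self.query.weight",
    "narrative_head.weight",
    "narrative_head.bias",
]

prefixes = top_level_prefixes(example_keys)
print(dict(prefixes))
```

With the real checkpoint, the same helper can be applied to the keys of the state dict loaded in the How to use section. RobertaForSequenceClassification names its submodules roberta.* and classifier.*, so keys under transformer.* and narrative_head.* would not match it.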

Labels

The model outputs 41 labels. The full mapping is in narrative_labels.json / label_config.json.

All 41 narratives:

ID	Narrative
0	Abortion is evil/immoral/dangerous
1	Alternative treatments are more effective than conventional ones
2	Climate change is a hoax
3	Collapse of Western civilization is imminent
4	Conflict is a staged event prepared by outside forces
5	Contraception is against nature/dangerous/immoral
6	Conventional medicine is ineffective and corrupt
7	Conventional medicine is wrong about the causes of diseases
8	Elites manipulate elections
9	Elites want to take over the world
10	European Union is authoritarian
11	Feminism is a tool to destroy the natural order and traditional values
12	Global elites deliberately cause pandemics and diseases
13	Global warming does not exist/is not a serious threat
14	Governments fail to take proper action on migration crisis
15	Homosexuals are a threat
16	Humanity is not responsible for global warming
17	LGBT+ is a tool to destroy the natural order and traditional values
18	LGBT+ people are mentally ill
19	LGBT+ people are privileged
20	Media deliberately spreads lies
21	Migrants are a burden on the economy
22	Migrants are dangerous
23	Migrants are destroying local culture and breaking up local communities
24	Migration is a conspiracy of global elites
25	Most European countries are puppets of the West
26	NATO is authoritarian/warmongering
27	Official information is a tool to deceive citizens
28	Other
29	Russia is strong and winning the war
30	Sex education is a threat to children
31	Solutions to reduce human impact on environment and climate are a conspiracy
32	State and international institutions only serve to oppress citizens.
33	The West and their allies are immoral/hostile/ineffective
34	The energy crisis is artificially created
35	Transgender people are a threat
36	Ukraine is an evil, aggressive and dangerous country
37	Ukrainian refugees are a danger/burden
38	Vaccines are dangerous/ineffective/immoral
39	Western elites want to destroy the natural order and traditional values
40	other

How to use

import json
import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, RobertaModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "pjait/narrative_classifier"


class RobertaNarrativeModel(nn.Module):
    """roberta-large encoder + a linear head over the <s> (CLS) token."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.transformer = RobertaModel(config, add_pooling_layer=False)
        self.narrative_head = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]          # <s> token representation
        return self.narrative_head(cls)            # raw logits (multi-label)


# --- load config, labels and weights ---------------------------------------
config = AutoConfig.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

with open(hf_hub_download(REPO_ID, "narrative_labels.json")) as f:
    labels = json.load(f)
id2narrative = {int(k): v for k, v in labels["id2narrative"].items()}
num_labels = labels["num_labels"]

model = RobertaNarrativeModel(config, num_labels)
state_dict = load_file(hf_hub_download(REPO_ID, "model.safetensors"))
model.load_state_dict(state_dict)
model.eval()

# --- inference --------------------------------------------------------------
text = "The vaccines were rushed and are far more dangerous than the virus itself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)
    probs = torch.sigmoid(logits)[0]              # multi-label -> sigmoid

THRESHOLD = 0.5
predicted = [(id2narrative[i], float(p)) for i, p in enumerate(probs) if p >= THRESHOLD]
print(sorted(predicted, key=lambda x: -x[1]))

THRESHOLD controls the precision/recall trade-off: raising it favors precision, lowering it favors recall. Tune it on your own validation data.
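
One simple way to tune the threshold is to sweep candidate values on a held-out set and keep the one with the best micro F1. The following is a minimal, dependency-free sketch; micro_f1, best_threshold, and the toy val_probs / val_targets are hypothetical names and data, not part of this repository. In practice val_probs would be the sigmoid outputs of the model on your validation texts.

```python
def micro_f1(probs, targets, threshold):
    """Micro-averaged F1 over all (example, label) pairs at one threshold."""
    tp = fp = fn = 0
    for p_row, t_row in zip(probs, targets):
        for p, t in zip(p_row, t_row):
            pred = p >= threshold
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t and not pred:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, targets, candidates):
    """Return the candidate threshold with the highest micro F1."""
    return max(candidates, key=lambda th: micro_f1(probs, targets, th))


# Toy validation data (made up): 3 texts, 3 labels.
val_probs = [[0.9, 0.2, 0.4], [0.3, 0.8, 0.1], [0.6, 0.35, 0.05]]
val_targets = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]

candidates = [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95
best = best_threshold(val_probs, val_targets, candidates)
print(best, micro_f1(val_probs, val_targets, best))
```

A single global threshold is the simplest choice. Given the uneven per-narrative performance, per-label thresholds may be worth trying, but they need enough validation examples per narrative to be meaningful.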

Evaluation

Metrics from metrics.txt (evaluation split, epoch 3):

Metric	Value
Micro F1	0.494
Macro F1	0.185
Precision	0.700
Recall	0.382
Subset accuracy	0.787
Eval loss	0.023

The gap between micro F1 (0.494) and macro F1 (0.185), together with precision (0.700) well above recall (0.382), indicates that the model is conservative and performs unevenly across narratives: it is likely stronger on well-represented narratives and weaker on rare ones. Treat predictions as a decision-support signal, not ground truth, and calibrate the threshold for your use case.
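
The effect of rare labels on the two averages can be reproduced on toy data. The sketch below is an illustration under assumptions: the data is made up, micro_macro_f1 is a hypothetical helper, and a label with no true or predicted positives is scored as F1 = 0. How metrics.txt was actually computed is not documented here.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(preds, targets):
    """Return (micro F1, macro F1, per-label F1) for binary label matrices."""
    n_labels = len(targets[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]   # [tp, fp, fn] per label
    for p_row, t_row in zip(preds, targets):
        for j, (p, t) in enumerate(zip(p_row, t_row)):
            if p and t:
                counts[j][0] += 1
            elif p and not t:
                counts[j][1] += 1
            elif t and not p:
                counts[j][2] += 1
    per_label = [f1(*c) for c in counts]
    macro = sum(per_label) / n_labels                # every label weighs the same
    micro = f1(*(sum(c[k] for c in counts) for k in range(3)))  # pooled counts
    return micro, macro, per_label


# Label 0 is frequent and always found; labels 1 and 2 are rare and missed.
targets = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
preds = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]

micro, macro, per_label = micro_macro_f1(preds, targets)
print(micro, macro, per_label)
```

Micro F1 pools counts across labels, so the frequent label dominates it. Macro F1 averages per-label scores, so each missed rare label pulls it down by the same amount as a missed frequent one.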

Intended use & limitations

Intended use. Research and analysis of disinformation/propaganda narratives in English-language media; content moderation triage; media-monitoring dashboards; academic studies of narrative spread.

Out of scope / cautions.

The model identifies whether text expresses or discusses a narrative; it does not establish truth, intent, or that the author endorses the narrative (quotation, debunking and reporting can trigger labels).
Trained on English; performance on other languages is not guaranteed.
Macro F1 is low (0.185), so predictions for rare narratives are unreliable. Do not use for automated, consequential decisions about individuals without human review.
The model covers sensitive topics (health, politics, gender, migration), and its outputs can reflect biases in the training data. Human oversight is required for any deployment.

Training

Base model: FacebookAI/roberta-large fine-tuned for multi-label narrative classification.
Epochs: 3 (see training_args.bin for the full TrainingArguments).
Objective: multi-label classification (sigmoid + binary cross-entropy over 41 narratives).
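
As an illustration of this objective, the sketch below computes sigmoid + binary cross-entropy directly from logits in plain Python, averaged over all (example, label) pairs. This matches what torch.nn.BCEWithLogitsLoss computes with its default mean reduction; whether training used that exact class, or any label weighting, is an assumption not confirmed by this card. The name bce_with_logits and the toy inputs are hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (example, label) pairs.

    Each label is treated as an independent yes/no decision, which is what
    allows several narratives to be active for the same input.
    """
    total, n = 0.0, 0
    for x_row, y_row in zip(logits, targets):
        for x, y in zip(x_row, y_row):
            # Numerically stable form of
            # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            n += 1
    return total / n


# Toy case: one example, two labels, both logits at 0 (probability 0.5 each).
loss = bce_with_logits([[0.0, 0.0]], [[1, 0]])
print(loss)
```

At inference time the same per-label sigmoid is applied to the logits, which is why the How to use code uses torch.sigmoid rather than a softmax.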

Citation

If you use this model, please cite the Polish-Japanese Academy of Information Technology (PJAIT) and the author:

@misc{narrative_classifier_pjait,
  title  = {Narrative Classifier (RoBERTa-large, multi-label)},
  author = {Sosnowski, Witold},
  howpublished = {\url{https://huggingface.co/pjait/narrative_classifier}},
  note   = {Polish-Japanese Academy of Information Technology (PJAIT)},
  year   = {2025}
}
