Model Card for hybrinfox/ukraine-operation_propaganda-detection-EN
This model identifies propaganda about the invasion of Ukraine in press articles.
Model Details
Model Description
The model is a fine-tuned version of roberta-base (https://huggingface.co/roberta-base) on the Propagandist Pseudo-News dataset (https://github.com/hybrinfox/ppn).
- Owned by: Airbus Defence and Space
- Developed for: HYBRINFOX consortium (Airbus Defence and Space - Paris Sciences et Lettres, Ecole Normale Supérieure Ulm, Institut Jean-Nicod - Université de Rennes, Inria, IRISA, Mondeca)
- Funded by: French National Research Agency (ANR-21-ASIA-0003)
- Model type: Text classification
- Language(s) (NLP): en
- License: CC BY-NC 4.0
- Finetuned from model: roberta-base
Uses
Direct Use
The model can be used directly to classify press articles written in English about the invasion of Ukraine or related topics. The output corresponds to the probability of belonging to each class: 0 for regular press articles and 1 for propagandist articles.
Out-of-Scope Use
This model should not be used to categorize news sources as propagandist or not, but it can help identify pro-Russian narratives and Russian values. The model is not trained to identify the authors' intentions and should not be used to draw such conclusions.
Bias, Risks, and Limitations
This model has been trained with articles from different sources, but all articles in the propaganda class share the same narrative. Moreover, all articles cover the same topic of the Russo-Ukrainian conflict. The model is not infallible and should not be used to make critical decisions when judging an article, its authors, or the corresponding news outlet.
Recommendations
We recommend using this model for research purposes only, and always cross-checking its predictions against other informed sources before drawing any conclusion.
How to Get Started with the Model
Use the code below to get started with the model.
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="hybrinfox/ukraine-operation_propaganda-detection-EN")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("hybrinfox/ukraine-operation_propaganda-detection-EN")
model = AutoModelForSequenceClassification.from_pretrained("hybrinfox/ukraine-operation_propaganda-detection-EN")
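A minimal usage sketch follows; the example sentence is illustrative, and the LABEL_0/LABEL_1 names assume the checkpoint ships with the default label mapping (0 = regular press, 1 = propaganda):
# Run the classifier on a short English excerpt (illustrative text, not from the dataset)
text = "Western media keep hiding the truth about the special operation."
print(pipe(text, top_k=None))
# e.g. [{'label': 'LABEL_1', 'score': 0.99}, {'label': 'LABEL_0', 'score': 0.01}]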
Training Details
Training Data
The model has been trained using the data from the Propagandist Pseudo-News dataset, available at https://github.com/hybrinfox/ppn, for the positive class. Additional articles on the same topic, but from mainstream sources, have been used for the negative class. Please read the paper for more details.
Training Procedure
Training Hyperparameters
- Training regime:
  - Batch size: 8
  - Learning rate: 5e-5
  - Number of fine-tuning epochs: 3
  - Optimizer: Adam with default settings
  - Loss function: binary cross-entropy
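For reference, here is a minimal sketch of how an equivalent run could be set up with the Hugging Face Trainer using the hyperparameters above. It is not the authors' exact training script: the datasets are placeholders, and Trainer defaults to AdamW with cross-entropy, which approximates the Adam and binary cross-entropy settings reported here.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# train_dataset is assumed to be a tokenized dataset with input_ids,
# attention_mask and labels (0 = regular press, 1 = propaganda).
args = TrainingArguments(
    output_dir="propaganda-detector",
    per_device_train_batch_size=8,  # batch size 8
    learning_rate=5e-5,             # learning rate 5e-5
    num_train_epochs=3,             # 3 fine-tuning epochs
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()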
Evaluation
During training, the model was monitored with the training metrics as well as the validation loss.
Testing Data, Factors & Metrics
Testing Data
The previously described dataset was split into train/validation/test sets with an 80/10/10 ratio. The reported results are on the test set, after using the training set for training and the validation set for monitoring the model's learning.
Metrics
The reported metrics are the F1 scores and losses on the three sets.
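These scores can be reproduced from model predictions with scikit-learn; a minimal sketch with placeholder labels and probabilities:
from sklearn.metrics import f1_score, log_loss

y_true = [0, 0, 1, 1]           # placeholder gold labels
y_prob = [0.1, 0.2, 0.9, 0.8]   # placeholder predicted P(propaganda)
y_pred = [int(p >= 0.5) for p in y_prob]
print("F1:", f1_score(y_true, y_pred))
print("Loss:", log_loss(y_true, y_prob))  # binary cross-entropy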
Results
| Split | Loss | F1 score |
|---|---|---|
| Train | 0.0004 | 1.0000 |
| Val | 0.0170 | 0.9985 |
| Test | 0.0329 | 0.9970 |
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: T4
- Hours used: 0.3
- Cloud Provider: GCP
- Compute Region: europe-west1
- Carbon Emitted: 0.01 kg CO2 eq
Because we fine-tuned a general foundation model rather than training from scratch, the environmental impact of training our propaganda detector is negligible: the equivalent of about 40 meters traveled by an internal combustion engine car. The low-carbon energy used in the compute region also helped reduce the environmental impact of the training.
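As a rough sanity check, the figure can be approximated with the usual energy-times-carbon-intensity estimate; the T4 power draw and grid intensity below are assumed values, not figures from this card:
# Back-of-the-envelope CO2 estimate (assumed values, not from the card)
gpu_power_kw = 0.070  # assumed NVIDIA T4 TDP of about 70 W
hours = 0.3           # training time reported above
intensity = 0.5       # assumed kg CO2 eq per kWh of electricity
print(gpu_power_kw * hours * intensity)  # roughly 0.01, consistent with the value above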
Citation
Géraud Faye, Benjamin Icard, Morgane Casanova, Julien Chanson, François Maine, François Bancilhon, Guillaume Gadek, Guillaume Gravier, and Paul Égré. 2024. Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification. In Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language, pages 62–72, Malta. Association for Computational Linguistics.
BibTeX:
@inproceedings{faye-etal-2024-exposing,
title = "Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification",
author = "Faye, G{\'e}raud and
Icard, Benjamin and
Casanova, Morgane and
Chanson, Julien and
Maine, Fran{\c{c}}ois and
Bancilhon, Fran{\c{c}}ois and
Gadek, Guillaume and
Gravier, Guillaume and
{\'E}gr{\'e}, Paul",
editor = "Pyatkin, Valentina and
Fried, Daniel and
Stengel-Eskin, Elias and
Liu, Alisa and
Pezzelle, Sandro",
booktitle = "Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language",
month = mar,
year = "2024",
address = "Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.unimplicit-1.6",
pages = "62--72",
}
Model Card Authors
HYBRINFOX consortium
Model Card Contact