Model Card for Sami92/XLM-R-Large-PartyPress

This model is fine-tuned for topic classification and uses the labels provided by the Comparative Agendas Project. It can be used for the downstream task of classifying press releases from political parties into 23 policy areas. It is similar to partypress/partypress-multilingual, but its base model is FacebookAI/xlm-roberta-large and it was fine-tuned on more data.

Model Details

Model Description

This model is based on FacebookAI/xlm-roberta-large and was trained in a two-step process. In the first step, a dataset of press releases was weakly labeled with GPT-4o and the model was trained on these labels. In the second step, it was trained on the same human-annotated dataset as partypress/partypress-multilingual. The weak pre-training led to improved results (see below).

Bias, Risks, and Limitations

[More Information Needed]

How to Get Started with the Model

>>> from transformers import pipeline
>>> tokenizer_kwargs = {'padding':True,'truncation':True,'max_length':512}
>>> partypress = pipeline("text-classification", model = "Sami92/XLM-R-Large-PartyPress", tokenizer = "Sami92/XLM-R-Large-PartyPress", **tokenizer_kwargs)
>>> partypress(["We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.", 
            "We urge all parties to end the violence and come to the table. This conflict between the two countries must end.",
            "Así, “el trabajo de los militares españoles está al servicio de España y de los demás países”, que participan en esta misión por mandato de la OTAN, ha recordado.",
            "Dass es immer noch einen Gender-Pay-Gap gibt, geht auf das Konto dieser Regierung."])

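The pipeline call above returns one dictionary per input text with a `label` and a `score` key, which is the usual output of a text-classification pipeline. The following is a minimal sketch of how low-confidence predictions could be set aside for manual review. The helper name `split_by_confidence`, the threshold of 0.5, and the example labels and scores are hypothetical and not taken from the model.

```python
def split_by_confidence(predictions, threshold=0.5):
    """Separate pipeline predictions into confident and uncertain ones.

    `predictions` is a list of {"label": str, "score": float} dicts, the
    format a text-classification pipeline returns for a list of texts.
    The threshold is an illustrative assumption, not a recommended value.
    """
    confident, uncertain = [], []
    for index, prediction in enumerate(predictions):
        target = confident if prediction["score"] >= threshold else uncertain
        target.append((index, prediction["label"], prediction["score"]))
    return confident, uncertain


# Hypothetical output; real label names and scores come from the model.
example_output = [
    {"label": "Environment", "score": 0.97},
    {"label": "International Affairs", "score": 0.41},
]
confident, uncertain = split_by_confidence(example_output)
```

Each returned tuple keeps the index of the input text, so uncertain cases can be traced back to the original press release.
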
Training Details

Training Data

The model was trained on two datasets, each based on the data from partypress/partypress-multilingual. The first dataset was weakly labeled using GPT-4o; the prompt contained the label description taken from Erfort et al. (2023). This weakly labeled dataset contains 32,060 press releases. The second dataset is the human-annotated dataset used for training partypress/partypress-multilingual. For training, only the single-coded examples were used (24,117). Evaluation was performed on the examples annotated by two human coders each (3,121).

Training Hyperparameters

Epochs: 10
Batch size: 16
learning_rate: 2e-5
weight_decay: 0.01
fp16: True
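
As a rough sketch, the values above can be written with the argument names used by `transformers.TrainingArguments`. Mapping "Batch size" to the per-device batch size is an assumption, and the card does not state whether both training steps used the same settings.

```python
# Hyperparameters from the list above, keyed by the argument names that
# transformers.TrainingArguments uses. Treating "Batch size" as the
# per-device training batch size is an assumption.
training_hyperparameters = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "fp16": True,
}

# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_hyperparameters)
# where "out" is a placeholder output directory.
```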

Evaluation

Accuracy	Precision	Recall	F1 score
0.72	0.72	0.72	0.72

The figure below displays the performance and compares it to two benchmarks (scores as CSV). The first benchmark is the agreement between the two human coders per country (for details, see Erfort et al. (2023)). It is referred to as Coder F1, and the difference between the model performance and the coder agreement is referred to as Coder Difference. The model comes close to the agreement of human coders in almost all classes. Notable exceptions are Foreign Trade and, to a lesser extent, Defence and Law and Crime. The second benchmark is the performance of partypress/partypress-multilingual, referred to as Party Press F1; the difference from the present model is referred to as Party Press Difference. Except for Foreign Trade and Law and Crime, the present model is on par with or stronger than partypress/partypress-multilingual. Overall, it achieves an F1 score that is 0.06 higher.
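
The two difference columns can be illustrated with a small sketch. It assumes that each difference is computed as the present model's F1 minus the benchmark's F1, which the text does not state explicitly. The function name, the class names `area_a` and `area_b`, and all scores are made up for illustration and are not results of this model.

```python
def f1_differences(model_f1, coder_f1, party_press_f1):
    """Return per-class differences, computed as model F1 minus benchmark F1.

    The sign convention (model minus benchmark) is an assumption.
    """
    rows = {}
    for policy_area, score in model_f1.items():
        rows[policy_area] = {
            "coder_difference": round(score - coder_f1[policy_area], 2),
            "party_press_difference": round(score - party_press_f1[policy_area], 2),
        }
    return rows


# Made-up scores for two placeholder classes, for illustration only.
model_f1 = {"area_a": 0.70, "area_b": 0.50}
coder_f1 = {"area_a": 0.78, "area_b": 0.75}
party_press_f1 = {"area_a": 0.66, "area_b": 0.55}

differences = f1_differences(model_f1, coder_f1, party_press_f1)
```

Under this convention, a negative Coder Difference means the model falls short of human agreement in that class, and a positive Party Press Difference means it outperforms the earlier model.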

The figure below displays the confusion matrix of the individual classes on the test set.
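
For readers who want to relate a confusion matrix to per-class scores, the following sketch derives precision, recall, and F1 from a matrix of counts. It assumes rows hold the true classes and columns the predicted classes. The function name, class names, and counts are hypothetical and unrelated to the actual test set.

```python
def per_class_scores(confusion, labels):
    """Compute precision, recall and F1 per class from a confusion matrix.

    confusion[i][j] counts examples with true class i predicted as class j
    (rows = true, columns = predicted; this orientation is an assumption).
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        predicted_total = sum(row[k] for row in confusion)  # column sum
        actual_total = sum(confusion[k])  # row sum
        precision = true_positives / predicted_total if predicted_total else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy 2x2 matrix with made-up counts.
labels = ["area_a", "area_b"]
confusion = [
    [8, 2],
    [4, 6],
]
scores = per_class_scores(confusion, labels)
```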

Acknowledgements

I thank Cornelius Erfort for making the annotated press releases available.