manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2023-1-1

Text Classification

Transformers

PyTorch

xlm-roberta

custom_code

tburst committed on Sep 28, 2023

Commit 0bcd9d1 • 1 Parent(s): 39ba715

Update README.md

Files changed (1)

README.md +18 -5

@@ -5,6 +5,19 @@ license: mit
 An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [manifesto corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
 The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
 The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences (see Training Procedure for details).
@@ -75,14 +88,14 @@ Both variants performed similarly to our sentence pair approach, but lead to hig
 ## Model Performance
-The model was evaluated on a test set of 186,276 annotated manifesto statements.
 ### Overall
-| Accuracy | Top2_Acc | Top3_Acc | Precision| Recall | F1_Macro | MCC | Cross-Entropy |
-|:--------:|:--------:|:--------:|:--------:|:------:|:--------:|:---:|:-------------:|
-|   0.64   |   0.81   |   0.88   |    0.54  |  0.52  |   0.53   | 0.62|      1.15     |
 ### Categories

 An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [manifesto corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
 The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
+It works for all languages the xlm-roberta model was pretrained on ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)), but it will perform best for the 38 languages of the Manifesto Corpus on which it was fine-tuned:
+|Language|Language|Language|Language|Language|
+|------|------|------|------|------|
+|armenian|bosnian|bulgarian|catalan|croatian|
+|czech|danish|dutch|english|estonian|
+|finnish|french|galician|georgian|german|
+|greek|hebrew|hungarian|icelandic|italian|
+|japanese|korean|latvian|lithuanian|macedonian|
+|montenegrin|norwegian|polish|portuguese|romanian|
+|russian|serbian|slovak|slovenian|spanish|
+|swedish|turkish|ukrainian| | |
 The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences (see Training Procedure for details).
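As an illustration of the sentence-pair idea, the sketch below pairs each statement with its neighbouring sentences. It is a minimal example under assumptions, not the authors' preprocessing: the helper name `build_sentence_context_pairs`, the `window` parameter, and the choice to include the target sentence in its own context are all hypothetical.

```python
from typing import List, Tuple


def build_sentence_context_pairs(
    sentences: List[str], window: int = 1
) -> List[Tuple[str, str]]:
    """Pair each sentence with a context string built from its neighbours.

    Hypothetical helper: the window size and the inclusion of the target
    sentence in the context are assumptions made for this sketch.
    """
    pairs = []
    for i, sentence in enumerate(sentences):
        # Clamp the window to the document boundaries.
        start = max(0, i - window)
        end = min(len(sentences), i + window + 1)
        context = " ".join(sentences[start:end])
        pairs.append((sentence, context))
    return pairs


statements = [
    "We will lower taxes.",
    "Schools need more teachers.",
    "Borders must be secured.",
]
pairs = build_sentence_context_pairs(statements, window=1)
for sentence, context in pairs:
    print(repr(sentence), "|", repr(context))
```

Each `(sentence, context)` pair could then be passed to a tokenizer as a text and text-pair input, so that the classifier sees the statement alongside its surroundings.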
 ## Model Performance
+The model was evaluated on a test set of 199,046 annotated manifesto statements.
 ### Overall
+|                                                                                                       | Accuracy | Top2_Acc | Top3_Acc | Precision| Recall | F1_Macro | MCC | Cross-Entropy |
+|-------------------------------------------------------------------------------------------------------|:--------:|:--------:|:--------:|:--------:|:------:|:--------:|:---:|:-------------:|
+| [Sentence Model](https://huggingface.co/manifesto-project/xlm-roberta-political-56topics-sentence-2023a) |   0.57   |   0.73   |   0.81   |   0.49   |  0.43  |   0.45   | 0.55|      1.5      |
+| [Context Model](https://huggingface.co/manifesto-project/xlm-roberta-political-56topics-sentence-2023a)  |   0.64   |   0.81   |   0.88   |   0.54   |  0.52  |   0.53   | 0.62|      1.15     |
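For readers unfamiliar with the column names, the sketch below shows how top-k accuracy and macro-averaged F1 can be computed from class probabilities. It is a toy illustration, not the authors' evaluation code: the function names and the three-class data are made up for this example (the real model has 56 classes), and macro averaging is assumed from the `F1_Macro` column name.

```python
from typing import Sequence


def top_k_accuracy(
    probs: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Share of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, gold in zip(probs, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += gold in top_k
    return hits / len(labels)


def macro_f1(preds: Sequence[int], labels: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1; a class with no support or predictions scores 0."""
    scores = []
    for c in range(n_classes):
        tp = sum(p == c and g == c for p, g in zip(preds, labels))
        fp = sum(p == c and g != c for p, g in zip(preds, labels))
        fn = sum(p != c and g == c for p, g in zip(preds, labels))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy data: 4 statements, 3 classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.5, 0.3],
    [0.2, 0.3, 0.5],
]
labels = [0, 2, 0, 2]
# Top-1 prediction is the index of the largest probability in each row.
preds = [max(range(len(row)), key=lambda i: row[i]) for row in probs]

print("Accuracy:", top_k_accuracy(probs, labels, 1))
print("Top2_Acc:", top_k_accuracy(probs, labels, 2))
print("F1_Macro:", round(macro_f1(preds, labels, 3), 3))
```

Top-1 accuracy is the special case `k=1`, which is why `Accuracy` can never exceed `Top2_Acc` or `Top3_Acc` in the table.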
 ### Categories