tburst committed on
Commit
0bcd9d1
1 Parent(s): 39ba715

Update README.md

Files changed (1)
  1. README.md +18 -5
README.md CHANGED
@@ -5,6 +5,19 @@ license: mit
5
 
6
  An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
7
  The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
 
 
 
 
 
 
 
 
 
 
 
 
 
8
 
9
  The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences. (See Training Procedure for details.)
10
 
@@ -75,14 +88,14 @@ Both variants performed similarly to our sentence pair approach, but lead to hig
75
 
76
  ## Model Performance
77
 
78
- The model was evaluated on a test set of 186,276 annotated manifesto statements.
79
 
80
  ### Overall
81
 
82
-
83
- | Accuracy | Top2_Acc | Top3_Acc | Precision| Recall | F1_Macro | MCC | Cross-Entropy |
84
- |:--------:|:--------:|:--------:|:--------:|:------:|:--------:|:---:|:-------------:|
85
- | 0.64 | 0.81 | 0.88 | 0.54 | 0.52 | 0.53 | 0.62| 1.15 |
86
 
87
  ### Categories
88
 
 
5
 
6
  An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
7
  The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
8
+ It works for all languages that XLM-RoBERTa was pretrained on ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)), but it will perform best for the 38 languages of the Manifesto Corpus on which it was fine-tuned:
9
+
10
+ |Language|Language|Language|Language|Language|
11
+ |------|------|------|------|------|
12
+ |Armenian|Bosnian|Bulgarian|Catalan|Croatian|
13
+ |Czech|Danish|Dutch|English|Estonian|
14
+ |Finnish|French|Galician|Georgian|German|
15
+ |Greek|Hebrew|Hungarian|Icelandic|Italian|
16
+ |Japanese|Korean|Latvian|Lithuanian|Macedonian|
17
+ |Montenegrin|Norwegian|Polish|Portuguese|Romanian|
18
+ |Russian|Serbian|Slovak|Slovenian|Spanish|
19
+ |Swedish|Turkish|Ukrainian| | |
20
+
21
 
22
  The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences. (See Training Procedure for details.)
23
 
 
88
 
89
  ## Model Performance
90
 
91
+ The model was evaluated on a test set of 199,046 annotated manifesto statements.
92
 
93
  ### Overall
94
 
95
+ | | Accuracy | Top2_Acc | Top3_Acc | Precision| Recall | F1_Macro | MCC | Cross-Entropy |
96
+ |-------------------------------------------------------------------------------------------------------|:--------:|:--------:|:--------:|:--------:|:------:|:--------:|:---:|:-------------:|
97
+ | [Sentence Model](https://huggingface.co/manifesto-project/xlm-roberta-political-56topics-sentence-2023a) | 0.57 | 0.73 | 0.81 | 0.49 | 0.43 | 0.45 | 0.55| 1.5 |
98
+ | [Context Model](https://huggingface.co/manifesto-project/xlm-roberta-political-56topics-sentence-2023a) | 0.64 | 0.81 | 0.88 | 0.54 | 0.52 | 0.53 | 0.62| 1.15 |
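
Editor's note on the metrics: the Top2_Acc and Top3_Acc columns presumably report top-k accuracy (the true label is among the k most probable classes), and F1_Macro an unweighted mean of per-class F1 scores. The README does not show its evaluation code, so the following is only a minimal sketch under those assumptions. The function names `top_k_accuracy` and `macro_f1` and the toy data are hypothetical and are not taken from the model repository.

```python
from typing import Sequence


def top_k_accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, true_label in zip(probs, labels):
        # Indices of the k classes with the highest predicted probability.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += true_label in top_k
    return hits / len(labels)


def macro_f1(predictions: Sequence[int], labels: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    scores = []
    for c in range(n_classes):
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here (an assumption).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy data: 4 statements and 3 made-up classes (the real model distinguishes 56).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
]
labels = [0, 2, 2, 1]
# Top-1 prediction per statement: the class with the highest probability.
predictions = [max(range(len(row)), key=lambda i: row[i]) for row in probs]

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
top3 = top_k_accuracy(probs, labels, k=3)
f1 = macro_f1(predictions, labels, n_classes=3)
print(top1, top2, top3, round(f1, 3))
```

Top-k accuracy can only grow with k, which matches the ordering Accuracy ≤ Top2_Acc ≤ Top3_Acc in the table. A macro-averaged F1 can sit well below accuracy when infrequent categories are classified poorly.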
99
 
100
  ### Categories
101