Update README.md
README.md
CHANGED
@@ -5,6 +5,19 @@ license: mit
|
|
5 |
|
6 |
An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [manifesto corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
|
7 |
The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
|
8 |
|
9 |
The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences. (See Training Procedure for details)
|
10 |
|
@@ -75,14 +88,14 @@ Both variants performed similarly to our sentence pair approach, but lead to hig
|
|
75 |
|
76 |
## Model Performance
|
77 |
|
78 |
-
The model was evaluated on a test set of
|
79 |
|
80 |
### Overall
|
81 |
|
82 |
-
|
83 |
-
|
84 |
-
|
85 |
-
| 0.64 | 0.81 | 0.88 | 0.54 | 0.52 | 0.53 | 0.62| 1.15 |
|
86 |
|
87 |
### Categories
|
88 |
|
|
|
5 |
|
6 |
An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [manifesto corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
|
7 |
The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
|
8 |
+
It works for all languages on which the xlm-roberta model was pretrained ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)), but it will perform best for the 38 languages of the Manifesto Corpus on which it was fine-tuned:
|
9 |
+
|
10 |
+
|Language|Language|Language|Language|Language|
|
11 |
+
|------|------|------|------|------|
|
12 |
+
|armenian|bosnian|bulgarian|catalan|croatian|
|
13 |
+
|czech|danish|dutch|english|estonian|
|
14 |
+
|finnish|french|galician|georgian|german|
|
15 |
+
|greek|hebrew|hungarian|icelandic|italian|
|
16 |
+
|japanese|korean|latvian|lithuanian|macedonian|
|
17 |
+
|montenegrin|norwegian|polish|portuguese|romanian|
|
18 |
+
|russian|serbian|slovak|slovenian|spanish|
|
19 |
+
|swedish|turkish|ukrainian| | |
|
20 |
+
|
21 |
|
22 |
The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences. (See Training Procedure for details)
|
23 |
|
|
|
88 |
|
89 |
## Model Performance
|
90 |
|
91 |
+
The model was evaluated on a test set of 199,046 annotated manifesto statements.
|
92 |
|
93 |
### Overall
|
94 |
|
95 |
+
| | Accuracy | Top2_Acc | Top3_Acc | Precision| Recall | F1_Macro | MCC | Cross-Entropy |
|
96 |
+
|-------------------------------------------------------------------------------------------------------|:--------:|:--------:|:--------:|:--------:|:------:|:--------:|:---:|:-------------:|
|
97 |
+
| [Sentence Model](https://huggingface.co/manifesto-project/xlm-roberta-political-56topics-sentence-2023a) | 0.57 | 0.73 | 0.81 | 0.49 | 0.43 | 0.45 | 0.55 | 1.5 |
|
98 |
+
| [Context Model](https://huggingface.co/manifesto-project/xlm-roberta-political-56topics-sentence-2023a) | 0.64 | 0.81 | 0.88 | 0.54 | 0.52 | 0.53 | 0.62 | 1.15 |
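
The README does not say how the scores in the table above were computed. As a rough illustration only, the sketch below shows how top-k accuracy, macro F1, the multiclass MCC and cross-entropy are commonly defined. All function names and the toy probabilities are hypothetical, and the averaging choices (for example, macro averaging for F1) are assumptions rather than the authors' confirmed procedure.

```python
import math
from collections import Counter


def top_k_accuracy(probs, labels, k=1):
    """Share of samples whose true label is among the k most probable classes."""
    hits = 0
    for row, true in zip(probs, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += true in ranked[:k]
    return hits / len(labels)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(row[t], eps)) for row, t in zip(probs, labels))
    return -total / len(labels)


def macro_f1(preds, labels, n_classes):
    """Unweighted mean of per-class F1; a class never seen or predicted scores 0."""
    scores = []
    for c in range(n_classes):
        tp = sum(p == c and t == c for p, t in zip(preds, labels))
        fp = sum(p == c and t != c for p, t in zip(preds, labels))
        fn = sum(p != c and t == c for p, t in zip(preds, labels))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def multiclass_mcc(preds, labels):
    """Multiclass Matthews correlation coefficient from prediction/label counts."""
    s = len(labels)                                   # number of samples
    c = sum(p == t for p, t in zip(preds, labels))    # number of correct predictions
    pred_counts, true_counts = Counter(preds), Counter(labels)
    classes = set(pred_counts) | set(true_counts)
    cov = c * s - sum(pred_counts[k] * true_counts[k] for k in classes)
    denom = math.sqrt(
        (s * s - sum(v * v for v in pred_counts.values()))
        * (s * s - sum(v * v for v in true_counts.values()))
    )
    return cov / denom if denom else 0.0


# Toy data: 4 samples, 3 classes (made-up probabilities, not model output).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.1, 0.4],
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 2, 2]
preds = [max(range(len(row)), key=row.__getitem__) for row in probs]

print("Accuracy:     ", top_k_accuracy(probs, labels, k=1))
print("Top2_Acc:     ", top_k_accuracy(probs, labels, k=2))
print("F1_Macro:     ", round(macro_f1(preds, labels, 3), 4))
print("MCC:          ", round(multiclass_mcc(preds, labels), 4))
print("Cross-Entropy:", round(cross_entropy(probs, labels), 4))
```

In the toy data the third sample is misclassified at rank 1 but its true label is the second most probable class, so it counts as a miss for accuracy and as a hit for top-2 accuracy. This is the same effect that separates the Accuracy and Top2_Acc columns.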
|
99 |
|
100 |
### Categories
|
101 |
|