Update README.md
README.md
CHANGED
@@ -3,7 +3,7 @@ license: mit
|
|
3 |
---
|
4 |
## Model description
|
5 |
|
6 |
-
An xlm-roberta-large model fine-tuned on
|
7 |
The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
|
8 |
|
9 |
The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences.
|
@@ -50,4 +50,75 @@ print(predicted_class)
|
|
50 |
# 201 - Freedom and Human Rights
|
51 |
```
|
52 |
53 |
|
|
|
3 |
---
|
4 |
## Model description
|
5 |
|
6 |
+
An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements from the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
|
7 |
The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
|
8 |
|
9 |
The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences.
|
|
|
50 |
# 201 - Freedom and Human Rights
|
51 |
```
|
52 |
|
53 |
+
## Model Performance
|
54 |
+
|
55 |
+
The model was evaluated on a test set of 186,276 annotated manifesto statements (10% of the whole corpus).
|
56 |
+
|
57 |
+
### Overall
|
58 |
+
|
59 |
+
|
60 |
+
| Accuracy | Top2_Acc | Top3_Acc | Precision | Recall | F1_Macro | MCC | Cross-Entropy |
|:--------:|:--------:|:--------:|:---------:|:------:|:--------:|:---:|:-------------:|
| 0.64 | 0.81 | 0.88 | 0.54 | 0.52 | 0.53 | 0.62 | 1.15 |
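
For readers who want to reproduce numbers of this kind on their own labelled data, the following is a minimal sketch of how top-k accuracy, macro F1, and cross-entropy can be computed. It is not the evaluation script behind the table above. The function names, the toy data, and the reading of `F1_Macro` as the unweighted mean of per-class F1 are assumptions made for illustration.

```python
import math
from collections import Counter


def top_k_accuracy(y_true, y_prob, k=1):
    """Share of samples whose true class index is among the k highest-scoring classes."""
    hits = 0
    for label, probs in zip(y_true, y_prob):
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes that occur in y_true (assumed definition)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(probs[label], eps)) for label, probs in zip(y_true, y_prob))
    return -total / len(y_true)


# Hypothetical toy data: 4 statements, 3 classes (the real model has 56).
y_true = [0, 1, 2, 1]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
]
y_pred = [max(range(len(p)), key=lambda i: p[i]) for p in y_prob]

print("accuracy     ", top_k_accuracy(y_true, y_prob, k=1))
print("top-2 acc    ", top_k_accuracy(y_true, y_prob, k=2))
print("macro F1     ", round(macro_f1(y_true, y_pred), 3))
print("cross-entropy", round(cross_entropy(y_true, y_prob), 3))
```

With 56 classes, `y_prob` would be the softmax output of the classifier, one row per statement. Libraries such as scikit-learn provide equivalent metrics; the pure-Python version is used here only to keep the sketch dependency-free.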
|
63 |
+
|
64 |
+
### Categories
|
65 |
+
|
66 |
+
| Category | Precision | Recall | F1 | n_test (%) | n_predicted (%) |
|:---------|:---------:|:------:|:----:|:----------:|:---------------:|
| 101 | 0.50 | 0.48 | 0.49 | 0.30% | 0.29% |
| 102 | 0.56 | 0.61 | 0.58 | 0.09% | 0.10% |
| 103 | 0.51 | 0.36 | 0.42 | 0.28% | 0.20% |
| 104 | 0.78 | 0.81 | 0.79 | 1.57% | 1.64% |
| 105 | 0.69 | 0.70 | 0.69 | 0.34% | 0.34% |
| 106 | 0.59 | 0.57 | 0.58 | 0.33% | 0.32% |
| 107 | 0.68 | 0.66 | 0.67 | 2.24% | 2.17% |
| 108 | 0.66 | 0.68 | 0.67 | 1.20% | 1.24% |
| 109 | 0.52 | 0.39 | 0.45 | 0.17% | 0.13% |
| 110 | 0.63 | 0.68 | 0.65 | 0.36% | 0.38% |
| 201 | 0.58 | 0.59 | 0.59 | 2.16% | 2.20% |
| 202 | 0.62 | 0.63 | 0.62 | 3.25% | 3.28% |
| 203 | 0.46 | 0.47 | 0.47 | 0.19% | 0.19% |
| 204 | 0.61 | 0.37 | 0.46 | 0.25% | 0.15% |
| 301 | 0.66 | 0.71 | 0.68 | 2.13% | 2.29% |
| 302 | 0.38 | 0.25 | 0.30 | 0.17% | 0.11% |
| 303 | 0.58 | 0.60 | 0.59 | 5.12% | 5.31% |
| 304 | 0.67 | 0.65 | 0.66 | 1.38% | 1.34% |
| 305 | 0.59 | 0.57 | 0.58 | 2.32% | 2.22% |
| 401 | 0.45 | 0.36 | 0.40 | 1.50% | 1.21% |
| 402 | 0.61 | 0.58 | 0.59 | 2.73% | 2.60% |
| 403 | 0.56 | 0.51 | 0.53 | 3.59% | 3.25% |
| 404 | 0.30 | 0.15 | 0.20 | 0.58% | 0.28% |
| 405 | 0.43 | 0.51 | 0.47 | 0.18% | 0.21% |
| 406 | 0.38 | 0.46 | 0.42 | 0.26% | 0.31% |
| 407 | 0.56 | 0.52 | 0.54 | 0.40% | 0.38% |
| 408 | 0.28 | 0.17 | 0.21 | 1.34% | 0.79% |
| 409 | 0.37 | 0.21 | 0.27 | 0.24% | 0.14% |
| 410 | 0.53 | 0.50 | 0.52 | 2.22% | 2.08% |
| 411 | 0.73 | 0.75 | 0.74 | 8.32% | 8.53% |
| 412 | 0.26 | 0.20 | 0.22 | 0.58% | 0.45% |
| 413 | 0.49 | 0.63 | 0.55 | 0.29% | 0.37% |
| 414 | 0.58 | 0.55 | 0.56 | 1.38% | 1.32% |
| 415 | 0.14 | 0.23 | 0.18 | 0.05% | 0.07% |
| 416 | 0.52 | 0.49 | 0.50 | 2.45% | 2.35% |
| 501 | 0.69 | 0.78 | 0.73 | 4.77% | 5.35% |
| 502 | 0.78 | 0.84 | 0.81 | 3.08% | 3.32% |
| 503 | 0.61 | 0.63 | 0.62 | 5.96% | 6.11% |
| 504 | 0.71 | 0.76 | 0.74 | 10.05% | 10.76% |
| 505 | 0.46 | 0.37 | 0.41 | 0.69% | 0.55% |
| 506 | 0.78 | 0.82 | 0.80 | 5.42% | 5.72% |
| 507 | 0.45 | 0.26 | 0.33 | 0.14% | 0.08% |
| 601 | 0.52 | 0.46 | 0.49 | 1.79% | 1.57% |
| 602 | 0.35 | 0.34 | 0.34 | 0.24% | 0.24% |
| 603 | 0.65 | 0.68 | 0.67 | 1.36% | 1.42% |
| 604 | 0.62 | 0.48 | 0.54 | 0.57% | 0.44% |
| 605 | 0.72 | 0.74 | 0.73 | 4.22% | 4.33% |
| 606 | 0.56 | 0.48 | 0.51 | 1.45% | 1.23% |
| 607 | 0.57 | 0.67 | 0.62 | 1.08% | 1.25% |
| 608 | 0.48 | 0.48 | 0.48 | 0.41% | 0.41% |
| 701 | 0.62 | 0.66 | 0.64 | 3.35% | 3.59% |
| 702 | 0.42 | 0.30 | 0.35 | 0.08% | 0.06% |
| 703 | 0.75 | 0.87 | 0.80 | 2.65% | 3.07% |
| 704 | 0.43 | 0.32 | 0.37 | 0.57% | 0.43% |
| 705 | 0.38 | 0.33 | 0.35 | 0.80% | 0.69% |
| 706 | 0.43 | 0.37 | 0.39 | 1.35% | 1.16% |
|
124 |
|
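
As a quick sanity check on the per-category table, the F1 column can be compared with the harmonic mean of the Precision and Recall columns. The sketch below assumes that this is how F1 was derived, which the README does not state. The helper `f1_from_pr` is a hypothetical name, and the three rows are copied from the table.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for categories 104, 415 and 504, taken from the table.
rows = {
    "104": (0.78, 0.81, 0.79),
    "415": (0.14, 0.23, 0.18),
    "504": (0.71, 0.76, 0.74),
}

for category, (precision, recall, f1_reported) in rows.items():
    f1 = f1_from_pr(precision, recall)
    # The inputs are already rounded to two decimals, so small deviations are expected.
    print(category, round(f1, 3), f1_reported, abs(f1 - f1_reported) <= 0.01)
```

Because the published precision and recall are rounded to two decimals, the recomputed value can differ from the reported F1 by up to about 0.01 without indicating an error.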