An xlm-roberta-large model fine-tuned on multilingual training data labeled with major topic codes from the Comparative Agendas Project.
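For reference, decoding the classifier's output into one of the 27 major topic labels (0–26, as in the performance table below) might look like the following sketch. The model loading shown in the comments uses the standard Hugging Face Transformers API, but the model path is a placeholder, not the actual repository name.

```python
# Sketch: turning sequence-classification logits into a CAP major topic label.
# The Transformers calls in the __main__ block are illustrative only; the
# model path there is a placeholder, not this model's actual repository ID.
from math import exp

NUM_LABELS = 27  # CAP major topic labels 0-26, as in the table below


def softmax(logits):
    """Convert raw classifier logits into a probability distribution."""
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return (label_id, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


if __name__ == "__main__":
    # Hypothetical local loading (the serverless Inference API is unavailable):
    # from transformers import AutoTokenizer, AutoModelForSequenceClassification
    # tok = AutoTokenizer.from_pretrained("path/to/model")
    # model = AutoModelForSequenceClassification.from_pretrained("path/to/model")
    # logits = model(**tok("Example sentence", return_tensors="pt")).logits[0].tolist()
    logits = [0.0] * NUM_LABELS
    logits[20] = 5.0  # pretend the model strongly favors label 20
    print(top_label(logits))
```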
## Training data
- US Presidential occasional remarks (English, sentences)
- NYTimes lead paragraphs (English, full documents)
- Spanish parliamentary speeches (Spanish, sentences)
- Hungarian bills and laws (Hungarian, full documents)
- Polish laws (Polish, full documents)
- Danish parliamentary questions (Danish, sentences)
- Flemish Facebook comments (sentences) and newspaper articles
## Model performance

Overall accuracy on the test set (23,465 examples) is 86%.
| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.82 | 0.80 | 0.81 | 1245 |
| 1 | 0.82 | 0.71 | 0.76 | 642 |
| 2 | 0.90 | 0.92 | 0.91 | 1229 |
| 3 | 0.88 | 0.88 | 0.88 | 710 |
| 4 | 0.83 | 0.81 | 0.82 | 906 |
| 5 | 0.91 | 0.92 | 0.92 | 1059 |
| 6 | 0.85 | 0.85 | 0.85 | 771 |
| 7 | 0.86 | 0.91 | 0.88 | 474 |
| 8 | 0.86 | 0.88 | 0.87 | 676 |
| 9 | 0.89 | 0.93 | 0.91 | 1350 |
| 10 | 0.87 | 0.87 | 0.87 | 1726 |
| 11 | 0.81 | 0.81 | 0.81 | 761 |
| 12 | 0.82 | 0.82 | 0.82 | 424 |
| 13 | 0.80 | 0.80 | 0.80 | 847 |
| 14 | 0.84 | 0.88 | 0.86 | 1088 |
| 15 | 0.83 | 0.87 | 0.85 | 535 |
| 16 | 0.74 | 0.72 | 0.73 | 218 |
| 17 | 0.84 | 0.88 | 0.86 | 2351 |
| 18 | 0.84 | 0.82 | 0.83 | 2013 |
| 19 | 0.86 | 0.82 | 0.84 | 414 |
| 20 | 0.93 | 0.92 | 0.93 | 3293 |
| 21 | 0.64 | 0.58 | 0.61 | 663 |
| 22 | 1.00 | 0.08 | 0.14 | 13 |
| 23 | 1.00 | 0.15 | 0.27 | 13 |
| 24 | 0.83 | 0.83 | 0.83 | 35 |
| 25 | 0.00 | 0.00 | 0.00 | 1 |
| 26 | 0.00 | 0.00 | 0.00 | 8 |
| macro avg | 0.79 | 0.72 | 0.73 | 23465 |
| weighted avg | 0.86 | 0.86 | 0.86 | 23465 |
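The macro and weighted averages in the last two rows can be reproduced from the per-class scores. The sketch below uses the rounded two-decimal f1 values and supports from the table, so the recomputed averages agree with the reported ones to two decimals:

```python
# Recompute the summary rows from the per-class f1 scores and supports above.
f1 = [0.81, 0.76, 0.91, 0.88, 0.82, 0.92, 0.85, 0.88, 0.87, 0.91,
      0.87, 0.81, 0.82, 0.80, 0.86, 0.85, 0.73, 0.86, 0.83, 0.84,
      0.93, 0.61, 0.14, 0.27, 0.83, 0.00, 0.00]
support = [1245, 642, 1229, 710, 906, 1059, 771, 474, 676, 1350,
           1726, 761, 424, 847, 1088, 535, 218, 2351, 2013, 414,
           3293, 663, 13, 13, 35, 1, 8]

# Macro average: unweighted mean over all 27 classes, so the tiny
# classes (e.g. labels 25 and 26) pull it well below the accuracy.
macro_f1 = sum(f1) / len(f1)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(round(macro_f1, 2), round(weighted_f1, 2))  # → 0.73 0.86
```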
The serverless Inference API is not available for this model; the repository is disabled.