Added links to the paper and the training dataset
README.md CHANGED

@@ -111,14 +111,15 @@ to the [CAP (Comparative Agendas Project) schema](https://www.comparativeagendas
 
 This classification model is based on the multilingual parliamentary [XLM-R-Parla](https://huggingface.co/classla/xlm-r-parla) BERT-like model,
 which is a XLM-RoBERTa-large model that was additionally pre-trained on texts of parliamentary proceedings.
-To develop the ParlaCAP model, XLM-R-Parla was additionally fine-tuned on 29,779 instances (speeches) from
+To develop the ParlaCAP model, XLM-R-Parla was additionally fine-tuned on the [ParlaCAP-train dataset](http://hdl.handle.net/11356/2093): 29,779 instances (speeches) from
 29 [ParlaMint 4.1](http://hdl.handle.net/11356/1912) datasets
 containing transcriptions of parliamentary debates of 29 European countries and autonomous regions.
 The speeches were automatically annotated with 22 CAP labels (21 major topics and a label "Other") using the GPT-4o model
 in a zero-shot prompting fashion
 following the [LLM teacher-student framework](https://ieeexplore.ieee.org/document/10900365).
 Evaluation of the GPT model has shown that its annotation performance is
-comparable to those of human annotators.
+comparable to that of human annotators. For more information, see the paper ["Supercharging Agenda Setting Research:
+The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification"](https://doi.org/10.48550/arXiv.2602.16516) (Kuzman Pungeršek et al., 2026).
 
 The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
 0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and 0.646 in macro-F1 on a Bosnian test set

@@ -186,7 +187,23 @@ To apply the model to the text corpora in the ParlaMint TXT format
 
 ## How to Cite
 
-
+Please cite the paper presenting the model:
+
+```
+@article{pungersek2026parlacap-paper,
+title={{Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification}},
+author={Kuzman Punger{\v s}ek, Taja and Rupnik, Peter and {\v S}irini{\'c}, Daniela and Ljube{\v s}i{\'c}, Nikola},
+year={2026},
+eprint={2602.16516},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2602.16516},
+journal={arXiv preprint arXiv:2602.16516},
+}
+```
+
+You can also cite the model as follows:
+
 ```
 @misc{parlacap_model,
 author = {Kuzman Punger{\v s}ek, Taja and Ljube{\v s}i{\'c}, Nikola},