classla/xlm-r-parla

Fill-Mask · Transformers · PyTorch · xlm-roberta · parliament


nljubesi committed on Sep 20, 2023

Commit ab931a0 • 1 Parent(s): 7859bb9

Update README.md

Files changed (1)

README.md +20 -7


@@ -40,14 +40,27 @@ inference: false
 # Multilingual parliamentary model XLM-R-parla
-This is the [XLM-R-large model](https://huggingface.co/xlm-roberta-large) additionally pre-trained on texts of parliamentary proceedings. Texts for the additional pre-training come from the [ParlaMint corpus](http://hdl.handle.net/11356/1486) and the [EuroParl corpus](https://www.statmt.org/europarl/).
-The model is a result of the [ParlaMint project](https://www.clarin.eu/parlamint). The details on the model development are described in the following publication (to be published soon):
-The first application of this model is the [XLM-R-parlasent model](https://huggingface.co/classla/xlm-r-parlasent), fine-tuned on the [ParlaSent dataset](http://hdl.handle.net/11356/1868) for the task of sentiment analysis in parliamentary proceedings.
-Find more detail about this model in our [paper](https://arxiv.org/abs/2309.09783):
 ```latex
- @article{Mochtak_Rupnik_Ljubešić_2023, title={The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings}, rights={All rights reserved}, url={http://arxiv.org/abs/2309.09783}, abstractNote={Sentiments inherently drive politics. How we receive and process information plays an essential role in political decision-making, shaping our judgment with strategic consequences both on the level of legislators and the masses. If sentiment plays such an important role in politics, how can we study and measure it systematically? The paper presents a new dataset of sentiment-annotated sentences, which are used in a series of experiments focused on training a robust sentiment classiﬁer for parliamentary proceedings. The paper also introduces the ﬁrst domain-speciﬁc LLM for political science applications additionally pre-trained on 1.72 billion domain-speciﬁc words from proceedings of 27 European parliaments. We present experiments demonstrating how the additional pre-training of LLM on parliamentary data can signiﬁcantly improve the model downstream performance on the domain-speciﬁc tasks, in our case, sentiment detection in parliamentary proceedings. We further show that multilingual models perform very well on unseen languages and that additional data from other languages signiﬁcantly improves the target parliament’s results. The paper makes an important contribution to multiple domains of social sciences and bridges them with computer science and computational linguistics. Lastly, it sets up a more robust approach to sentiment analysis of political texts in general, which allows scholars to study political sentiment from a comparative perspective using standardized tools and techniques.}, note={arXiv:2309.09783 [cs]}, number={arXiv:2309.09783}, publisher={arXiv}, author={Mochtak, Michal and Rupnik, Peter and Ljubešić, Nikola}, year={2023}, month={Sep}, language={en} }
-```

 # Multilingual parliamentary model XLM-R-parla
+This is the [XLM-R-large model](https://huggingface.co/xlm-roberta-large) additionally pre-trained on texts of parliamentary proceedings. Texts for the additional pre-training, 1.7 billion words in size, come from the [ParlaMint corpus](http://hdl.handle.net/11356/1486) and the [EuroParl corpus](https://www.statmt.org/europarl/).
+The model is a result of the [ParlaMint project](https://www.clarin.eu/parlamint). The details on the model development are described in the following [paper](https://arxiv.org/abs/2309.09783):
 ```latex
+@article{
+ Mochtak_Rupnik_Ljubešić_2023,
+ title={The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings},
+ rights={All rights reserved},
+ url={http://arxiv.org/abs/2309.09783},
+ abstractNote={Sentiments inherently drive politics. How we receive and process information plays an essential role in political decision-making, shaping our judgment with strategic consequences both on the level of legislators and the masses. If sentiment plays such an important role in politics, how can we study and measure it systematically? The paper presents a new dataset of sentiment-annotated sentences, which are used in a series of experiments focused on training a robust sentiment classiﬁer for parliamentary proceedings. The paper also introduces the ﬁrst domain-speciﬁc LLM for political science applications additionally pre-trained on 1.72 billion domain-speciﬁc words from proceedings of 27 European parliaments. We present experiments demonstrating how the additional pre-training of LLM on parliamentary data can signiﬁcantly improve the model downstream performance on the domain-speciﬁc tasks, in our case, sentiment detection in parliamentary proceedings. We further show that multilingual models perform very well on unseen languages and that additional data from other languages signiﬁcantly improves the target parliament’s results. The paper makes an important contribution to multiple domains of social sciences and bridges them with computer science and computational linguistics. Lastly, it sets up a more robust approach to sentiment analysis of political texts in general, which allows scholars to study political sentiment from a comparative perspective using standardized tools and techniques.},
+ note={arXiv:2309.09783 [cs]},
+ number={arXiv:2309.09783},
+ publisher={arXiv},
+ author={Mochtak, Michal and Rupnik, Peter and Ljubešić, Nikola},
+ year={2023},
+ month={Sep},
+ language={en}
+}
+```
+The first application of this model is the [XLM-R-parlasent model](https://huggingface.co/classla/xlm-r-parlasent), fine-tuned on the [ParlaSent dataset](http://hdl.handle.net/11356/1868) for the task of sentiment analysis in parliamentary proceedings.
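
Note on usage: the README describes a model additionally pre-trained with a masked-language-modelling objective, so one plausible way to try it locally is the `fill-mask` pipeline from the Transformers library. The following is a minimal sketch, not taken from the model card. It assumes that `transformers` and PyTorch are installed and that the weights can be downloaded under the repository ID shown at the top of this page. The helper names `mask_word` and `top_predictions` and the example sentence are hypothetical, introduced only for illustration.

```python
MODEL_ID = "classla/xlm-r-parla"  # repository ID as shown on this page
MASK_TOKEN = "<mask>"             # mask token used by XLM-RoBERTa tokenizers


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` with the mask token (hypothetical helper)."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, MASK_TOKEN, 1)


def top_predictions(text: str, top_k: int = 5):
    """Return (token, score) pairs for the single masked position in `text`."""
    # Imported lazily so that mask_word() works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return [(r["token_str"], r["score"]) for r in fill_mask(text, top_k=top_k)]


# Example call (downloads the model weights on first use):
#   masked = mask_word("I thank the minister for her answer.", "minister")
#   for token, score in top_predictions(masked):
#       print(f"{token}\t{score:.3f}")
```

Because the base model is XLM-R-large, the first call downloads a large checkpoint. For the sentiment task mentioned in the README, the fine-tuned XLM-R-parlasent model is the relevant one rather than this masked-language model.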