manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1

PyTorch

Safetensors

xlm-roberta

custom_code

denisealg committed on Sep 4

Commit 8d54fcf, 1 parent: 10e12d3

Update README.md

README.md +83 -3

@@ -1,3 +1,83 @@
----
-license: bigscience-openrail-m
----

+---
+license: bigscience-openrail-m
+widget:
+- text: >-
+    We will restore funding to the Global Environment Facility and the
+    Intergovernmental Panel on Climate Change.
+---
+## Model description
+An xlm-roberta-large model fine-tuned on ~1.7 million annotated statements from the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
+The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
+It works for all languages the xlm-roberta model was pretrained on ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)), but it will perform best for the 38 languages contained in the Manifesto Corpus:
+||||||
+|------|------|------|------|------|
+|armenian|bosnian|bulgarian|catalan|croatian|
+|czech|danish|dutch|english|estonian|
+|finnish|french|galician|georgian|german|
+|greek|hebrew|hungarian|icelandic|italian|
+|japanese|korean|latvian|lithuanian|macedonian|
+|montenegrin|norwegian|polish|portuguese|romanian|
+|russian|serbian|slovak|slovenian|spanish|
+|swedish|turkish|ukrainian| | |
+## How to use
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+model = AutoModelForSequenceClassification.from_pretrained("manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1")
+tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
+sentence = "We will restore funding to the Global Environment Facility and the Intergovernmental Panel on Climate Change, to support critical climate science research around the world"
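+# Tokenize; inputs were limited to 200 tokens during fine-tuning
+inputs = tokenizer(sentence,
+                   return_tensors="pt",
+                   max_length=200,
+                   padding="max_length",
+                   truncation=True
+                   )
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Convert logits to per-class probabilities for the single input sentence
+probabilities = torch.softmax(logits, dim=1).tolist()[0]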
+probabilities = {model.config.id2label[index]: round(probability * 100, 2) for index, probability in enumerate(probabilities)}
+probabilities = dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))
+print(probabilities)
+# {'501 - Environmental Protection: Positive': 67.28, '411 - Technology and Infrastructure': 15.19, '107 - Internationalism: Positive': 13.63, '416 - Anti-Growth Economy: Positive': 2.02...
+predicted_class = model.config.id2label[logits.argmax().item()]
+print(predicted_class)
+# 501 - Environmental Protection: Positive
+```
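When only the few most likely topics are of interest, the full probability dictionary from the snippet above can be reduced to a short ranked list. The following is a minimal sketch in plain Python, not part of the model card: `softmax` and `top_k_labels` are hypothetical helper names, and the toy labels and logits are made-up stand-ins. With the real model, one would pass `logits[0].tolist()` and `model.config.id2label` instead.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits_row, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits_row)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[index], prob) for index, prob in ranked[:k]]


# Toy stand-ins (hypothetical values, not real model output).
toy_id2label = {
    0: "107 - Internationalism: Positive",
    1: "411 - Technology and Infrastructure",
    2: "501 - Environmental Protection: Positive",
}
toy_logits = [1.0, 1.2, 3.0]

top2 = top_k_labels(toy_logits, toy_id2label, k=2)
for label, prob in top2:
    print(f"{label}: {prob:.2%}")
```

Keeping the helper independent of `torch` makes it easy to reuse on logits that were stored as plain lists, for example after batch inference.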
+## Model Performance
+The model was evaluated on a test set of 200,920 annotated manifesto statements.
+### Overall
+| Model | Accuracy | Top2_Acc | Top3_Acc | Precision | Recall | F1_Macro | MCC | Cross-Entropy |
+|-------|:--------:|:--------:|:--------:|:---------:|:------:|:--------:|:---:|:-------------:|
+| [Sentence Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1) | 0.57 | 0.73 | 0.81 | 0.48 | 0.43 | 0.45 | 0.55 | 1.47 |
+| [Context Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2024-1-1) | 0.64 | 0.81 | 0.88 | 0.55 | 0.52 | 0.53 | 0.63 | 1.15 |
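The card does not say how the metrics in the table were computed. As a rough illustration only, the sketch below uses common textbook definitions of top-k accuracy (the gold label appears among the k highest-ranked predictions) and macro F1 (the unweighted mean of per-class F1 scores). The function names and the toy data are hypothetical and are not taken from the authors' evaluation code.

```python
def top_k_accuracy(ranked_predictions, gold_labels, k):
    """Share of examples whose gold label is among the first k ranked predictions."""
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold in ranked[:k]
    )
    return hits / len(gold_labels)


def macro_f1(predicted, gold):
    """Unweighted mean of per-class F1 over all labels seen in predicted or gold."""
    labels = sorted(set(gold) | set(predicted))
    f1_scores = []
    for label in labels:
        tp = sum(1 for p, g in zip(predicted, gold) if p == label and g == label)
        fp = sum(1 for p, g in zip(predicted, gold) if p == label and g != label)
        fn = sum(1 for p, g in zip(predicted, gold) if p != label and g == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1_scores.append(2 * precision * recall / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (made up): gold codes and ranked predictions for four statements.
gold = ["501", "411", "107", "501"]
ranked = [["501", "411"], ["107", "411"], ["501", "107"], ["411", "107"]]
top1_predictions = [r[0] for r in ranked]

acc_top1 = top_k_accuracy(ranked, gold, k=1)
acc_top2 = top_k_accuracy(ranked, gold, k=2)
f1_macro = macro_f1(top1_predictions, gold)
print(acc_top1, acc_top2, round(f1_macro, 4))
```

Under these definitions, the gap between Accuracy and Top2_Acc indicates how often the correct category is the model's second choice rather than its first.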
+## Citation
+Please cite the model as follows:
+Burst, Tobias / Lehmann, Pola / Franzmann, Simon / Al-Gaddooa, Denise / Ivanusch, Christoph / Regel, Sven / Riethmüller, Felicia / Weßels, Bernhard / Zehnter, Lisa (2024): manifestoberta. Version 56topics.sentence.2024.1.1. Berlin: Wissenschaftszentrum Berlin für Sozialforschung (WZB) / Göttingen: Institut für Demokratieforschung (IfDem). https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1
+```bib
+@misc{Burst:2024,
+  Address = {Berlin / Göttingen},
+  Author = {Burst, Tobias AND Lehmann, Pola AND Franzmann, Simon AND Al-Gaddooa, Denise AND Ivanusch, Christoph AND Regel, Sven AND Riethmüller, Felicia AND Weßels, Bernhard AND Zehnter, Lisa},
+  Publisher = {Wissenschaftszentrum Berlin für Sozialforschung / Göttinger Institut für Demokratieforschung},
+  Title = {manifestoberta. Version 56topics.sentence.2024.1.1},
+  doi = {10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1},
+  url = {https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1},
+  Year = {2024},
+}
+```