	---
	license: bigscience-openrail-m
	widget:
	- text: >-
	We will restore funding to the Global Environment Facility and the
	Intergovernmental Panel on Climate Change.
	---

	## Model description
	An xlm-roberta-large model fine-tuned on ~1.7 million annotated statements from the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2024a).
	The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
	It works for all languages that xlm-roberta was pretrained on ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)), but it performs best for the 38 languages contained in the Manifesto Corpus:

	| | | | | |
	|------|------|------|------|------|
	|armenian|bosnian|bulgarian|catalan|croatian|
	|czech|danish|dutch|english|estonian|
	|finnish|french|galician|georgian|german|
	|greek|hebrew|hungarian|icelandic|italian|
	|japanese|korean|latvian|lithuanian|macedonian|
	|montenegrin|norwegian|polish|portuguese|romanian|
	|russian|serbian|slovak|slovenian|spanish|
	|swedish|turkish|ukrainian| | |

	## How to use

	```python
	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	model = AutoModelForSequenceClassification.from_pretrained("manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1")
	tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

	sentence = "We will restore funding to the Global Environment Facility and the Intergovernmental Panel on Climate Change, to support critical climate science research around the world"

	inputs = tokenizer(
	    sentence,
	    return_tensors="pt",
	    max_length=200,  # input was limited to 200 tokens during fine-tuning
	    padding="max_length",
	    truncation=True,
	)

	logits = model(**inputs).logits

	probabilities = torch.softmax(logits, dim=1).tolist()[0]
	probabilities = {model.config.id2label[index]: round(probability * 100, 2) for index, probability in enumerate(probabilities)}
	probabilities = dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))
	print(probabilities)
	# {'501 - Environmental Protection: Positive': 67.28, '411 - Technology and Infrastructure': 15.19, '107 - Internationalism: Positive': 13.63, '416 - Anti-Growth Economy: Positive': 2.02...

	predicted_class = model.config.id2label[logits.argmax().item()]
	print(predicted_class)
	# 501 - Environmental Protection: Positive
	```
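
For downstream analysis it can be handy to separate the numeric category code from the category name. The sketch below assumes that every label follows the `<code> - <name>` pattern seen in the example output above; this has not been checked against all 56 labels. The helper names `split_label` and `top_k` are illustrative and not part of the model or of `transformers`.

```python
from typing import Dict, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split '501 - Environmental Protection: Positive' into (code, name).

    Assumption: labels use ' - ' between code and name. Labels without
    that separator are returned with an empty code.
    """
    code, sep, name = label.partition(" - ")
    if not sep:
        return "", label.strip()
    return code.strip(), name.strip()


def top_k(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, str, float]]:
    """Return the k highest-scoring categories as (code, name, score)."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:k]
    return [(*split_label(label), score) for label, score in ranked]


# Values copied from the example output above (truncated to four categories).
example = {
    "501 - Environmental Protection: Positive": 67.28,
    "411 - Technology and Infrastructure": 15.19,
    "107 - Internationalism: Positive": 13.63,
    "416 - Anti-Growth Economy: Positive": 2.02,
}

for code, name, score in top_k(example, k=2):
    print(code, name, score)
# 501 Environmental Protection: Positive 67.28
# 411 Technology and Infrastructure 15.19
```

The same helper can be applied to the full `probabilities` dictionary from the snippet above.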


	## Model Performance

	The model was evaluated on a test set of 200,920 annotated manifesto statements.

	### Overall

	| Model | Accuracy | Top2_Acc | Top3_Acc | Precision | Recall | F1_Macro | MCC | Cross-Entropy |
	|-------|:--------:|:--------:|:--------:|:---------:|:------:|:--------:|:---:|:-------------:|
	| [Sentence Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1) | 0.57 | 0.73 | 0.81 | 0.48 | 0.43 | 0.45 | 0.55 | 1.47 |
	| [Context Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2024-1-1) | 0.64 | 0.81 | 0.88 | 0.55 | 0.52 | 0.53 | 0.63 | 1.15 |
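
The card does not state how `Top2_Acc` and `Top3_Acc` were computed. A common reading is that a prediction counts as correct when the gold category is among the 2 or 3 highest-scoring classes. The following is a minimal sketch of that reading on toy data, not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Share of rows whose gold class index is among the k highest scores."""
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    hits = 0
    for row, label in zip(scores, gold):
        # Indices of the k largest scores; ties keep the lower index first
        # because Python's sort is stable, also with reverse=True.
        best = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in best
    return hits / len(gold)


# Toy data: 4 statements, 3 hypothetical classes (not real model output).
toy_scores = [
    [0.7, 0.2, 0.1],  # gold 0: correct at k=1
    [0.1, 0.5, 0.4],  # gold 2: correct from k=2
    [0.3, 0.3, 0.4],  # gold 0: correct from k=2
    [0.6, 0.3, 0.1],  # gold 2: correct only at k=3
]
toy_gold = [0, 2, 0, 2]

for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_scores, toy_gold, k))
# 1 0.25
# 2 0.75
# 3 1.0
```

With `k=1` this reduces to plain accuracy, which is why the top-2 and top-3 columns in the table can never be lower than the accuracy column.
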

	## Citation

	Please cite the model as follows:

	Burst, Tobias / Lehmann, Pola / Franzmann, Simon / Al-Gaddooa, Denise / Ivanusch, Christoph / Regel, Sven / Riethmüller, Felicia / Weßels, Bernhard / Zehnter, Lisa (2024): manifestoberta. Version 56topics.sentence.2024.1.1. Berlin: Wissenschaftszentrum Berlin für Sozialforschung (WZB) / Göttingen: Institut für Demokratieforschung (IfDem). https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1

	```bib
	@misc{Burst:2024,
	Address = {Berlin / Göttingen},
	Author = {Burst, Tobias AND Lehmann, Pola AND Franzmann, Simon AND Al-Gaddooa, Denise AND Ivanusch, Christoph AND Regel, Sven AND Riethmüller, Felicia AND Weßels, Bernhard AND Zehnter, Lisa},
	Publisher = {Wissenschaftszentrum Berlin für Sozialforschung / Göttinger Institut für Demokratieforschung},
	Title = {manifestoberta. Version 56topics.sentence.2024.1.1},
	doi = {10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1},
	url = {https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1},
	Year = {2024}
	}
	```