manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2023-1-1

Text Classification

tburst committed on Sep 26, 2023

Commit b5068aa • 1 Parent(s): bbce29e

Update README.md

Files changed (1)

README.md +2 -2

@@ -6,8 +6,8 @@ license: mit
 An xlm-roberta-large model fine-tuned on all ~1,8 million annotated statements contained in the [manifesto corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
 The model can be used to categorize any type of text into [56 different political categories](https://manifesto-project.wzb.eu/coding_schemes/mp_v4) according to the Manifesto Project's coding scheme (Handbook 4).
-The context model variant additionally utilizes the surrounding sentences of a statement to improve the classification results for ambiguous sentences.
-During fine-tuning we collected the surrounding sentences of a statement and combined them with the statement itself to provide the larger context of a sentence as the second part of a sentence pair input.
+The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences.
+During fine-tuning we collected the surrounding sentences of a statement and merged them with the statement itself to provide the larger context of a sentence as the second part of a sentence pair input.
 We limited the statement itself to 100 tokens and the context of the statement to 200 tokens.
 **Important**
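
The sentence-pair input described in the changed lines could be assembled roughly as in the sketch below. It is an illustration under stated assumptions, not the authors' preprocessing code: `truncate` and `build_pair` are hypothetical helpers, the context window of one sentence on each side is an assumption, and whitespace splitting stands in for the model's subword tokenizer, so the 100 and 200 limits only approximate real token counts.

```python
def truncate(tokens, limit):
    """Keep at most `limit` tokens from the start of the list."""
    return tokens[:limit]


def build_pair(statement, context, max_statement=100, max_context=200):
    """Return (statement, context) as a truncated sentence pair.

    Assumption: whitespace splitting is a stand-in for the model's
    subword tokenizer, so these counts are only approximate.
    """
    statement_tokens = truncate(statement.split(), max_statement)
    context_tokens = truncate(context.split(), max_context)
    return " ".join(statement_tokens), " ".join(context_tokens)


# Hypothetical example document, already split into sentences.
sentences = [
    "We will lower taxes.",
    "This helps families.",
    "Schools need funding.",
]
index = 1  # position of the statement to classify

statement = sentences[index]
# Assumption: the context is the statement merged with one neighbour on each side.
context = " ".join(sentences[max(0, index - 1): index + 2])

pair = build_pair(statement, context)
print(pair)
```

With a real tokenizer, the two elements of `pair` would be passed as the first and second segment of a sentence-pair encoding, with the truncation applied in subword tokens rather than words.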