tburst committed on
Commit
77c985c
1 Parent(s): 4bfa20b

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -6,9 +6,7 @@ license: mit
 An xlm-roberta-large model fine-tuned on ~1,6 million annotated statements contained in the [manifesto corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
 The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
 
-The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences.
-During fine-tuning we collected the surrounding sentences of a statement and merged them with the statement itself to provide the larger context of a sentence as the second part of a sentence pair input.
-We limited the statement itself to 100 tokens and the context of the statement to 200 tokens.
+The context model variant additionally incorporates the surrounding sentences of a statement to improve the classification results for ambiguous sentences. (See Training Procedure for details)
 
 **Important**
 
@@ -50,7 +48,7 @@ print(predicted_class)
 # 201 - Freedom and Human Rights
 ```
 
-## Training procedure
+## Training Procedure
 
 Training of the model took place on all quasi-sentences of the Manifesto Corpus (version 2023a), minus 10% that were kept out of training for the final test and evaluation results.
 This results in a training dataset of 1,601,329 quasi-sentences.
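
The lines this commit removes describe the context variant's input as a sentence pair: the statement (limited to 100 tokens) followed by the statement merged with its surrounding sentences (limited to 200 tokens). A minimal sketch of that pairing is given below. It rests on assumptions: `toy_tokenize` is a whitespace stand-in for the real xlm-roberta-large subword tokenizer, and `build_pair` and its `window` parameter are hypothetical names, since the README does not say how many neighbouring sentences were collected.

```python
from typing import List, Tuple

MAX_STATEMENT_TOKENS = 100  # statement limit stated in the README
MAX_CONTEXT_TOKENS = 200    # context limit stated in the README


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): split on whitespace.

    The actual model uses the xlm-roberta-large subword tokenizer,
    so real token counts will differ.
    """
    return text.split()


def build_pair(
    sentences: List[str], index: int, window: int = 1
) -> Tuple[List[str], List[str]]:
    """Return (statement_tokens, context_tokens) for sentences[index].

    The context merges the surrounding sentences with the statement itself.
    `window` (hypothetical) is the number of neighbours taken on each side.
    """
    statement = toy_tokenize(sentences[index])[:MAX_STATEMENT_TOKENS]
    start = max(0, index - window)
    end = min(len(sentences), index + window + 1)
    merged = " ".join(sentences[start:end])
    context = toy_tokenize(merged)[:MAX_CONTEXT_TOKENS]
    return statement, context


sentences = [
    "We will lower taxes.",
    "This protects families.",
    "Schools need more funding.",
]
statement, context = build_pair(sentences, index=1)
print(statement)  # first part of the sentence pair
print(context)    # second part: statement plus its neighbours
```

With a real tokenizer, the two parts would be passed as the first and second segment of a sentence pair input, each truncated to its own limit before being combined.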