tburst committed
Commit a925f1d
1 Parent(s): 529a161

Update README.md

Files changed (1): README.md +5 -1

README.md CHANGED
@@ -64,7 +64,11 @@ As training parameters, we used the following settings: learning rate: 1e-5, wei
  To adapt the model to the task of classifying statements in manifestos, we made some modifications to the traditional training setup.
  Given that human annotators in the Manifesto Project are encouraged to use surrounding sentences to interpret ambiguous statements, we combined statements with their context for our model's input.
  Specifically, we used a sentence-pair input, where the single statement to be classified is followed by the separator token and then by the larger 200-token context in which that statement is embedded.
- Here is an example: "`<s>` We must right the wrongs in our democracy, `</s>` To turn this crisis into a crucible, from which we will forge a stronger, brighter, and more equitable future. We must right the wrongs in our democracy, redress the systemic injustices that have long plagued our society, throw open the doors of opportunity for all Americans and reinvent our institutions at home and our leadership abroad. `</s>`".
+ Here is an example:
+
+ *"`<s>` We must right the wrongs in our democracy, `</s>``</s>` To turn this crisis into a crucible, from which we will forge a stronger, brighter, and more equitable future. We must right the wrongs in our democracy, redress the systemic injustices that have long plagued our society, throw open the doors of opportunity for all Americans and reinvent our institutions at home and our leadership abroad. `</s>`".*
+
+
  The second part, which contains the context, is greedily filled until it contains 200 tokens.
  Our tests showed that including the context improved the model's classification performance considerably (~8% accuracy).
  We also tried other approaches, such as using two XLM-RoBERTa models as a duo, where one receives the sentence and the other the context, and a shared-layer model, where both inputs are fed separately through the same model.
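The corrected example in this commit matches what the `transformers` tokenizer produces on its own when the statement and its context are passed as a text pair, since XLM-RoBERTa joins pair inputs with the double `</s></s>` separator. Here is a minimal sketch, assuming the `xlm-roberta-large` base checkpoint and an illustrative length limit (neither is specified in this diff):

```python
from transformers import AutoTokenizer

# Assumed base checkpoint; the fine-tuned model id is not named in this diff.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

statement = "We must right the wrongs in our democracy,"
context = (
    "To turn this crisis into a crucible, from which we will forge a stronger, "
    "brighter, and more equitable future. We must right the wrongs in our "
    "democracy, redress the systemic injustices that have long plagued our "
    "society, throw open the doors of opportunity for all Americans and "
    "reinvent our institutions at home and our leadership abroad."
)

# Passing (statement, context) as a sentence pair yields
#   <s> statement </s></s> context </s>
# which is exactly the corrected format in this commit.
# max_length=256 is illustrative, not a value from the README.
encoded = tokenizer(statement, context, truncation="only_second", max_length=256)
print(tokenizer.decode(encoded["input_ids"]))
```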
 
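The greedy filling of the 200-token context part could look like the following sketch. The diff does not say in which order surrounding sentences are added, so the alternating left/right expansion (and the helper names `n_tokens` and `build_context`) are assumptions for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")  # assumed checkpoint

def n_tokens(text: str) -> int:
    # Count content tokens only; special tokens are added later by the pair encoding.
    return len(tokenizer(text, add_special_tokens=False)["input_ids"])

def build_context(sentences: list[str], idx: int, budget: int = 200) -> str:
    """Grow a window around sentences[idx] until the token budget is reached.
    The alternating left/right order is an assumption; the README only says
    the context part is greedily filled until it contains 200 tokens."""
    window = [sentences[idx]]
    left, right = idx - 1, idx + 1
    prefer_left = True
    while n_tokens(" ".join(window)) < budget and (left >= 0 or right < len(sentences)):
        if prefer_left and left >= 0:
            window.insert(0, sentences[left])
            left -= 1
        elif right < len(sentences):
            window.append(sentences[right])
            right += 1
        else:  # right side exhausted; the loop condition guarantees left >= 0 here
            window.insert(0, sentences[left])
            left -= 1
        prefer_left = not prefer_left
    return " ".join(window)
```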