---
library_name: transformers
license: mit
datasets:
- sem_eval_2020_task_11
language:
- en
---

# Model Card for Model ID

Given a sentence, our model predicts whether it contains "persuasive" language, that is, language designed to elicit emotions or change readers' opinions.

The model was fine-tuned on the SemEval 2020 Task 11 dataset. We preprocessed the dataset to adapt it from multilabel technique classification and span classification to our binary classification task.

There are two revisions:

* BERT - we fine-tuned `bert-large-cased` on our `main` branch.
* XLM-RoBERTa - we fine-tuned `xlm-roberta-base` on our `roberta` branch.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** Ultraviolet Text
- **Model type:** BERT / XLM-RoBERTa
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** bert-large-cased / xlm-roberta-base

## How to Get Started with the Model

Use the code below to get started with the model.

### Loading from the main branch (BERT)

```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The tokenizer comes from the base checkpoint; the fine-tuned weights come from this repository.
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector")
```

### Loading from the `roberta` branch (XLM-RoBERTa)

```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# `revision` selects the `roberta` branch of the repository.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector", revision="roberta")
```

## Training Details

### Training Data

Training data can be downloaded from [the SemEval website](https://propaganda.qcri.org/semeval2020-task11/).
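The adaptation from span annotations to binary sentence labels mentioned in this card can be illustrated with a minimal sketch. It is not our actual preprocessing script: the function name `label_sentences`, the one-sentence-per-line article layout, and the `(start, end)` character-offset span format (with `end` exclusive) are assumptions made for illustration.

```python
from typing import List, Tuple


def label_sentences(article: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Label each non-empty line 1 if any annotated span overlaps it, else 0.

    Assumes one sentence per line and spans given as (start, end) character
    offsets into `article`, with `end` exclusive.
    """
    labeled = []
    offset = 0
    for line in article.split("\n"):
        start, end = offset, offset + len(line)
        offset = end + 1  # move past the newline character
        if not line.strip():
            continue  # skip blank lines
        persuasive = any(s < end and e > start for s, e in spans)
        labeled.append((line, int(persuasive)))
    return labeled


# Hypothetical article and annotation, for illustration only.
article = (
    "Stocks rose on Monday.\n"
    "Only a fool would trust these crooks!\n"
    "The vote is on Friday."
)
span_start = article.index("Only a fool")
spans = [(span_start, span_start + len("Only a fool"))]

examples = label_sentences(article, spans)
for sentence, label in examples:
    print(label, sentence)
```

Under these assumptions, a sentence is marked persuasive as soon as one span touches it, regardless of which technique the span was annotated with.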
### Training Procedure

Training was done with the Hugging Face `Trainer` on both our local machines and Intel Developer Cloud kernels, which let us prototype multiple models simultaneously.

#### Preprocessing

All sentences containing spans of persuasive language techniques were labeled as persuasive examples; all other sentences were labeled as non-persuasive examples.

### Testing Data, Factors & Metrics

#### Testing Data

The test data comes from the test split of `sem_eval_2020_task_11`, which can be downloaded from [the original website](https://propaganda.qcri.org/semeval2020-task11/). It contains 38.25% persuasive examples and 61.75% non-persuasive examples. Metrics are reported in the following section.

#### Metrics

Metrics are reported in the format (`main` branch), (`roberta` branch):

* Accuracy - 0.7165140725669719, 0.7326693227091633
* Recall - 0.6875584658559402, 0.6822916666666666
* Precision - 0.5941794664510913, 0.6415279138099902
* F1 - 0.6374674761491761, 0.6612821807168097

Overall, the `roberta` branch scores higher on accuracy, precision, and F1 (its recall is marginally lower) and has faster inference times. We therefore recommend that users download the `roberta` revision.
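As a sanity check on the reported numbers, the F1 scores can be recomputed from precision and recall as their harmonic mean, and the accuracies can be compared with a majority-class baseline. The sketch below uses only the figures from this card. The baseline of 0.6175 assumes a classifier that always predicts "non-persuasive" on the test class balance stated in this card. The helper name `f1_score` is ours, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Metrics copied from this model card.
reported = {
    "main (BERT)": {
        "accuracy": 0.7165140725669719,
        "recall": 0.6875584658559402,
        "precision": 0.5941794664510913,
        "f1": 0.6374674761491761,
    },
    "roberta (XLM-RoBERTa)": {
        "accuracy": 0.7326693227091633,
        "recall": 0.6822916666666666,
        "precision": 0.6415279138099902,
        "f1": 0.6612821807168097,
    },
}

# Assumption: always predicting the majority (non-persuasive) class
# yields an accuracy equal to that class's share of the test data.
majority_baseline = 0.6175

for branch, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    gain = m["accuracy"] - majority_baseline
    print(
        f"{branch}: reported F1={m['f1']:.4f}, "
        f"recomputed F1={recomputed:.4f}, "
        f"accuracy gain over baseline={gain:+.4f}"
    )
```

Because the classes are imbalanced, the gain over the majority baseline is a more informative reading of accuracy than the raw value alone.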