Model Card for persuasive_language_detector

Given a sentence, the model predicts whether it contains "persuasive" language, that is, language designed to elicit emotions or change readers' opinions. The model was fine-tuned on the SemEval 2020 Task 11 dataset, which we preprocessed to adapt it from multilabel technique classification and span-level labeling to our binary sentence-level classification task.

The repository has two revisions:

BERT - bert-large-cased, fine-tuned and published on the main branch.
XLM-RoBERTa - xlm-roberta-base, fine-tuned and published on the roberta branch.

Model Details

Model Description


Developed by: Ultraviolet Text
Model type: BERT / XLM-RoBERTa
Language(s) (NLP): English
License: MIT
Finetuned from model: bert-large-cased / xlm-roberta-base

How to Get Started with the Model

Use the code below to get started with the model.

Loading from the main branch (BERT)

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The tokenizer comes from the base checkpoint; the fine-tuned weights come from the main branch.
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector")

Loading from the `roberta` branch (XLM RoBERTa)

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The tokenizer comes from the base checkpoint; revision="roberta" selects the roberta branch weights.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector", revision="roberta")
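Once a revision is loaded, scoring a sentence takes a tokenizer call, a forward pass, and a softmax over the logits. The following is a minimal sketch rather than the authors' own inference code: `score_sentence`, `softmax`, and `TOKENIZERS` are illustrative names, and PyTorch is assumed as the backend. This card does not state which class index means "persuasive", so the sketch returns the label names stored in the checkpoint's config (these may be generic, such as LABEL_0 and LABEL_1) instead of guessing.

```python
import math

# Base tokenizer for each revision, as given in the loading snippets above.
TOKENIZERS = {"main": "bert-large-cased", "roberta": "xlm-roberta-base"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentence(sentence, revision="roberta"):
    """Return {label_name: probability} for a single sentence."""
    # Imported lazily so softmax() stays usable without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZERS[revision])
    model = AutoModelForSequenceClassification.from_pretrained(
        "chreh/persuasive_language_detector", revision=revision
    )
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    # Assumption: the index-to-label mapping is whatever the checkpoint config stores;
    # check it before treating either class as "persuasive".
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}
```

Calling `score_sentence("They will destroy everything we love!")` downloads the roberta revision on first use and returns one probability per label.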

Training Details

Training Data

Training data can be downloaded from the SemEval 2020 Task 11 website.

Training Procedure

Training used the Hugging Face Trainer, both on our local machines and on Intel Developer Cloud kernels, which let us prototype multiple models simultaneously.

Preprocessing

All sentences containing spans of persuasive language techniques were labeled as persuasive language examples, while all others were labeled as examples of non-persuasive language.
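The labeling rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing script: it assumes each article is plain text with one sentence per line, that technique spans are given as half-open character offsets `(start, end)` into that text, and that any overlap between a span and a sentence makes the sentence persuasive. The name `label_sentences` is hypothetical.

```python
def label_sentences(article_text, spans):
    """Label each non-empty line 1 (persuasive) if any span overlaps it, else 0.

    Assumptions: one sentence per line; spans are half-open (start, end)
    character offsets into article_text.
    """
    labeled = []
    offset = 0
    for line in article_text.split("\n"):
        start, end = offset, offset + len(line)
        offset = end + 1  # step past the newline character
        if not line.strip():
            continue  # skip blank lines between paragraphs
        overlaps = any(s < end and e > start for s, e in spans)
        labeled.append((line, int(overlaps)))
    return labeled


# Toy article: the single span covers the word "DESTROY" in the second sentence.
article = (
    "Plain fact here.\n"
    "They will DESTROY everything we love!\n"
    "\n"
    "Another neutral line."
)
examples = label_sentences(article, [(27, 34)])
for sentence, label in examples:
    print(label, sentence)
```

The technique labels attached to each span are discarded here, since the binary task only needs to know whether a sentence contains any annotated span.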

Testing Data, Factors & Metrics

Testing Data

The test set is the test split of sem_eval_2020_task_11, which can be downloaded from the original website. It contains 38.25% persuasive examples and 61.75% non-persuasive examples. Metrics are reported in the following section.

Metrics

Each metric is listed as two values: the main branch (BERT) first, then the roberta branch (XLM-RoBERTa).

Accuracy - 0.7165140725669719, 0.7326693227091633
Recall - 0.6875584658559402, 0.6822916666666666
Precision - 0.5941794664510913, 0.6415279138099902
F1 - 0.6374674761491761, 0.6612821807168097

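Overall, the roberta branch performs better: it scores higher on accuracy, precision, and F1, with recall only marginally lower (0.682 vs. 0.688), and it has faster inference times. We therefore recommend that users download the roberta revision.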
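As a sanity check on the numbers above, F1 can be recomputed from the reported precision and recall, and accuracy can be compared with the trivial baseline that always predicts non-persuasive (61.75% of the test set). The sketch below uses only values copied from this card; `REPORTED`, `MAJORITY_BASELINE`, and `f1_from` are illustrative names.

```python
# Values copied from the Metrics list above.
REPORTED = {
    "main": {
        "accuracy": 0.7165140725669719,
        "recall": 0.6875584658559402,
        "precision": 0.5941794664510913,
        "f1": 0.6374674761491761,
    },
    "roberta": {
        "accuracy": 0.7326693227091633,
        "recall": 0.6822916666666666,
        "precision": 0.6415279138099902,
        "f1": 0.6612821807168097,
    },
}

# Share of non-persuasive test examples, from the Testing Data section.
MAJORITY_BASELINE = 0.6175


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for branch, m in REPORTED.items():
    recomputed = f1_from(m["precision"], m["recall"])
    gain = m["accuracy"] - MAJORITY_BASELINE
    print(
        f"{branch}: F1 recomputed={recomputed:.4f} (reported {m['f1']:.4f}), "
        f"accuracy gain over majority baseline={gain:+.4f}"
    )
```

Because the classes are imbalanced, precision, recall, and F1 on the persuasive class are more informative than accuracy alone.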
