Model Card for chreh/persuasive_language_detector

Given a sentence, our model predicts whether it contains "persuasive" language, that is, language designed to elicit emotions or change readers' opinions. The model was fine-tuned on the SemEval 2020 Task 11 dataset, which we preprocessed to convert its multilabel technique-classification and span-identification annotations into our binary sentence-classification task.

There are two revisions:

  • BERT: we fine-tuned bert-large-cased; this model is on the main branch.
  • XLM-RoBERTa: we fine-tuned xlm-roberta-base; this model is on the roberta branch.

Model Details

Model Description

This is the model card of a 🤗 Transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: Ultraviolet Text
  • Model type: BERT / XLM-RoBERTa (binary sequence classification)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: bert-large-cased (main branch) / xlm-roberta-base (roberta branch)

How to Get Started with the Model

Use the code below to get started with the model.

Loading from the main branch (BERT)

from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector")

Loading from the roberta branch (XLM-RoBERTa)

from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector", revision="roberta")
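The snippets above load the model but do not show how to turn its outputs into predictions. Below is a minimal sketch of that post-processing step in plain Python, so it runs without downloading the model. The helper name logits_to_predictions is hypothetical, and because this card does not document which class index means "persuasive", the label names should be read from model.config.id2label rather than assumed.

```python
import math
from typing import Dict, List, Tuple


def logits_to_predictions(
    logits: List[List[float]], id2label: Dict[int, str]
) -> List[Tuple[str, float]]:
    """Turn raw classifier logits into (label, probability) pairs via a softmax.

    Hypothetical helper: the label names come from whatever id2label mapping
    is passed in; which index means "persuasive" is not documented in the card.
    """
    predictions = []
    for row in logits:
        # Subtract the max logit before exponentiating for numerical stability
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Pick the most probable class for this sentence
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append((id2label[best], probs[best]))
    return predictions
```

With the model and tokenizer loaded as above, the logits could come from model(**tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")).logits.tolist() inside a torch.no_grad() block, and id2label from model.config.id2label.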

Training Details

Training Data

Training data can be downloaded from the SemEval 2020 Task 11 website.

Training Procedure

Training was done with the Hugging Face Trainer on both our local machines and Intel Developer Cloud kernels, which let us prototype multiple models in parallel.
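The card does not say how evaluation metrics were computed during training. The Trainer accepts a compute_metrics callback that receives the evaluation logits and labels; a minimal pure-Python sketch of such a callback follows. It assumes label 1 means persuasive (the card does not state the mapping) and reports precision, recall, and F1 for that class, matching the metric names used later in this card.

```python
from typing import Dict, Sequence, Tuple


def compute_metrics(
    eval_pred: Tuple[Sequence[Sequence[float]], Sequence[int]]
) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive class.

    Assumes label 1 means "persuasive"; this mapping is an assumption.
    """
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    correct = sum(1 for p, y in pairs if p == y)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

It would be passed as Trainer(..., compute_metrics=compute_metrics). By default, the EvalPrediction object the Trainer supplies unpacks into (predictions, label_ids), which is what the first line of the function relies on.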

Preprocessing

Every sentence containing at least one span annotated with a persuasion technique was labeled as persuasive; all other sentences were labeled as non-persuasive.
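The labeling rule above can be sketched as follows. This is an illustration, not our preprocessing script: it assumes each article is plain text with one sentence per line and that technique spans are given as (start, end) character offsets into the article. The function name label_sentences is hypothetical, and it treats a sentence as persuasive if any span overlaps it, so a span crossing a line break marks both sentences.

```python
from typing import List, Tuple


def label_sentences(article: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Split an article into lines and label each line 1 if any technique span overlaps it.

    Assumptions: one sentence per line, spans are [start, end) character offsets.
    """
    labeled = []
    offset = 0
    for line in article.split("\n"):
        start, end = offset, offset + len(line)
        # A span [s, e) overlaps the line [start, end) if s < end and e > start
        persuasive = any(s < end and e > start for s, e in spans)
        if line.strip():  # skip blank lines
            labeled.append((line, int(persuasive)))
        offset = end + 1  # +1 for the newline character
    return labeled
```

For example, in "We must act now!\n\nThe meeting is at noon.\nThey are traitors." with spans (0, 15) and (51, 59), the first and last sentences are labeled 1 and the middle one 0.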

Testing Data, Factors & Metrics

Testing Data

The test data comes from the test data of sem_eval_2020_task_11, which can be downloaded from the original website. It contains 38.25% persuasive and 61.75% non-persuasive examples. Metrics are reported in the following section.

Metrics

Metrics are reported in the format: main branch (BERT), roberta branch (XLM-RoBERTa).

  • Accuracy - 0.7165140725669719, 0.7326693227091633
  • Recall - 0.6875584658559402, 0.6822916666666666
  • Precision - 0.5941794664510913, 0.6415279138099902
  • F1 - 0.6374674761491761, 0.6612821807168097
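Two quick checks help put these numbers in context. The sketch below recomputes F1 as the harmonic mean of the reported precision and recall, and computes two trivial baselines from the class balance of the test data (38.25% persuasive): always predicting non-persuasive, and always predicting persuasive. The baselines are our illustration, not results reported for this model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported (precision, recall, F1) for each branch, copied from the list above
reported = {
    "main (BERT)": (0.5941794664510913, 0.6875584658559402, 0.6374674761491761),
    "roberta (XLM-RoBERTa)": (0.6415279138099902, 0.6822916666666666, 0.6612821807168097),
}
for branch, (p, r, f1) in reported.items():
    print(f"{branch}: recomputed F1 = {f1_score(p, r):.4f}, reported F1 = {f1:.4f}")

# Trivial baselines on a test set that is 38.25% persuasive
persuasive_share = 0.3825
# Always predicting "non-persuasive": accuracy equals the majority-class share, F1 is 0
majority_accuracy = 1 - persuasive_share
# Always predicting "persuasive": precision equals the positive share, recall is 1
all_positive_f1 = f1_score(persuasive_share, 1.0)
print(f"majority-class accuracy = {majority_accuracy:.4f}")
print(f"always-persuasive F1 = {all_positive_f1:.4f}")
```

The recomputed F1 values agree with the reported ones, and the baselines come out to an accuracy of 0.6175 and an F1 of about 0.553, which both branches exceed on accuracy and F1.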

Overall, the roberta branch scores higher on accuracy, precision, and F1 (its recall is slightly lower) and has faster inference times. We therefore recommend downloading the roberta revision.
