🤖 Model description
This model was trained on ~25k heterogeneous, manually annotated sentences (📚 Stab et al. 2018) from controversial topics to classify text into one of two labels: 🏷 NON-ARGUMENT (0) and ARGUMENT (1).
The dataset (📚 Stab et al. 2018) labels a sentence as an ARGUMENT (~11k) if it supports or opposes the topic and includes a relevant reason for that stance, and as a NON-ARGUMENT (~14k) if it does not include such a reason. The authors focus on controversial topics, i.e., topics that include "an obvious polarity to the possible outcomes", and compile a final set of eight: abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage.
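To illustrate the binary label scheme described above: a sentence-level classifier of this kind produces one logit per label and picks the higher-scoring one. The sketch below is illustrative only — the example sentence and logit values are made up, and the label mapping is simply the 0/1 scheme stated above.

```python
import math

# Label scheme used by the model: 0 = NON-ARGUMENT, 1 = ARGUMENT.
ID2LABEL = {0: "NON-ARGUMENT", 1: "ARGUMENT"}

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits):
    """Map two logits to (label, probability) via argmax."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

# Hypothetical logits for a sentence giving a reason to support a topic,
# e.g. "School uniforms reduce bullying, so they should be mandatory."
label, prob = decide([-1.2, 2.4])
print(label, round(prob, 3))  # ARGUMENT, ~0.973
```

A sentence that merely mentions a topic without giving a reason would land on the NON-ARGUMENT side of the same decision rule.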
RoBERTArg was fine-tuned from the pre-trained RoBERTa (base) model on Hugging Face, using the Hugging Face Trainer with the following hyperparameters:
```python
training_args = TrainingArguments(
    num_train_epochs=2,
    learning_rate=2.3102e-06,
    seed=8,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)
```
The model was evaluated on a held-out evaluation set (20% of the data):
| Model | Acc | F1 | R arg | R non | P arg | P non |
|-------|-----|----|-------|-------|-------|-------|
The confusion matrix, again on the evaluation set:
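For reference, the metrics reported above (accuracy, F1, per-class precision and recall) all derive directly from the binary confusion matrix. The sketch below shows that derivation; the cell counts are hypothetical and are not the model's actual results.

```python
# Binary confusion matrix counts (hypothetical, for illustration only):
# rows = true class, columns = predicted class; 0 = NON-ARGUMENT, 1 = ARGUMENT.
tn, fp = 2600, 200   # true NON-ARGUMENT: predicted non-argument / argument
fn, tp = 300, 1900   # true ARGUMENT:     predicted non-argument / argument

accuracy = (tp + tn) / (tp + tn + fp + fn)
p_arg = tp / (tp + fp)   # precision on ARGUMENT
r_arg = tp / (tp + fn)   # recall on ARGUMENT
p_non = tn / (tn + fn)   # precision on NON-ARGUMENT
r_non = tn / (tn + fp)   # recall on NON-ARGUMENT
f1_arg = 2 * p_arg * r_arg / (p_arg + r_arg)  # harmonic mean of P and R

print(f"Acc={accuracy:.3f} F1={f1_arg:.3f} "
      f"P_arg={p_arg:.3f} R_arg={r_arg:.3f} "
      f"P_non={p_non:.3f} R_non={r_non:.3f}")
```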
⚠️ Intended Uses & Potential Limitations
The model can only be a starting point for diving into the exciting field of argument mining. But be aware: an argument is a complex structure with multiple dependencies, so the model may perform less well on topics and text types not included in the training set.
Enjoy and stay tuned! 🚀
🐦 Twitter: @chklamm