roberta-base-frenk-hate
Text classification model based on roberta-base and fine-tuned on the FRENK dataset, which comprises LGBT and migrant hate speech. Only the English subset of the data was used for fine-tuning, and the dataset was relabeled for binary classification (offensive or acceptable).
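The binary relabeling described above could look roughly like the sketch below. The fine-grained label strings and the 0/1 encoding are assumptions for illustration only (the encoding is chosen to match the output of the usage example, where the offensive sentence is predicted as 1); the dataset's actual label names may differ.

```python
# Hypothetical label string for the "acceptable" class; not taken from the dataset itself.
ACCEPTABLE = "Acceptable speech"


def to_binary(label: str) -> int:
    """Map a fine-grained label to 0 (acceptable) or 1 (offensive)."""
    return 0 if label.strip().lower() == ACCEPTABLE.lower() else 1


# Placeholder examples with made-up fine-grained labels.
examples = [
    ("Welcome to the neighbourhood!", "Acceptable speech"),
    ("<an offensive comment>", "Offensive"),
    ("<a violent comment>", "Violent"),
]

binary = [(text, to_binary(label)) for text, label in examples]
print(binary)
```

Everything that is not labeled acceptable is collapsed into a single offensive class here, which is one plausible reading of "offensive or acceptable".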
Fine-tuning hyperparameters
Fine-tuning was performed with simpletransformers. A brief hyperparameter optimisation was performed beforehand, and the presumed optimal hyperparameters are:
model_args = {
    "num_train_epochs": 6,
    "learning_rate": 3e-6,
    "train_batch_size": 69,
}
Performance
The same pipeline was run with three other models (listed in the table below) on the same dataset. Accuracy and macro F1 score were recorded for each of the 6 fine-tuning sessions and analyzed afterwards.
model | average accuracy | average macro F1 |
---|---|---|
roberta-base-frenk-hate | 0.7915 | 0.7785 |
xlm-roberta-large | 0.7904 | 0.77876 |
xlm-roberta-base | 0.7577 | 0.7402 |
distilbert-base-uncased-finetuned-sst-2-english | 0.7201 | 0.69862 |
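For readers unfamiliar with the two metrics, the following is a minimal sketch of how accuracy and macro F1 can be computed for binary labels. The label lists are invented for illustration and are not taken from the evaluation above; the function names are likewise hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up labels: 1 = offensive, 0 = acceptable (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro F1 weights both classes equally, so it is less sensitive to class imbalance than accuracy, which is presumably why both metrics are reported.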
From the recorded accuracies and macro F1 scores, p-values were also calculated:
Comparison with xlm-roberta-base:
test | accuracy p-value | macro F1 p-value |
---|---|---|
Wilcoxon | 0.00781 | 0.00781 |
Mann-Whitney U test | 0.00108 | 0.00108 |
Student t-test | 1.35e-08 | 1.05e-07 |
Comparison with distilbert-base-uncased-finetuned-sst-2-english:
test | accuracy p-value | macro F1 p-value |
---|---|---|
Wilcoxon | 0.00781 | 0.00781 |
Mann-Whitney U test | 0.00108 | 0.00108 |
Student t-test | 1.33e-12 | 3.03e-12 |
Comparison with xlm-roberta-large yielded inconclusive results: roberta-base-frenk-hate achieved a slightly higher average accuracy, while xlm-roberta-large achieved a slightly higher average macro F1 score. Neither metric allowed for statistically significant conclusions about which model might be better.
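The three tests reported in the tables above can be run with `scipy.stats`, as in the sketch below. The per-session scores are invented placeholders, not the recorded values. Treating the sessions as paired for the Wilcoxon test and using the independent two-sample form of the Student t-test are assumptions, since the exact test setup is not stated here.

```python
from scipy import stats

# Hypothetical per-session accuracies for two models (not the recorded data).
model_a = [0.790, 0.793, 0.788, 0.795, 0.791, 0.789]
model_b = [0.757, 0.762, 0.751, 0.760, 0.755, 0.759]

# Wilcoxon signed-rank test: assumes session i of model A is paired with session i of model B.
wilcoxon_p = stats.wilcoxon(model_a, model_b).pvalue

# Mann-Whitney U test: treats the two sets of scores as independent samples.
mwu_p = stats.mannwhitneyu(model_a, model_b, alternative="two-sided").pvalue

# Student t-test, independent two-sample variant (an assumption).
ttest_p = stats.ttest_ind(model_a, model_b).pvalue

print(f"Wilcoxon:       {wilcoxon_p:.5f}")
print(f"Mann-Whitney U: {mwu_p:.5f}")
print(f"Student t-test: {ttest_p:.3g}")
```

With only a handful of sessions, the rank-based tests have a floor on the smallest p-value they can produce, which is consistent with the Wilcoxon and Mann-Whitney rows showing identical values in both comparison tables.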
Use examples
from simpletransformers.classification import ClassificationModel

model_args = {
    "num_train_epochs": 6,
    "learning_rate": 3e-6,
    "train_batch_size": 69,
}

model = ClassificationModel(
    "roberta",
    "5roop/roberta-base-frenk-hate",
    use_cuda=True,  # set to False if no GPU is available
    args=model_args,
)

predictions, logit_output = model.predict(
    ["Build the wall", "Build the wall of trust"]
)
predictions
### Output:
### array([1, 0])
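If class probabilities are wanted rather than hard labels, the second value returned by `predict` can be passed through a softmax. The sketch below assumes `logit_output` holds one row of raw, unnormalised scores per input, ordered as (acceptable, offensive); the numbers are made up and are not real model output.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for two inputs.
logit_output = [[-1.2, 1.5], [2.0, -1.7]]

probabilities = [softmax(row) for row in logit_output]
# The predicted class is the index of the largest probability in each row.
predicted = [max(range(len(p)), key=p.__getitem__) for p in probabilities]

for probs, label in zip(probabilities, predicted):
    print(label, [round(p, 3) for p in probs])
```

The probabilities make it possible to apply a custom decision threshold instead of always taking the most likely class.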