
roberta-base-frenk-hate

Text classification model based on roberta-base and fine-tuned on the FRENK dataset, which comprises LGBT and migrant hate speech. Only the English subset of the data was used for fine-tuning, and the dataset was relabeled for binary classification (offensive or acceptable).

Fine-tuning hyperparameters

Fine-tuning was performed with simpletransformers. A brief hyperparameter optimisation was carried out beforehand, and the presumed optimal hyperparameters are:


model_args = {
    "num_train_epochs": 6,
    "learning_rate": 3e-6,
    "train_batch_size": 69,
}

Performance

The same pipeline was run on the same dataset with three other models. Accuracy and macro F1 score were recorded for each of the 6 fine-tuning sessions and analyzed afterwards.

| model | average accuracy | average macro F1 |
|---|---|---|
| roberta-base-frenk-hate | 0.7915 | 0.7785 |
| xlm-roberta-large | 0.7904 | 0.77876 |
| xlm-roberta-base | 0.7577 | 0.7402 |
| distilbert-base-uncased-finetuned-sst-2-english | 0.7201 | 0.69862 |
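As a reminder of what the two reported metrics measure, here is a minimal pure-Python sketch. The label lists are hypothetical and are not taken from the FRENK data, and the helper names `accuracy` and `macro_f1` are illustrative only. This is not the evaluation code used for the table.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that never occurs and is never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro F1 weights both classes equally, so it is less forgiving than accuracy when one class is much rarer than the other.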

From the recorded accuracies and macro F1 scores, p-values were also calculated:

Comparison with xlm-roberta-base:

| test | accuracy p-value | macro F1 p-value |
|---|---|---|
| Wilcoxon | 0.00781 | 0.00781 |
| Mann-Whitney U test | 0.00108 | 0.00108 |
| Student's t-test | 1.35e-08 | 1.05e-07 |

Comparison with distilbert-base-uncased-finetuned-sst-2-english:

| test | accuracy p-value | macro F1 p-value |
|---|---|---|
| Wilcoxon | 0.00781 | 0.00781 |
| Mann-Whitney U test | 0.00108 | 0.00108 |
| Student's t-test | 1.33e-12 | 3.03e-12 |

Comparison with xlm-roberta-large yielded inconclusive results: roberta-base-frenk-hate achieved a slightly higher average accuracy, while xlm-roberta-large achieved a slightly higher average macro F1 score. Neither difference was statistically significant, so no conclusion can be drawn about which model is better.

Use examples

from simpletransformers.classification import ClassificationModel

# Same hyperparameters as used for fine-tuning
model_args = {
    "num_train_epochs": 6,
    "learning_rate": 3e-6,
    "train_batch_size": 69,
}

# Load the fine-tuned model; set use_cuda=False if no GPU is available
model = ClassificationModel(
    "roberta",
    "5roop/roberta-base-frenk-hate",
    use_cuda=True,
    args=model_args,
)

# predict() returns the predicted labels and the raw model outputs
predictions, logit_output = model.predict(
    ["Build the wall", "Build the wall of trust"]
)
predictions
### Output:
### array([1, 0])
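The raw outputs returned alongside the predictions can be turned into class probabilities with a softmax. The sketch below uses hypothetical logit values rather than real model output, and the mapping of 0 to "acceptable" and 1 to "offensive" is an assumption inferred from the example above, not something the model card states explicitly.

```python
import math

# Assumed label mapping, inferred from the example output (not confirmed).
LABELS = {0: "acceptable", 1: "offensive"}


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits standing in for logit_output, one row per input text.
hypothetical_logits = [[-1.2, 1.5], [2.0, -1.8]]

for row in hypothetical_logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(LABELS[best], round(probs[best], 3))
```

The class with the highest probability is the same one that the argmax over the logits selects, so this only adds a rough confidence estimate to each prediction.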