CrisisHateMM-finetuned-subtask-A

This model card describes CrisisHateMM-finetuned-subtask-A, a multimodal classification model that achieved state-of-the-art performance on the subtask A test set of the CrisisHateMM dataset.

Model Description

The model uses a Twitter-based RoBERTa and Swin Transformer V2 to encode the textual and visual modalities, respectively, and a multilayer perceptron (MLP) fusion mechanism for classification.
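To make the fusion step concrete, here is a minimal pure-Python sketch of late fusion with an MLP: the two modality embeddings are concatenated and passed through a hidden layer and a two-way classification head. It is a generic illustration, not the AutoGluon implementation. The feature sizes, the random weights, and all function names are assumptions chosen for the example, and the random vectors merely stand in for real RoBERTa and Swin Transformer V2 outputs.

```python
import random

def linear(x, weights, bias):
    """Dense layer: one output value per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholders, not trained values)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def fuse_and_classify(text_feat, image_feat, hidden, head):
    """Concatenate both modality vectors, then apply hidden layer and head."""
    fused = text_feat + image_feat      # list concatenation = feature concatenation
    h = relu(linear(fused, *hidden))
    return linear(h, *head)             # two logits for the binary task

rng = random.Random(0)
text_feat = [rng.random() for _ in range(8)]    # stand-in for a text embedding
image_feat = [rng.random() for _ in range(6)]   # stand-in for an image embedding
hidden = init_layer(len(text_feat) + len(image_feat), 4, rng)
head = init_layer(4, 2, rng)

logits = fuse_and_classify(text_feat, image_feat, hidden, head)
predicted_class = logits.index(max(logits))
print(logits, predicted_class)
```

In a trained model the weights come from fine-tuning and the embeddings have hundreds of dimensions; only the concatenate-then-MLP shape carries over from this sketch.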

Uses

The model can be used directly as a multimodal binary classifier for subtask A of the Shared Task on Multimodal Hate Speech Event Detection at CASE@EACL2024.

How to Get Started with the Model

Use the following code to get started with the model. First, install the required Python packages:

! pip install --quiet autogluon.multimodal

Then preprocess the CrisisHateMM dataset:

import pandas as pd
import re

def preprocess_text(text):

    # Remove URLs
    text = re.sub(r"http\S+", "", text)

    # Remove mentions (e.g., @username)
    text = re.sub(r"@\S+", "", text)

    # Remove remaining @ symbols
    text = re.sub(r"@", "", text)

    # Remove ".com" at the end of the text (the dot is escaped so it matches literally)
    text = re.sub(r"\.com$", "", text)

    # Remove emojis using a regular expression pattern
    emoji_pattern = re.compile("[" u"\U0001F600-\U0001F64F" u"\U0001F300-\U0001F5FF"
                               u"\U0001F680-\U0001F6FF" u"\U0001F1E0-\U0001F1FF" "]+", flags=re.UNICODE)
    preprocessed_text = emoji_pattern.sub(r'', text)

    return preprocessed_text

stA_test = pd.read_csv("/data/test/stA_test.csv")

# Path prefix stored in the CSV; it is swapped for the local test directory
original_prefix = "/content/drive/MyDrive/CASE2023_Task4/CASE2023_TASK4_TestData/subtaskA/"

# Collect the rows in a list first, then build the DataFrame once
rows = []
for _, row in stA_test.iterrows():
    sample_image = '/data/test/subtaskA/' + row['filename'].replace(original_prefix, "")
    sample_text = preprocess_text(row['text'])
    rows.append([sample_image, sample_text])

test_data = pd.DataFrame(rows, columns=['image_path', 'text'])
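As a quick check of what the text cleaning does, the sketch below runs a compact restatement of the same steps on a single made-up tweet. The sample string is hypothetical and not taken from the dataset. Note that the steps remove tokens but do not collapse the whitespace left behind.

```python
import re

def preprocess_text(text):
    text = re.sub(r"http\S+", "", text)   # URLs
    text = re.sub(r"@\S+", "", text)      # mentions such as @username
    text = re.sub(r"@", "", text)         # any stray @ symbols
    text = re.sub(r"\.com$", "", text)    # trailing ".com" (dot escaped)
    emoji_pattern = re.compile(
        "[" "\U0001F600-\U0001F64F" "\U0001F300-\U0001F5FF"
        "\U0001F680-\U0001F6FF" "\U0001F1E0-\U0001F1FF" "]+",
        flags=re.UNICODE,
    )
    return emoji_pattern.sub("", text)

# Hypothetical tweet containing a mention, a URL, an emoji, and a trailing ".com"
sample = "Stop the war now @user123 https://t.co/abc \U0001F600 example.com"
cleaned = preprocess_text(sample)
print(repr(cleaned))
```

If downstream code is sensitive to repeated spaces, `" ".join(cleaned.split())` would normalize them; that step is an optional addition, not part of the preprocessing described here.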

Finally, import the model to perform classification and get prediction results:

import os
import json
import numpy as np
import warnings
from autogluon.multimodal import MultiModalPredictor
warnings.filterwarnings('ignore')
np.random.seed(0)

model_path = 'model.ckpt'
predictor = MultiModalPredictor.load(model_path)
predictions = predictor.predict(test_data)

# Recover each sample's numeric index from its image file name
sample_ids = []
for _, row in stA_test.iterrows():
    sample_image = row['filename'].replace("/content/drive/MyDrive/CASE2023_Task4/CASE2023_TASK4_TestData/subtaskA/","")
    sample_ids.append(int(sample_image.replace(".jpg","")))

# Build one JSON object per sample
prediction = predictions.tolist()
results = []
for x, y in zip(sample_ids, prediction):
    results.append(json.dumps({"index": x, "prediction": y}))

# Write the results as JSON Lines (one object per line, no trailing newline)
model_name = 'subtaskA'
file_path = f"/content/drive/MyDrive/CASE2024/subtaskA/results/{model_name}-submission.json"

with open(file_path, 'w') as json_file:
    json_file.write('\n'.join(results))
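Before submitting, it can help to confirm that the output file parses line by line. The following sketch writes two made-up records in the same one-object-per-line layout and reads them back. The record values and the temporary file location are assumptions for illustration only.

```python
import json
import os
import tempfile

# Hypothetical predictions standing in for the model's output
records = [{"index": 101, "prediction": 1}, {"index": 102, "prediction": 0}]

# Write one JSON object per line, with no trailing newline
path = os.path.join(tempfile.mkdtemp(), "subtaskA-submission.json")
with open(path, "w") as f:
    f.write("\n".join(json.dumps(r) for r in records))

# Read the file back: every non-empty line must parse as a JSON object
with open(path) as f:
    loaded = [json.loads(line) for line in f if line.strip()]

# Each record should carry exactly the two expected keys
assert all(set(r) == {"index", "prediction"} for r in loaded)
print(len(loaded), "records parsed")
```

Pointing `path` at the real submission file and dropping the write step turns this into a simple validity check.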

Training Details

The model was trained with the following hyperparameters: a base learning rate of 1e-4, a decay rate of 0.9 with cosine decay scheduling, a batch size of 8, and a manual seed of 0 for reproducibility. It was optimized with the AdamW optimizer for up to 10 epochs, or until an early-stopping criterion was met, to prevent overfitting. All experiments were run on the Google Colaboratory platform with an NVIDIA A100 GPU.
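For readers who want to reproduce a similar setup, the reported settings could plausibly be expressed as an AutoGluon hyperparameter dictionary, as sketched below. This is an assumed mapping, not the authors' configuration: the dotted key names follow the convention of AutoGluon 1.0-era releases and may be named differently in other versions, it is not stated whether the batch size of 8 refers to the per-GPU or the effective batch size, and the encoder checkpoints and early-stopping patience are omitted because the card does not give them.

```python
# Assumed mapping of the reported settings onto AutoGluon-style dotted keys.
# Check the key names against the documentation of the installed version.
hyperparameters = {
    "optimization.learning_rate": 1e-4,          # base learning rate
    "optimization.lr_decay": 0.9,                # reported decay rate (assumed key)
    "optimization.lr_schedule": "cosine_decay",  # cosine decay scheduling
    "optimization.optim_type": "adamw",          # AdamW optimizer
    "optimization.max_epochs": 10,               # upper bound on training epochs
    "env.per_gpu_batch_size": 8,                 # assumption: 8 is the per-GPU size
}
seed = 0  # manual seed reported in the card

# Hypothetical training call, left commented out because it needs the dataset
# and a GPU; "label" is a placeholder column name.
# from autogluon.multimodal import MultiModalPredictor
# predictor = MultiModalPredictor(label="label")
# predictor.fit(train_data, tuning_data=val_data,
#               hyperparameters=hyperparameters, seed=seed)
```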

Dataset

Training data is provided at: https://drive.google.com/drive/folders/173EJjsNblxhjACXzIWardUqCcSYtcJh0

Evaluation/Validation data is provided at: https://drive.google.com/drive/folders/1LL2OD7v2GhrmeC0j2Gm9YFCOa5vobVjc

Testing data is provided at: https://drive.google.com/drive/folders/1DIVebYypb2x9RJjoSeOmr5yEm5rCXt54

Evaluation Results

After fine-tuning on the training data, the model achieves the following results on the test set:

Precision: 0.8720
Recall: 0.8737
F1: 0.8727
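The card does not state how these metrics are averaged across the two classes. As an illustration only, the sketch below computes macro-averaged precision, recall, and F1 from label lists in plain Python. The function name and the toy labels are assumptions, and macro averaging is one common choice rather than a confirmed detail of the evaluation.

```python
def macro_prf(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged precision, recall, and F1 over the given labels."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels for illustration; not taken from the CrisisHateMM test set
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
precision, recall, f1 = macro_prf(y_true, y_pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

With macro averaging, F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall.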