
arabicSent-ChamaBert

This model is a fine-tuned version of aubmindlab/bert-base-arabertv02-twitter, trained on a web-scraped dataset of Arabic comments. It was fine-tuned specifically for sentiment classification of Moroccan Arabic text, covering both Standard Arabic and dialectal variations. The model achieves the following results on the evaluation set:

  • Loss: 0.1626
  • Accuracy: 0.9073
  • F1: 0.9129
  • ROC AUC: 0.9337
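
As a quick illustration, the snippet below sketches how the model could be loaded for inference with the Transformers library. The model ID shown is a placeholder for wherever the fine-tuned weights are hosted, and the example comment is purely illustrative.

```python
# Minimal inference sketch. "your-namespace/arabicSent-ChamaBert" is a
# placeholder model ID; replace it with the actual repository path.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "your-namespace/arabicSent-ChamaBert"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example Moroccan Arabic comment ("This vaccine is very good").
text = "هاد اللقاح مزيان بزاف"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label.get(pred, str(pred)))
```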

Dataset

The dataset used for training and evaluation consists of Moroccan Arabic comments focused on sentiment towards the effects of vaccines. It contains a total of 81,971 comments, each labeled as either "Negative" or "Positive". These ground-truth annotations enable the model to learn the association between the language used in a comment and the sentiment it expresses.

The data collection process adhered to ethical considerations, respecting user privacy and complying with applicable data protection regulations. Measures were taken to ensure the anonymization of user identities and the removal of any personally identifiable information.
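
For reference, the sketch below shows one way the binary labels described above could be encoded and the comments tokenized for fine-tuning; the column names "text" and "sentiment" are hypothetical, since the dataset schema is not published.

```python
# Sketch of label encoding and tokenization for fine-tuning.
# The column names "text" and "sentiment" are hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02-twitter")

label2id = {"Negative": 0, "Positive": 1}
id2label = {v: k for k, v in label2id.items()}

def preprocess(example):
    encoded = tokenizer(example["text"], truncation=True, max_length=128)
    encoded["label"] = label2id[example["sentiment"]]
    return encoded
```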

Framework versions

  • Transformers 4.28.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
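
To check that a local environment matches the versions listed above, a small version printout can be used:

```python
# Print installed versions to compare against the ones listed above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # expected 4.28.0
print("PyTorch:", torch.__version__)              # expected 2.0.1+cu118
print("Datasets:", datasets.__version__)          # expected 2.12.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.13.3
```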