
arabicSent-ChamaBert

This model is a fine-tuned version of aubmindlab/bert-base-arabertv02-twitter, trained on a web-scraped dataset of Arabic comments. It was fine-tuned specifically for sentiment classification of Moroccan Arabic text, covering both Standard Arabic and dialectal variations. The model achieves the following results on the evaluation set:

  • Loss: 0.1626
  • Accuracy: 0.9073
  • F1: 0.9129
  • ROC AUC: 0.9337
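
As a quick illustration, the snippet below sketches how the model could be loaded for inference with the Transformers library. The model ID shown is a placeholder for wherever the fine-tuned weights are hosted, and the example comment is purely illustrative.

```python
# Minimal inference sketch. "your-namespace/arabicSent-ChamaBert" is a
# placeholder model ID; replace it with the actual repository path.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "your-namespace/arabicSent-ChamaBert"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example Moroccan Arabic comment ("This vaccine is very good").
text = "هاد اللقاح مزيان بزاف"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label.get(pred, str(pred)))
```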

Dataset

The dataset used for training and evaluation consists of Moroccan Arabic comments focused on sentiment towards the effects of vaccines. It contains a total of 81,971 comments, each labeled as either "Negative" or "Positive". These ground-truth annotations enable the model to learn the association between the language used in a comment and the sentiment it expresses.

The data collection process adhered to ethical considerations, respecting user privacy and complying with applicable data protection regulations. Measures were taken to ensure the anonymization of user identities and the removal of any personally identifiable information.
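
For reference, the sketch below shows one way the binary labels described above could be encoded and the comments tokenized for fine-tuning; the column names "text" and "sentiment" are hypothetical, since the dataset schema is not published.

```python
# Sketch of label encoding and tokenization for fine-tuning.
# The column names "text" and "sentiment" are hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02-twitter")

label2id = {"Negative": 0, "Positive": 1}
id2label = {v: k for k, v in label2id.items()}

def preprocess(example):
    encoded = tokenizer(example["text"], truncation=True, max_length=128)
    encoded["label"] = label2id[example["sentiment"]]
    return encoded
```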

Framework versions

  • Transformers 4.28.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
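
To check that a local environment matches the versions listed above, a small version printout can be used:

```python
# Print installed versions to compare against the ones listed above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # expected 4.28.0
print("PyTorch:", torch.__version__)              # expected 2.0.1+cu118
print("Datasets:", datasets.__version__)          # expected 2.12.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.13.3
```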