Fine-tuned CAMeL-BERT Model for Sentiment Analysis in Moroccan Darija

Model Name: CAMeL-BERT Fine-Tuned for Moroccan Darija Sentiment Analysis
Model ID: NerdyPy/fine_tuned_model_sentiment_analysis
Language: Arabic (Modern Standard Arabic and Moroccan Darija)
Task: Sentiment Analysis (Negative, Neutral, Positive)

Model Description

This model is a fine-tuned version of the CAMeL-Lab BERT model, adapted for sentiment analysis in Moroccan Darija, an under-resourced Arabic dialect. It classifies Arabic text, in both Modern Standard Arabic (MSA) and Moroccan Darija, into three sentiment categories:

Negative
Neutral
Positive

By focusing on Moroccan Darija, this model addresses the scarcity of NLP resources for the dialect and improves sentiment analysis of the mixed MSA and Darija text common in Moroccan user-generated content.

Intended Use

Primary Use Case

Sentiment analysis of user-generated content, such as comments and reviews, in Moroccan Darija and MSA.

Applications

Analyzing public opinion on social media platforms and online newspapers.
Assisting researchers in understanding societal attitudes and trends.
Supporting policymakers and organizations in gauging public sentiment.

Users

Researchers and data scientists in NLP.
Organizations analyzing Arabic-language social media.
Developers building sentiment analysis tools for Arabic dialects.

Limitations and Risks

Dialectal Variations

Performance may degrade on Arabic dialects that are not represented in the training data.

Data Bias

The model may reflect biases present in the training datasets.

Language Mixing (Code-Switching)

The model may struggle with text that heavily mixes Moroccan Darija with other languages (e.g., French, English, Spanish), which can reduce classification accuracy. For example, in "واش كتفهم le français؟" ("Do you understand French?"), the speaker switches from Moroccan Darija to French within a single sentence. Because the model was trained primarily on Arabic text, it may misread the sentiment carried by the non-Arabic portion.
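One way to act on this limitation is to flag inputs that contain a large share of Latin-script letters before trusting the predicted sentiment. The following is a minimal sketch and not part of the model: the function names and the 0.2 threshold are assumptions chosen for illustration, and the check looks only at script, so it will also flag Darija written in Latin characters.

```python
import unicodedata


def latin_letter_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Latin-script letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode character names of Latin letters contain the word "LATIN".
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def flag_code_switched(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose Latin-letter share meets the (assumed) threshold."""
    return latin_letter_ratio(text) >= threshold


if __name__ == "__main__":
    sample = "واش كتفهم le français؟"
    print(flag_code_switched(sample), round(latin_letter_ratio(sample), 2))
```

Flagged inputs could be routed to manual review or reported separately, so that accuracy on code-switched text does not silently skew aggregate results.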

Generalization

Limited performance on topics or vocabulary outside the training data.

How to Use

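You can use this model with the Hugging Face Transformers library. Access to the model is gated: you need to agree to share your contact information on the model page before you can download it.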

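import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model
model_id = "NerdyPy/fine_tuned_model_sentiment_analysis"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example text mixing MSA and Moroccan Darija
text = "العمل في هذا المكان كان رائعاً، ولكن شي مرات ما كاينش التنظيم"

# Tokenize, run inference without gradients, and pick the highest-scoring class
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Label names come from the model's config; they may be generic (e.g., LABEL_0)
print(model.config.id2label[logits.argmax(dim=-1).item()])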
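A sequence classification model of this kind returns one raw score (logit) per class, and a softmax turns those scores into probabilities. The sketch below shows that step in plain Python so it runs without downloading the model. The label order in `ASSUMED_LABELS` and the example logits are assumptions for illustration; the real mapping is stored in the model's `config.id2label`.

```python
import math

# Assumed label order for illustration only; check the model's config.id2label.
ASSUMED_LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=ASSUMED_LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Made-up logits standing in for real model output.
    label, confidence = decode([-1.0, 0.5, 2.0])
    print(label, round(confidence, 3))
```

Reporting the probability alongside the label makes it possible to set aside low-confidence predictions, which may be useful for the code-switched inputs discussed under Limitations and Risks.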
