
MARBERT_sentiment_sarcasm_speech_act_classifier

This model is a fine-tuned version of UBC-NLP/MARBERTv2 on the Khalaya/Arabic_YouTube_Comments dataset. For each comment it predicts a label in each of three categories:

  1. Sentiment (Positive, Neutral, Negative, Mixed)
  2. Speech act (Expression, Assertion, Question, Recommendation, Request, Miscellaneous)
  3. Sarcasm (Yes, No)

Model Details

Model Description

  • Developed by: Faris, CTO of Khalaya.
  • Funded by: Khalaya.
  • Shared by: Khalaya.
  • Model type: BERT
  • Language(s) (NLP): Arabic
  • License: MIT
  • Finetuned from model: MARBERT

Model Sources

  • Repository: [More Information Needed]
  • Paper: [More Information Needed]
  • Demo: [More Information Needed]

How to Get Started with the Model

  1. Clone the model repo:
git clone https://huggingface.co/Khalaya/MARBERT_sentiment_sarcasm_speech_act_classifier
  2. Navigate to the model directory:
cd MARBERT_sentiment_sarcasm_speech_act_classifier
  3. Use the model via the provided helper functions:
# Load the helper functions shipped with the repo
from utils import get_model, classify
from transformers import TFAutoModelForMaskedLM, AutoTokenizer

# Load the base model and its weights, then wrap it with the get_model helper
model = TFAutoModelForMaskedLM.from_pretrained("Khalaya/MARBERT_sentiment_sarcasm_speech_act_classifier")
model = get_model(model)
tokenizer = AutoTokenizer.from_pretrained("Khalaya/MARBERT_sentiment_sarcasm_speech_act_classifier")


out = classify("ู‡ู„ุง",model,tokenizer)
print(out)
"""Expected output
[
  {'text': 'ู‡ู„ุง', 
  'sentiment': 'Neutral', 
  'speech_act': 'Expression', 
  'sarcasm': 'Not Sarcastic'}
  ]
"""
out = classify(["ูŠุงุฎูŠ ูˆู„ู„ู‡ ุงู†ูƒ ุชุถุญูƒู†ูŠ ุฏุงูŠู…ุง ู‡ู‡ู‡ู‡ู‡ู‡ู‡", "ู‡ู„ุง"], model, tokenizer)
print(out)
"""Expected output
[
  {'text': 'ูŠุงุฎูŠ ูˆู„ู„ู‡ ุงู†ูƒ ุชุถุญูƒู†ูŠ ุฏุงูŠู…ุง ู‡ู‡ู‡ู‡ู‡ู‡ู‡', 
  'sentiment': 'Positive', 
  'speech_act': 'Expression', 
  'sarcasm': 'Not Sarcastic'}, 
  {'text': 'ู‡ู„ุง', 
  'sentiment': 'Neutral', 
  'speech_act': 'Expression', 
  'sarcasm': 'Not Sarcastic'}
  ]
"""

Uses

Direct Use

This model can be used directly for classifying Arabic comments into the three aforementioned categories without further fine-tuning.

Downstream Use

The model can be fine-tuned for other Arabic text classification tasks or integrated into larger applications that require sentiment analysis, speech act recognition, or sarcasm detection in Arabic text.

Out-of-Scope Use

The model is not designed for tasks outside the domain of Arabic text classification, such as generating text or performing translation tasks.

Bias, Risks, and Limitations

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. The model might have biases based on the dataset it was trained on and may not perform equally well across all domains or topics of Arabic YouTube comments.

Training Details

Training Data

The model was trained on the Arabic YouTube Comments dataset, which includes comments labeled for sentiment, speech act, and sarcasm.

Training Procedure

Training involved preprocessing the text, tokenizing it with the MARBERT tokenizer, and fine-tuning on a TPU with mixed precision for 7 epochs. The learning rate followed a one-cycle schedule (a sketch appears after the hyperparameter list below).
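
A typical TensorFlow setup for TPU training with mixed precision looks roughly like the following; this is a sketch only, not the actual training script, which is not published.

import tensorflow as tf

# Connect to the TPU and enable bfloat16 mixed precision (a sketch only).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

with strategy.scope():
    # Build and compile the classifier inside the strategy scope
    model = ...  # see the architecture sketch later in this card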

Preprocessing

The text data was tokenized with a maximum length of 128 tokens.
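
For reference, tokenization along these lines can be reproduced as follows; padding and truncation options other than max_length=128 are assumptions.

from transformers import AutoTokenizer

# MARBERTv2 tokenizer, as used for fine-tuning
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERTv2")
enc = tokenizer(
    ["ูŠุงุฎูŠ ูˆู„ู„ู‡ ุงู†ูƒ ุชุถุญูƒู†ูŠ ุฏุงูŠู…ุง ู‡ู‡ู‡ู‡ู‡ู‡ู‡"],
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="tf",
)
print(enc["input_ids"].shape)  # (1, 128)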

Training Hyperparameters

  • EPOCHS: 7
  • LEARNING_RATE_MAX: 2e-5
  • LEARNING_RATE: 2e-5
  • PCT: 0.02
  • BATCH_SIZE: 512
  • WD: 0.001
  • MAX_LENGTH: 128
  • DROP_OUT: 0.1
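
The exact one-cycle implementation is not published. A minimal sketch consistent with these hyperparameters, assuming PCT is the warm-up fraction and LEARNING_RATE_MAX the peak rate, could look like this:

import math
import tensorflow as tf

class OneCycleLR(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warm-up to lr_max for the first pct of steps, then cosine decay.
    A sketch only; the schedule actually used during training is not published."""

    def __init__(self, lr_max=2e-5, total_steps=7_000, pct=0.02):
        self.lr_max = lr_max
        self.total_steps = total_steps
        self.warmup_steps = max(1, int(total_steps * pct))

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = tf.constant(float(self.warmup_steps))
        total = tf.constant(float(self.total_steps))
        warm_lr = self.lr_max * step / warmup
        progress = tf.minimum((step - warmup) / tf.maximum(total - warmup, 1.0), 1.0)
        cos_lr = 0.5 * self.lr_max * (1.0 + tf.cos(math.pi * progress))
        return tf.where(step < warmup, warm_lr, cos_lr)

# total_steps = EPOCHS * steps_per_epoch; steps_per_epoch is a placeholder here.
optimizer = tf.keras.optimizers.AdamW(
    learning_rate=OneCycleLR(lr_max=2e-5, total_steps=7 * 1_000, pct=0.02),
    weight_decay=0.001,
)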

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a test split from the Arabic YouTube Comments dataset.

Factors

Evaluation was conducted on different classes of sentiment, speech act, and sarcasm.

Metrics

The model's performance was measured using precision, recall, and F1-score for each class.
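
Per-class precision, recall, and F1 are the kind of numbers produced by, for example, scikit-learn's classification_report; the toy example below only illustrates the format, since the actual evaluation script is not published.

from sklearn.metrics import classification_report

# Hypothetical example: y_true / y_pred stand in for gold and predicted
# sentiment labels on the test split.
y_true = ["Positive", "Neutral", "Negative", "Positive"]
y_pred = ["Positive", "Neutral", "Positive", "Positive"]
print(classification_report(y_true, y_pred, zero_division=0))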

Results

The evaluation results are as follows:

Sentiment Classification

  • Positive: precision 0.91, recall 0.89, F1 0.90
  • Neutral: precision 0.67, recall 0.62, F1 0.64
  • Negative: precision 0.82, recall 0.88, F1 0.85
  • Mixed: precision 0.00, recall 0.00, F1 0.00

Speech Act Classification

  • Expression: precision 0.92, recall 0.80, F1 0.86
  • Assertion: precision 0.68, recall 0.83, F1 0.74
  • Question: precision 0.75, recall 0.85, F1 0.80
  • Recommendation: precision 0.60, recall 0.72, F1 0.66
  • Request: precision 0.66, recall 0.81, F1 0.73
  • Miscellaneous: precision 0.28, recall 0.39, F1 0.33

Sarcasm Detection

  • No (not sarcastic): precision 0.99, recall 0.86, F1 0.92
  • Yes (sarcastic): precision 0.38, recall 0.88, F1 0.53

Technical Specifications

Model Architecture and Objective

The model is based on the MARBERT architecture, fine-tuned for multi-label classification to predict sentiment, speech act, and sarcasm.
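
The exact head layout built by get_model is not documented. One plausible realization, assumed here, is a shared MARBERTv2 encoder with dropout 0.1 and three softmax heads sized to the label sets above:

import tensorflow as tf
from transformers import TFAutoModel

def build_multi_head_classifier(dropout=0.1, max_length=128):
    """Shared MARBERTv2 encoder with three softmax heads (an assumed layout)."""
    encoder = TFAutoModel.from_pretrained("UBC-NLP/MARBERTv2")
    input_ids = tf.keras.Input((max_length,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.Input((max_length,), dtype=tf.int32, name="attention_mask")
    pooled = encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
    pooled = tf.keras.layers.Dropout(dropout)(pooled)
    sentiment = tf.keras.layers.Dense(4, activation="softmax", name="sentiment")(pooled)
    speech_act = tf.keras.layers.Dense(6, activation="softmax", name="speech_act")(pooled)
    sarcasm = tf.keras.layers.Dense(2, activation="softmax", name="sarcasm")(pooled)
    return tf.keras.Model([input_ids, attention_mask], [sentiment, speech_act, sarcasm])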

Compute Infrastructure

The model was trained on TPU v3-8.

Hardware

  • TPU Type: TPU v3-8

Software

  • TensorFlow version: 2.15.0
  • Transformers version: 4.37.2

Citation

BibTeX:

@misc{faris2024marbertv2,
  author = {Faris},
  title = {Multi-label Classification of Arabic YouTube Comments using MARBERTv2},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/khalaya/MARBERTv2}},
}

APA:

Faris. (2024). Multi-label Classification of Arabic YouTube Comments using MARBERTv2. Hugging Face.

Glossary

  • Sentiment Analysis: The task of classifying the sentiment expressed in text.
  • Speech Act: The function of an utterance, such as asking a question, making a statement, or giving a command.
  • Sarcasm Detection: The task of identifying sarcasm in text.

More Information

For more information, please contact Faris at f.alahmadi@khalaya.com.sa

Model Card Authors

  • Faris, CTO of Khalaya

Model Card Contact

For further questions, please reach out to Faris at f.alahmadi@khalaya.com.sa
