Faris-ML committed on
Commit e2a0782
1 Parent(s): 08e3f78

Update README.md

Files changed (1)
  1. README.md +157 -18
README.md CHANGED
@@ -12,36 +12,175 @@ probably proofread and complete it, then remove this comment. -->
 
  # MARBERT_sentiment_sarcasm_speech_act_classifier
 
- This model is a fine-tuned version of [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) on an unknown dataset.
- It achieves the following results on the evaluation set:

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32

- ### Training results

- ### Framework versions

- - Transformers 4.41.2
- - TensorFlow 2.15.0
- - Tokenizers 0.19.1
 
  # MARBERT_sentiment_sarcasm_speech_act_classifier
 
+ This model is a fine-tuned version of [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) on the [Khalaya/Arabic_YouTube_Comments](https://huggingface.co/datasets/Khalaya/Arabic_YouTube_Comments) dataset.
+ The model classifies comments into three categories:
+ 1. Sentiment (Positive, Neutral, Negative, Mixed)
+ 2. Speech act (Expression, Assertion, Question, Recommendation, Request, Miscellaneous)
+ 3. Sarcasm (Yes, No)

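+ For downstream processing it can help to keep the label sets listed above in one place. The snippet below only collects those label names as Python constants; the index order of each task's outputs is an assumption and should be checked against the checkpoint's `id2label` configuration.

+ ```python
+ # Label sets as described in this card; index order is assumed, not guaranteed.
+ SENTIMENT_LABELS = ["Positive", "Neutral", "Negative", "Mixed"]
+ SPEECH_ACT_LABELS = ["Expression", "Assertion", "Question", "Recommendation", "Request", "Miscellaneous"]
+ SARCASM_LABELS = ["No", "Yes"]
+ ```
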
+ ## Model Details

+ ### Model Description

+ - **Developed by:** Faris, CTO of Khalaya
+ - **Funded by:** Khalaya
+ - **Shared by:** Khalaya
+ - **Model type:** BERT
+ - **Language(s) (NLP):** Arabic
+ - **License:** MIT
+ - **Finetuned from model:** [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2)

+ ### Model Sources

+ - **Repository:** [More Information Needed]
+ - **Paper:** [More Information Needed]
+ - **Demo:** [More Information Needed]

+ ## Uses

+ ### Direct Use

+ This model can be used directly to classify Arabic YouTube comments into the three categories listed above, without further fine-tuning.

+ ### Downstream Use

+ The model can be fine-tuned for other Arabic text classification tasks, or integrated into larger applications that require sentiment analysis, speech-act recognition, or sarcasm detection for Arabic text.

+ ### Out-of-Scope Use

+ The model is not designed for tasks outside Arabic text classification, such as text generation or translation.

+ ## Bias, Risks, and Limitations

+ ### Recommendations

+ Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations. The model may reflect biases present in its training data and may not perform equally well across all domains or topics of Arabic YouTube comments.

+ ## How to Get Started with the Model

+ Use the code below to get started with the model:

+ ```python
+ from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

+ # Replace "path_to_your_model" with this model's repository ID on the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")
+ model = TFAutoModelForSequenceClassification.from_pretrained("path_to_your_model")

+ # Tokenize a single comment and run a forward pass.
+ inputs = tokenizer("Your text here", return_tensors="tf")
+ outputs = model(inputs)
+ ```
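
+ The loading code above exposes a single `logits` tensor. How the three tasks are packed into that output depends on the checkpoint's head configuration, so the post-processing below is only a minimal sketch for a single classification head, relying on the checkpoint's `id2label` mapping; it is not the card author's exact inference code.

+ ```python
+ import tensorflow as tf

+ # Turn logits into class probabilities and pick the top class.
+ probs = tf.nn.softmax(outputs.logits, axis=-1)
+ pred_id = int(tf.argmax(probs, axis=-1)[0])
+ print(model.config.id2label[pred_id])
+ ```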

+ ## Training Details

+ ### Training Data

+ The model was trained on the Arabic YouTube Comments dataset, which includes comments labeled for sentiment, speech act, and sarcasm.

+ ### Training Procedure

+ Training involved preprocessing the text data, tokenizing it with the MARBERT tokenizer, and training the model on a TPU with mixed precision for 7 epochs. The learning rate followed a one-cycle schedule; a sketch of this setup is shown after the hyperparameter list below.

+ #### Preprocessing

+ The text data was tokenized with a maximum sequence length of 128 tokens.

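+ As a minimal sketch of that preprocessing step (assuming the MARBERTv2 tokenizer and padding to the maximum length; the exact padding strategy is an assumption):

+ ```python
+ from transformers import AutoTokenizer

+ tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERTv2")
+ encoded = tokenizer(
+     ["نص تجريبي"],        # a batch of raw Arabic comments (placeholder text)
+     max_length=128,        # maximum length used for this model
+     padding="max_length",
+     truncation=True,
+     return_tensors="tf",
+ )
+ ```
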
87
+ #### Training Hyperparameters
88
+
89
+ - **EPOCHS:** 7
90
+ - **LEARNING_RATE_MAX:** 2e-5
91
+ - **LEARNING_RATE:** 2e-5
92
+ - **PCT:** 0.02
93
+ - **BATCH_SIZE:** 512
94
+ - **WD:** 0.001
95
+ - **MAX_LENGTH:** 128
96
+ - **DROP_OUT:** 0.1
97
+
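+ The exact training script is not part of this card. The sketch below shows one way the pieces described above could be wired together in Keras, assuming **PCT** is the warm-up fraction of the one-cycle schedule, **WD** is AdamW weight decay, and bfloat16 mixed precision on TPU; `steps_per_epoch` is a placeholder that depends on the dataset and batch size.

+ ```python
+ import math
+ import tensorflow as tf

+ class OneCycleSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
+     """Linear warm-up for the first `pct` of steps, cosine decay to zero afterwards."""

+     def __init__(self, lr_max, total_steps, pct=0.02):
+         self.lr_max = lr_max
+         self.warmup_steps = max(1.0, total_steps * pct)
+         self.decay_steps = max(1.0, total_steps - self.warmup_steps)

+     def __call__(self, step):
+         step = tf.cast(step, tf.float32)
+         warmup = self.lr_max * step / self.warmup_steps
+         progress = (step - self.warmup_steps) / self.decay_steps
+         decay = self.lr_max * 0.5 * (1.0 + tf.cos(math.pi * progress))
+         return tf.where(step < self.warmup_steps, warmup, decay)

+ # Mixed precision on TPU typically uses bfloat16.
+ tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

+ steps_per_epoch = 1000  # placeholder: number of batches per epoch at BATCH_SIZE = 512
+ schedule = OneCycleSchedule(lr_max=2e-5, total_steps=7 * steps_per_epoch, pct=0.02)
+ optimizer = tf.keras.optimizers.AdamW(learning_rate=schedule, weight_decay=0.001)
+ ```
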
+ ## Evaluation

+ ### Testing Data, Factors & Metrics

+ #### Testing Data

+ The model was evaluated on a held-out test split of the Arabic YouTube Comments dataset.

+ #### Factors

+ Evaluation was broken down by class for each of the three tasks: sentiment, speech act, and sarcasm.

+ #### Metrics

+ Performance was measured with per-class precision, recall, and F1-score.

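+ These per-class numbers correspond to a standard classification report; the snippet below is a minimal sketch using scikit-learn with placeholder label lists (shown for the sarcasm task), not the card author's evaluation script.

+ ```python
+ from sklearn.metrics import classification_report

+ # Placeholder gold labels and model predictions for one task (sarcasm).
+ y_true = ["No", "No", "Yes", "No", "Yes"]
+ y_pred = ["No", "Yes", "Yes", "No", "Yes"]

+ # Prints precision, recall, and F1-score for each class.
+ print(classification_report(y_true, y_pred, digits=2))
+ ```
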
+ ### Results

+ The evaluation results are as follows:

+ **Sentiment Classification**

+ | Class    | Precision | Recall | F1-score |
+ |----------|-----------|--------|----------|
+ | Positive | 0.91      | 0.89   | 0.90     |
+ | Neutral  | 0.67      | 0.62   | 0.64     |
+ | Negative | 0.82      | 0.88   | 0.85     |
+ | Mixed    | 0.00      | 0.00   | 0.00     |

+ **Speech Act Classification**

+ | Class          | Precision | Recall | F1-score |
+ |----------------|-----------|--------|----------|
+ | Expression     | 0.92      | 0.80   | 0.86     |
+ | Assertion      | 0.68      | 0.83   | 0.74     |
+ | Question       | 0.75      | 0.85   | 0.80     |
+ | Recommendation | 0.60      | 0.72   | 0.66     |
+ | Request        | 0.66      | 0.81   | 0.73     |
+ | Miscellaneous  | 0.28      | 0.39   | 0.33     |

+ **Sarcasm Detection**

+ | Class | Precision | Recall | F1-score |
+ |-------|-----------|--------|----------|
+ | No    | 0.99      | 0.86   | 0.92     |
+ | Yes   | 0.38      | 0.88   | 0.53     |

+ ## Technical Specifications

+ ### Model Architecture and Objective

+ The model is based on the MARBERT architecture and fine-tuned for multi-task classification, predicting sentiment, speech act, and sarcasm for each comment.

+ ### Compute Infrastructure

+ The model was trained on a TPU v3-8.

+ #### Hardware

+ - **TPU Type:** TPU v3-8

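+ For reference, connecting to a TPU and placing model construction under a distribution strategy in TensorFlow usually looks like the sketch below; the resolver arguments depend on the environment (e.g. Colab vs. a Cloud TPU VM), and this is not the card author's exact setup.

+ ```python
+ import tensorflow as tf

+ # Locate and initialize the TPU, then build a distribution strategy.
+ resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
+ tf.config.experimental_connect_to_cluster(resolver)
+ tf.tpu.experimental.initialize_tpu_system(resolver)
+ strategy = tf.distribute.TPUStrategy(resolver)

+ with strategy.scope():
+     # Model construction and optimizer creation go inside the strategy scope.
+     ...
+ ```
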
+ #### Software

+ - **TensorFlow version:** 2.15.0
+ - **Transformers version:** 4.37.2

+ ## Citation

+ **BibTeX:**

+ ```bibtex
+ @misc{faris2024marbertv2,
+   author       = {Faris},
+   title        = {Multi-label Classification of Arabic YouTube Comments using MARBERTv2},
+   year         = {2024},
+   publisher    = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/khalaya/MARBERTv2}},
+ }
+ ```

+ **APA:**

+ Faris. (2024). *Multi-label Classification of Arabic YouTube Comments using MARBERTv2*. Hugging Face. Retrieved from https://huggingface.co/khalaya/MARBERTv2

+ ## Glossary

+ - **Sentiment Analysis:** The task of classifying the sentiment expressed in text.
+ - **Speech Act:** The function of an utterance, such as asking a question, making a statement, or giving a command.
+ - **Sarcasm Detection:** The task of identifying sarcasm in text.

+ ## More Information

+ For more information, please contact Faris at faris@khalaya.com.

+ ## Model Card Authors

+ - Faris, CTO of Khalaya

+ ## Model Card Contact

+ For further questions, please reach out to Faris at f.alahmadi@khalaya.com.sa.