---
license: apache-2.0
datasets:
  - hadyelsahar/ar_res_reviews
language:
  - ar
metrics:
  - accuracy
  - precision
  - recall
  - f1
base_model:
  - aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---

🍽️ Arabic Restaurant Review Sentiment Analysis 🚀

📌 Overview

This project fine-tunes a transformer-based model to analyze sentiment in Arabic restaurant reviews.
We fine-tuned the model with the Hugging Face Transformers `Trainer` API and deployed the final model as an interactive Gradio web app.

📥 Data Collection

The dataset used for fine-tuning was sourced from Hugging Face Datasets, specifically:
📂 Arabic Restaurant Reviews Dataset (`hadyelsahar/ar_res_reviews`)
It contains Arabic restaurant reviews labeled with sentiment polarity (positive or negative).

🔄 Data Preparation

Cleaning & Normalization:
- Removed non-Arabic text, special characters, and extra spaces.
- Normalized Arabic characters (e.g., إ, أ, آ → ا, ة → ه).
- Downsampled the majority positive class to balance positive and negative reviews.
Tokenization:
- Tokenized the reviews with the AraBERT tokenizer that matches the base model.
Train-Test Split:
- 80% training, 20% testing.
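The preparation steps above can be sketched in plain Python as follows. This is a minimal sketch, not the project's actual preprocessing script: the "Arabic letters only" range (U+0621 to U+064A, which also drops diacritics, digits, punctuation, Latin text, and emoji), the `(text, label)` row format, the positive label value `1`, and the seed are all assumptions, and `clean_review` / `balance_and_split` are hypothetical helper names.

```python
import random
import re

# Assumption: "Arabic text" = letters U+0621..U+064A (hamza through yeh) plus whitespace.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
_ALEF_VARIANTS = re.compile(r"[إأآ]")
_SPACES = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Clean and normalize one review (hypothetical helper)."""
    text = _NON_ARABIC.sub(" ", text)     # drop non-Arabic text and special characters
    text = _ALEF_VARIANTS.sub("ا", text)  # normalize إ, أ, آ to ا
    text = text.replace("ة", "ه")         # normalize ة to ه
    return _SPACES.sub(" ", text).strip()  # collapse extra spaces


def balance_and_split(rows, positive_label=1, train_frac=0.8, seed=42):
    """Downsample positives to the negative-class size, shuffle, then split 80/20.

    rows: list of (text, label) pairs with binary labels (assumed format).
    Assumes positives are the majority class, as stated in the list above.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[1] == positive_label]
    negatives = [r for r in rows if r[1] != positive_label]
    positives = rng.sample(positives, k=min(len(positives), len(negatives)))
    balanced = positives + negatives
    rng.shuffle(balanced)
    cut = int(len(balanced) * train_frac)
    return balanced[:cut], balanced[cut:]
```

For example, `clean_review("الخدمة   ممتازة!! great")` returns `"الخدمه ممتازه"`. A stratified split (equal class ratios in both parts) would be a reasonable alternative to the plain shuffle-and-slice shown here.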

🏋️ Fine-Tuning & Results

The model was fine-tuned with Hugging Face Transformers on the 80% training split of the prepared reviews.

📊 Evaluation Metrics

| Metric | Score |
|---|---|
| Train Loss | `0.470` |
| Eval Loss | `0.373` |
| Accuracy | `86.41%` |
| Precision | `87.01%` |
| Recall | `86.49%` |
| F1-score | `86.75%` |
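To show how metrics like these can be produced during evaluation, here is a minimal `compute_metrics` sketch in the shape the Transformers `Trainer` expects (it receives `(logits, labels)` and returns a dict). It is an assumption, not the project's code: it treats label `1` as the positive class and uses binary (not macro) precision, recall, and F1, written in pure Python so it works on lists or NumPy arrays.

```python
def compute_metrics(eval_pred):
    """Accuracy, precision, recall and F1 for binary labels; positive class = 1 (assumed)."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(labels), "precision": precision,
            "recall": recall, "f1": f1}
```

It would be passed as `Trainer(..., compute_metrics=compute_metrics)`; the Trainer reports the keys with an `eval_` prefix. As a consistency check on the table, 2 × 0.8701 × 0.8649 / (0.8701 + 0.8649) ≈ 0.8675, so the reported F1-score is the harmonic mean of the reported precision and recall.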

⚙️ Training Parameters

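```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Use the GPU when available; fp16=True below requires a CUDA device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv2"
# Binary sentiment head; classifier_dropout is forwarded to the model config.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, classifier_dropout=0.5
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",      # renamed to eval_strategy in newer transformers releases
    save_strategy="epoch",            # must match the eval strategy for load_best_model_at_end
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                 # linear warmup over the first 10% of optimizer steps
    fp16=True,                        # mixed-precision training on GPU
    report_to="none",
    save_total_limit=2,               # keep at most two checkpoints on disk
    gradient_accumulation_steps=2,    # effective train batch size: 8 x 2 = 16 per device
    load_best_model_at_end=True,
    max_grad_norm=1.0,                # gradient clipping
    metric_for_best_model="eval_loss",
    greater_is_better=False,          # lower eval loss is better
)
```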
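To make the batch-size and scheduler settings above concrete, the sketch below computes the effective batch size, an approximate optimizer-step count, and the learning rate under linear warmup followed by cosine decay (the shape Transformers uses for `lr_scheduler_type="cosine"` with one half-cosine cycle). The training-set size is not stated in this card, so the example uses a hypothetical 1,000 training reviews; the step count is an approximation for a single device and may differ by a step across library versions.

```python
import math


def effective_batch_size(per_device_batch=8, grad_accum_steps=2, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def total_optimizer_steps(num_train_examples, epochs=4, per_device_batch=8, grad_accum_steps=2):
    """Approximate single-device count: (batches per epoch // accumulation) * epochs."""
    batches_per_epoch = math.ceil(num_train_examples / per_device_batch)
    return max(batches_per_epoch // grad_accum_steps, 1) * epochs


def cosine_lr(step, total_steps, peak_lr=1e-5, warmup_ratio=0.1):
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical example: 1,000 training reviews.
steps = total_optimizer_steps(1000)  # 125 batches/epoch // 2 = 62 steps/epoch, x 4 epochs = 248
```

With these numbers, the first 25 steps warm up to the peak learning rate of `1e-5`, which then decays smoothly to 0 by step 248.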