---
language:
  - ar
  - fr
license: mit
pipeline_tag: text-classification
tags:
  - misinformation-detection
  - fake-news
  - text-classification
  - algerian-darija
  - arabic
  - mbert
model_name: mBERT-Algerian-Darija
base_model: bert-base-multilingual-cased
---
# mBERT: Algerian Darija Misinformation Detection

A fine-tuned version of **BERT-base-multilingual-cased** for detecting misinformation in **Algerian Darija** text.

- **Base model**: `bert-base-multilingual-cased` (about 178M parameters)
- **Task**: Multi-class text classification (5 classes)
- **Classes**: F (Factual), R (Reporting), N (Non-factual), M (Misleading), S (Satire)
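
Each class code corresponds to one output logit. The `LABEL_MAP` in the Usage section fixes the order as 0=F, 1=R, 2=N, 3=M, 4=S. Turning a logit vector into a prediction takes a softmax followed by an argmax. The dependency-free sketch below assumes that index order. `decode` is a hypothetical helper, and the logits are made up for illustration; they are not real model output.

```python
import math

# Index order assumed to match LABEL_MAP in the Usage section.
LABELS = ["F", "R", "N", "M", "S"]
NAMES = {"F": "Factual", "R": "Reporting", "N": "Non-factual",
         "M": "Misleading", "S": "Satire"}


def decode(logits):
    """Hypothetical helper: return (code, name, confidence) for one row of 5 logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Argmax over the probabilities (same index as argmax over the logits).
    idx = max(range(len(probs)), key=probs.__getitem__)
    code = LABELS[idx]
    return code, NAMES[code], probs[idx]


# Invented logits, for illustration only.
code, name, conf = decode([0.1, 0.3, 2.0, 1.2, -0.5])
print(f"{name} ({code}): {conf:.2%}")  # prints "Non-factual (N): 53.66%"
```

The confidence reported this way is the softmax probability of the top class. It is not a calibrated estimate of correctness.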

---

## Performance (Test set: 3,344 samples)

- **Accuracy**: 75.42%
- **Macro F1**: 64.48%
- **Weighted F1**: 75.70%

**Per-class F1**:

- Factual (F): 83.72%
- Reporting (R): 76.35%
- Non-factual (N): 81.01%
- Misleading (M): 61.46%
- Satire (S): 19.86%
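
As a consistency check, macro F1 is the unweighted mean of the five per-class scores. This is why the weak Satire score pulls it well below accuracy. The accuracy figure also implies a whole number of correct test predictions. Weighted F1 cannot be recomputed from this card, because per-class support counts are not listed.

```python
# Per-class F1 scores (%) as reported above.
per_class_f1 = {"F": 83.72, "R": 76.35, "N": 81.01, "M": 61.46, "S": 19.86}

# Macro F1: unweighted mean over classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"Macro F1: {macro_f1:.2f}%")  # prints "Macro F1: 64.48%"

# 75.42% accuracy on 3,344 test samples corresponds to 2,522 correct predictions.
n_test = 3344
n_correct = round(n_test * 0.7542)
print(n_correct, f"{n_correct / n_test:.2%}")  # prints "2522 75.42%"
```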

---

## Training Summary

- **Max sequence length**: 128 tokens
- **Epochs**: 3 (with early stopping)
- **Batch size**: 16
- **Learning rate**: 2e-5
- **Loss**: Class-weighted cross-entropy
- **Seed**: 42 (for reproducibility)
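
Class-weighted cross-entropy scales each sample's loss by a weight tied to its true class, so errors on rare classes cost more. The card does not say how the weights were chosen. The sketch below *assumes* inverse-frequency ("balanced") weights, n_samples / (n_classes × count). The label list is hypothetical. The loss is normalised by the sum of the target weights, which is what `torch.nn.CrossEntropyLoss(weight=...)` computes with its default `reduction="mean"`. `balanced_class_weights` and `weighted_cross_entropy` are illustrative helpers, not part of this repository.

```python
import math


def balanced_class_weights(labels, num_classes):
    """Assumed scheme: n_samples / (num_classes * count_c); 0.0 for absent classes."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Mean weighted cross-entropy, divided by the sum of target weights."""
    total, weight_sum = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        nll = log_sum_exp - logits[y]  # -log softmax(logits)[y]
        total += weights[y] * nll
        weight_sum += weights[y]
    return total / weight_sum


# Hypothetical imbalanced labels (0=F, 1=R, 2=N, 3=M, 4=S); not the real training data.
train_labels = [0] * 40 + [1] * 25 + [2] * 20 + [3] * 10 + [4] * 5
weights = balanced_class_weights(train_labels, num_classes=5)
print([round(w, 2) for w in weights])  # prints [0.5, 0.8, 1.0, 2.0, 4.0]

batch_logits = [
    [2.0, 0.1, 0.1, 0.1, 0.1],  # true class F, predicted F (correct)
    [2.0, 0.1, 0.1, 0.1, 0.1],  # true class S, predicted F (rare-class error)
]
targets = [0, 4]
weighted = weighted_cross_entropy(batch_logits, targets, weights)
unweighted = weighted_cross_entropy(batch_logits, targets, [1.0] * 5)
print(f"weighted={weighted:.4f}  unweighted={unweighted:.4f}")
```

With these weights, the misclassified Satire sample dominates the batch loss. This is the intended effect for a minority class like S.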

---

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "Rahilgh/model4_1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).eval()  # inference mode: disables dropout

# Output index -> class code, and class code -> readable name
LABEL_MAP = {0: "F", 1: "R", 2: "N", 3: "M", 4: "S"}
LABEL_NAMES = {
    "F": "Factual",
    "R": "Reporting",
    "N": "Non-factual",
    "M": "Misleading",
    "S": "Satire",
}

texts = [
    # "Rumor has it they're going to cancel the bac (baccalaureate exam) this year."
    "قالك بلي رايحين ينحو الباك هذا العام",
]

for text in texts:
    # Tokenize with the same 128-token limit used in training
    inputs = tokenizer(
        text,
        return_tensors="pt",
        max_length=128,
        truncation=True,
    ).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]  # class probabilities for this single input
    pred_id = probs.argmax().item()
    confidence = probs[pred_id].item()
    label = LABEL_MAP[pred_id]
    print(f"Text: {text}")
    print(f"Prediction: {LABEL_NAMES[label]} ({label}), confidence {confidence:.2%}\n")
```