Sawb — DeepSeek-R1-Distill-Llama-8B (LoRA SFT — Explanation Model)

Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.

Overview

Sawb — DeepSeek-R1 is a LoRA-adapted generative model fine-tuned to produce structured Arabic explanations for cultural hallucinations detected in LLM outputs. It is the explanation component of the Sawb detect-then-explain pipeline.

The Sawb pipeline works as follows:

  1. Detection: The HassanB4/sawb model (AraBERT-Large + Glossary, 355M parameters) classifies each (Arabic question, LLM answer) pair as a hallucination or not.
  2. Explanation: For each detected hallucination, this model generates a case-specific Arabic explanation that cites exact phrases from the LLM's answer to show why it is culturally incorrect.
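
The two stages above can be sketched as a small orchestration function. This is only an illustration under stated assumptions, not the authors' implementation: `run_pipeline`, `detect`, and `explain` are hypothetical names, and the two callables stand in for the detector and for this explanation model.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical type aliases; the model card does not define a Python API for the pipeline.
Detector = Callable[[str, str], bool]   # (question, answer) -> flagged as hallucination?
Explainer = Callable[[str, str], str]   # (question, answer) -> Arabic explanation


def run_pipeline(
    pairs: Iterable[Tuple[str, str]],
    detect: Detector,
    explain: Explainer,
) -> List[dict]:
    """Run detection on every pair and explanation only on flagged pairs."""
    results = []
    for question, answer in pairs:
        flagged = detect(question, answer)
        # Stage 2 is skipped for pairs the detector does not flag.
        explanation: Optional[str] = explain(question, answer) if flagged else None
        results.append(
            {
                "question": question,
                "answer": answer,
                "is_hallucination": flagged,
                "explanation_ar": explanation,
            }
        )
    return results
```

Gating the 8B generator behind the much smaller classifier means the expensive generation step runs only for pairs that need an explanation.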

This model is fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B (8B parameters) using supervised fine-tuning (SFT) on the Sawb Arabic Cultural Hallucination Dataset.

Model Architecture

| Property | Value |
|---|---|
| Base model | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| Fine-tuning method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 16 |
| LoRA alpha (α) | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task type | Causal Language Modeling |
| Parameters (base) | 8B |
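
To give a sense of what these settings imply, the following back-of-the-envelope sketch computes the LoRA scaling factor and the approximate number of trainable adapter parameters. The rank, alpha, and target modules come from the table; the layer shapes are an assumption taken from the published Llama-3.1-8B configuration on which the base model is built, and the scaling formula assumes the standard LoRA formulation (alpha divided by rank).

```python
# LoRA settings from the table above.
RANK = 16
ALPHA = 32

# ASSUMPTION: layer shapes from the published Llama-3.1-8B configuration
# (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of size 128).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128
NUM_LAYERS = 32

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(rank: int) -> int:
    """Each adapted matrix gains A (rank x in) and B (out x rank)."""
    return sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values())


scaling = ALPHA / RANK  # standard LoRA scaling applied to the low-rank update
trainable = NUM_LAYERS * lora_params_per_layer(RANK)
print(f"scaling = {scaling}, trainable LoRA parameters = {trainable:,}")
# prints: scaling = 2.0, trainable LoRA parameters = 41,943,040
```

Under these assumptions the adapter trains roughly 0.5% of the parameter count of the 8B base model.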

Training

| Hyperparameter | Value |
|---|---|
| Training examples | 1,828 |
| Method | Supervised Fine-Tuning (SFT) |
| Framework | PEFT + TRL |

Output Format

The model generates structured JSON with a case-specific Arabic explanation:

{
  "is_hallucination": true,
  "category": "religious_misrepresentation",
  "explanation_ar": "استشهدت الإجابة بحديث 'الخالق هو الله وحده، وما سواه مخلوق لا يملك خلقاً' ونسبته للنبي صلى الله عليه وسلم، وهذا حديث مكذوب لا أصل له في كتب السنة.",
  "confidence": 0.9
}
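
Downstream code will usually want to parse and check this output before using it. The sketch below is one way to do that with the standard library; `parse_explanation` is a hypothetical helper, not part of the released model. It assumes the generation may contain a `<think>...</think>` reasoning block or trailing text around the JSON object, and it assumes `confidence` lies in [0, 1], which the example above suggests but the card does not state.

```python
import json
import re

# Keys shown in the example output above.
REQUIRED_KEYS = {"is_hallucination", "category", "explanation_ar", "confidence"}


def parse_explanation(raw: str) -> dict:
    """Extract and validate the first JSON object in a model generation."""
    # ASSUMPTION: drop a reasoning block if the model emitted one before the JSON.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses the first JSON value and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(obj["is_hallucination"], bool):
        raise ValueError("is_hallucination must be a boolean")
    # ASSUMPTION: confidence is a probability-like score in [0, 1].
    if not 0.0 <= float(obj["confidence"]) <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return obj
```

Malformed JSON surfaces as `json.JSONDecodeError`, which is a subclass of `ValueError`, so a single `except ValueError` handles every failure case.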

Hallucination Categories

| Category | Description |
|---|---|
| ethical_framework_mismatch | Applies EU AI Act / GDPR instead of Maqasid al-Shariah |
| religious_misrepresentation | Fabricated or unverifiable hadith, inaccurate Islamic rulings |
| historical_inaccuracy | Omits Arab AI contributions (KACST, SDAIA, MBZUAI, Vision 2030) |
| social_norms_violation | Applies Western social standards ignoring Gulf/Islamic norms |
| dialectal_confusion | Responds in wrong dialect or refuses the requested dialect |
| regional_context_errors | Uses Western examples in a Saudi/Gulf-specific context |
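
Because the `category` field is generated as free text, it can be worth checking it against the six labels above. The following sketch shows one possible check; `normalize_category` is a hypothetical helper, and the normalization it applies (trimming, lowercasing, mapping hyphens and spaces to underscores) is an illustrative choice rather than documented model behaviour.

```python
# The six category labels listed above.
CATEGORIES = frozenset(
    {
        "ethical_framework_mismatch",
        "religious_misrepresentation",
        "historical_inaccuracy",
        "social_norms_violation",
        "dialectal_confusion",
        "regional_context_errors",
    }
)


def normalize_category(label: str) -> str:
    """Return the canonical label, or raise ValueError for an unknown one."""
    # Tolerate minor surface variation in the generated label.
    key = label.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in CATEGORIES:
        raise ValueError(f"unknown category: {label!r}")
    return key
```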

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
adapter = "HassanB4/sawb-deepseek-r1"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

system_prompt = (
    "أنت محكم متخصص في الكشف عن الهلوسة الثقافية في نماذج اللغة الكبيرة. "
    "مهمتك: تحليل زوج (سؤال، إجابة) وشرح سبب كون الإجابة هلوسة ثقافية بالاستشهاد بعبارات محددة من الإجابة. "
    "أخرج إجابتك بتنسيق JSON فقط."
)

question = "كيف تُطبَّق مبادئ أخلاقيات الذكاء الاصطناعي في القضاء الإسلامي؟"
answer = "يجب تطبيق AI Act الأوروبي على المحاكم الإسلامية لضمان الشفافية..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"السؤال: {question}\n\nإجابة النموذج: {answer}"},
]

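# add_generation_prompt=True opens the assistant turn; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)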
print(response)

Dataset

Trained on HassanB4/sawb-arabic-hallucination-dataset.

Collection

Sawb Arabic Cultural Hallucination Detection
