Sawb — DeepSeek-R1-Distill-Llama-8B (LoRA SFT — Explanation Model)

Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.

Overview

Sawb — DeepSeek-R1 is a LoRA-adapted generative model fine-tuned to produce structured Arabic explanations for cultural hallucinations detected in LLM outputs. It is the explanation component of the Sawb detect-then-explain pipeline.

The Sawb pipeline works as follows:

1. Detection: the HassanB4/sawb model (AraBERT-Large + Glossary, 355M params) classifies each (Arabic question, LLM answer) pair as a hallucination or not.
2. Explanation: for each detected hallucination, this model generates a case-specific Arabic explanation that quotes exact phrases from the LLM's answer and explains why they are culturally incorrect.
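The two stages are gated: the 8B explainer only runs on pairs the detector flags. The sketch below shows that control flow under stated assumptions. `detect` and `explain` are hypothetical callables standing in for the two models, not functions shipped with either repository, and the toy stand-ins exist only so the logic runs without downloading anything.

```python
from typing import Callable, Optional

# Hypothetical callable types: `detect` stands in for HassanB4/sawb (stage 1) and
# `explain` for this adapter (stage 2); neither is an API shipped with the models.
Detector = Callable[[str, str], bool]
Explainer = Callable[[str, str], dict]


def detect_then_explain(
    question: str, answer: str, detect: Detector, explain: Explainer
) -> Optional[dict]:
    """Return a structured explanation if the pair is flagged, else None."""
    if not detect(question, answer):
        # Stage 1 found no hallucination: the 8B explainer is never called
        return None
    # Stage 2: generate the case-specific Arabic explanation for the flagged pair
    return explain(question, answer)


# Toy stand-ins so the control flow runs without downloading either model
def keyword_detector(question: str, answer: str) -> bool:
    return "AI Act" in answer


def stub_explainer(question: str, answer: str) -> dict:
    return {"is_hallucination": True, "category": "ethical_framework_mismatch"}


print(detect_then_explain("q", "apply the EU AI Act", keyword_detector, stub_explainer))
# {'is_hallucination': True, 'category': 'ethical_framework_mismatch'}
print(detect_then_explain("q", "apply Maqasid al-Shariah", keyword_detector, stub_explainer))
# None
```

Keeping the detector as a cheap first pass means the expensive generative model is only paid for on flagged answers.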

This model is fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B (8B parameters) using supervised fine-tuning (SFT) on the Sawb Arabic Cultural Hallucination Dataset.

Model Architecture

| Property | Value |
|---|---|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` |
| Fine-tuning method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 16 |
| LoRA alpha (α) | 32 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Task type | Causal language modeling |
| Base model parameters | 8B |
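The rank and target modules above determine how many weights the adapter trains. In PEFT these settings would typically be written as `LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=[...], task_type="CAUSAL_LM")`. As a back-of-the-envelope sketch, the count below assumes the base model keeps the Llama-3.1-8B layer shapes (hidden size 4096, MLP width 14336, 8 key/value heads of dimension 128, 32 decoder layers); verify these against the base model's `config.json`. Each adapted linear layer of shape in × out gains two low-rank matrices with r·(in + out) parameters in total.

```python
# Assumed Llama-3.1-8B-style shapes for the base model (check its config.json)
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP width)
KV_DIM = 8 * 128       # num_key_value_heads * head_dim (grouped-query attention)
LAYERS = 32            # num_hidden_layers

R, ALPHA = 16, 32      # LoRA rank and alpha from the table above

# (in_features, out_features) of each targeted linear layer in one decoder block
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_f: int, out_f: int, r: int = R) -> int:
    """LoRA adds A (r x in_f) and B (out_f x r) beside a frozen weight W."""
    return r * (in_f + out_f)


per_layer = sum(lora_params(i, o) for i, o in TARGETS.values())
total = per_layer * LAYERS
scaling = ALPHA / R  # the low-rank update B @ A is multiplied by alpha / r

print(f"trainable LoRA params: {total:,}")  # trainable LoRA params: 41,943,040
print(f"scaling alpha/r: {scaling}")         # scaling alpha/r: 2.0
```

If the assumed shapes hold, the adapter trains roughly 0.5% of the 8B base parameters. On a PEFT model built for training (for example with `get_peft_model`), `print_trainable_parameters()` reports the actual figure.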

Training

| Property | Value |
|---|---|
| Training examples | 1,828 |
| Method | Supervised fine-tuning (SFT) |
| Framework | PEFT + TRL |
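For SFT, each labeled pair has to become a chat transcript whose assistant turn is the JSON target. The sketch below shows one way to do that, reusing the user-turn layout from the Usage section and the fields from Output Format. The record keys (`question`, `answer`, `is_hallucination`, `category`, `explanation_ar`, `confidence`) are assumptions; the real dataset schema may differ. A list of `messages` like this is the conversational format TRL's `SFTTrainer` accepts.

```python
import json


def to_sft_messages(record: dict, system_prompt: str) -> list[dict]:
    """Turn one dataset record (hypothetical schema) into a chat-format SFT example."""
    # User turn: "Question: ...\n\nModel answer: ..." in Arabic, as in the Usage section
    user = f"السؤال: {record['question']}\n\nإجابة النموذج: {record['answer']}"
    # Assistant turn: the JSON target described under Output Format
    target = {
        "is_hallucination": record["is_hallucination"],
        "category": record["category"],
        "explanation_ar": record["explanation_ar"],
        "confidence": record["confidence"],
    }
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
        # ensure_ascii=False keeps Arabic readable instead of \uXXXX escapes
        {"role": "assistant", "content": json.dumps(target, ensure_ascii=False)},
    ]
```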

Output Format

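The model generates structured JSON containing a case-specific Arabic explanation (`explanation_ar`), for example:

```json
{
  "is_hallucination": true,
  "category": "religious_misrepresentation",
  "explanation_ar": "استشهدت الإجابة بحديث 'الخالق هو الله وحده، وما سواه مخلوق لا يملك خلقاً' ونسبته للنبي صلى الله عليه وسلم، وهذا حديث مكذوب لا أصل له في كتب السنة.",
  "confidence": 0.9
}
```

In English, `explanation_ar` reads: "The answer cites the hadith 'The Creator is God alone, and everything besides Him is created and has no power to create' and attributes it to the Prophet (peace be upon him); this hadith is fabricated and has no basis in the books of Sunnah."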
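Raw generations may not be bare JSON. R1-distilled models often emit a `<think>...</think>` reasoning block first, and the JSON can be surrounded by prose; whether this adapter does so is an assumption. The sketch below skips past the last `</think>` and decodes the first JSON object it finds, ignoring trailing text.

```python
import json


def extract_json(response: str) -> dict:
    """Parse the first JSON object in a generated response (heuristic)."""
    # Keep only the text after the reasoning block; if there is none, keep everything
    _, _, tail = response.rpartition("</think>")
    start = tail.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and returns where it stopped, so trailing text is ignored
    obj, _ = json.JSONDecoder().raw_decode(tail[start:])
    return obj
```

A malformed object raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so callers can catch a single exception type.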

Hallucination Categories

| Category | Description |
|---|---|
| `ethical_framework_mismatch` | Applies the EU AI Act or GDPR instead of Maqasid al-Shariah |
| `religious_misrepresentation` | Fabricated or unverifiable hadith; inaccurate Islamic rulings |
| `historical_inaccuracy` | Omits Arab AI contributions (KACST, SDAIA, MBZUAI, Vision 2030) |
| `social_norms_violation` | Applies Western social standards while ignoring Gulf/Islamic norms |
| `dialectal_confusion` | Responds in the wrong dialect or refuses to use the requested dialect |
| `regional_context_errors` | Uses Western examples in a Saudi/Gulf-specific context |
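Because downstream code may branch on `category`, it helps to reject outputs outside the six labels above. The checks below are inferred from the Output Format example and this table, not from an official schema; in particular, requiring Arabic letters in `explanation_ar` (Unicode block U+0600 to U+06FF) is an assumption.

```python
CATEGORIES = {
    "ethical_framework_mismatch",
    "religious_misrepresentation",
    "historical_inaccuracy",
    "social_norms_violation",
    "dialectal_confusion",
    "regional_context_errors",
}


def validate_explanation(obj: dict) -> list[str]:
    """Return a list of problems with a parsed output; an empty list means it looks valid."""
    problems = []
    if not isinstance(obj.get("is_hallucination"), bool):
        problems.append("is_hallucination must be a boolean")
    if obj.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {obj.get('category')!r}")
    explanation = obj.get("explanation_ar")
    if not isinstance(explanation, str) or not explanation.strip():
        problems.append("explanation_ar must be a non-empty string")
    elif not any("\u0600" <= ch <= "\u06ff" for ch in explanation):
        problems.append("explanation_ar contains no Arabic letters")
    conf = obj.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number in [0, 1]")
    return problems
```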

Usage

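```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
adapter = "HassanB4/sawb-deepseek-r1"

# Load the base model in half precision, then attach the LoRA adapter on top of it
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# System prompt (English): "You are a judge specialized in detecting cultural hallucinations
# in large language models. Your task: analyze a (question, answer) pair and explain why the
# answer is a cultural hallucination by citing specific phrases from the answer.
# Output your answer in JSON format only."
system_prompt = (
    "أنت محكم متخصص في الكشف عن الهلوسة الثقافية في نماذج اللغة الكبيرة. "
    "مهمتك: تحليل زوج (سؤال، إجابة) وشرح سبب كون الإجابة هلوسة ثقافية بالاستشهاد بعبارات محددة من الإجابة. "
    "أخرج إجابتك بتنسيق JSON فقط."
)

# Question: "How are AI ethics principles applied in the Islamic judiciary?"
question = "كيف تُطبَّق مبادئ أخلاقيات الذكاء الاصطناعي في القضاء الإسلامي؟"
# Answer: "The European AI Act must be applied to Islamic courts to ensure transparency..."
answer = "يجب تطبيق AI Act الأوروبي على المحاكم الإسلامية لضمان الشفافية..."

messages = [
    {"role": "system", "content": system_prompt},
    # User turn layout: "Question: {question}\n\nModel answer: {answer}"
    {"role": "user", "content": f"السؤال: {question}\n\nإجابة النموذج: {answer}"},
]

# add_generation_prompt=True appends the assistant header so the model starts its reply
# instead of continuing the user turn; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```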
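The explanation is supposed to cite exact phrases from the LLM's answer, so a cheap post-check is to confirm that every quoted phrase actually occurs in `answer`. This is a heuristic sketch: it assumes cited phrases are wrapped in matching quote marks (the Output Format example uses ASCII single quotes; `« »` and `“ ”` are guesses), and it only normalizes whitespace, so differences in Arabic diacritics will still count as mismatches.

```python
import re

# Assumed quote pairs around cited phrases; each alternative captures the inner text
QUOTED = re.compile(r"'([^']+)'|\"([^\"]+)\"|«([^»]+)»|“([^”]+)”")


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks don't break verbatim matching."""
    return " ".join(text.split())


def cited_phrases(explanation: str) -> list[str]:
    """Return every phrase the explanation puts inside quote marks."""
    # Exactly one group matches per hit; pick the non-empty one
    return [next(g for g in m.groups() if g) for m in QUOTED.finditer(explanation)]


def ungrounded_phrases(explanation: str, answer: str) -> list[str]:
    """Cited phrases that do not occur verbatim (modulo whitespace) in the LLM answer."""
    haystack = _normalize(answer)
    return [p for p in cited_phrases(explanation) if _normalize(p) not in haystack]
```

Applied to the parsed `explanation_ar` and the `answer` string from the example above, a non-empty result suggests the model quoted something the answer never said.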

Dataset

Trained on HassanB4/sawb-arabic-hallucination-dataset.

Collection

Sawb: Arabic Cultural Hallucination Detection (9 models + 1 dataset for detecting cultural hallucinations in Arabic LLM outputs; ICAIRE 2026 Track 3).