Sawb — Qwen2.5-7B-Instruct (LoRA SFT — Research Baseline)

Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.

Overview

Sawb — Qwen2.5-7B is a LoRA adapter fine-tuned from Qwen/Qwen2.5-7B-Instruct (7B parameters) for Arabic cultural hallucination detection and explanation. This model is released as a research baseline evaluated during the Sawb system development.

It achieved a macro F1 of 0.5556 on the 457-example validation set, far below the Arabic BERT encoder models (F1 = 0.9246–0.9647) and also below the DeepSeek-based pipeline. The underperformance is attributed to Qwen2.5-7B lacking the deep knowledge of Arabic dialects and Islamic culture needed for the most challenging hallucination categories (dialectal confusion, religious misrepresentation).

For production use, see the primary detection model: HassanB4/sawb (AraBERT-Large + Glossary, F1=0.9246).

Model Architecture

Property	Value
Base model	`Qwen/Qwen2.5-7B-Instruct`
Fine-tuning method	LoRA (Low-Rank Adaptation)
LoRA rank (r)	8
LoRA alpha (α)	8
LoRA dropout	0.05
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Task type	Causal Language Modeling
Parameters (base)	7B
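
To give a sense of how small the adapter is relative to the base model, the sketch below estimates the number of trainable LoRA weights implied by the table above. The hyperparameters come from the table. The layer shapes are an assumption taken from the published Qwen2.5-7B configuration and are not stated in this card. The dictionary keys mirror the argument names of `peft.LoraConfig`, but the snippet itself uses only the standard library.

```python
# Hyperparameters copied from the table above; key names follow peft's LoraConfig arguments.
lora = {
    "r": 8,
    "lora_alpha": 8,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "task_type": "CAUSAL_LM",
}

# ASSUMPTION: shapes from the published Qwen2.5-7B config (hidden 3584, MLP 18944,
# 28 layers, 4 KV heads of size 128). These numbers are not stated in this card.
HIDDEN, MLP, LAYERS, KV_DIM = 3584, 18944, 28, 4 * 128
shapes = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(cfg, shapes, layers):
    """Each adapted Linear gains A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    per_layer = sum(
        cfg["r"] * (n_in + n_out)
        for name, (n_in, n_out) in shapes.items()
        if name in cfg["target_modules"]
    )
    return per_layer * layers

scaling = lora["lora_alpha"] / lora["r"]  # standard LoRA scales the update by alpha / r
trainable = lora_param_count(lora, shapes, LAYERS)
print(f"scaling = {scaling}, trainable LoRA weights = {trainable:,}")
```

Under these assumed shapes the adapter comes to about 20.2M trainable weights, well under 1% of the 7B base, and with α equal to r the low-rank update is applied at a scale of 1.0.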

Training

Property	Value
Training examples	1,828
Method	Supervised Fine-Tuning (SFT)
Framework	PEFT + TRL
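
The card does not describe the training data schema, so the following is only a hypothetical sketch of how one SFT example could be laid out in the conversational `messages` format that TRL's SFT tooling accepts. The system prompt and user template are copied from the Usage section. The `build_sft_record` helper, the record layout, and the placeholder label are assumptions for illustration, not the authors' actual pipeline.

```python
import json

# System prompt copied from the Usage section of this card.
SYSTEM_PROMPT = "أنت محكم متخصص في الكشف عن الهلوسة الثقافية. أخرج JSON فقط."

def build_sft_record(question, answer, label):
    """Return one conversational record (hypothetical layout, not the authors' schema)."""
    # ensure_ascii=False keeps Arabic text readable instead of \u-escaped.
    target = json.dumps(label, ensure_ascii=False)
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"السؤال: {question}\n\nإجابة النموذج: {answer}"},
            {"role": "assistant", "content": target},  # the JSON verdict is the training target
        ]
    }

# Illustrative inputs only; the explanation text is a placeholder.
record = build_sft_record(
    question="اشرح مفهوم النموذج اللغوي باللهجة النجدية",
    answer="النموذج اللغوي هو نظام يستخدم الذكاء الاصطناعي لفهم اللغة...",
    label={
        "is_hallucination": True,
        "category": "dialectal_confusion",
        "explanation_ar": "...",
        "confidence": 0.9,
    },
)
print(json.dumps(record, ensure_ascii=False, indent=2))
```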

Evaluation Results

Metric	Value
Macro F1 (validation)	0.5556
Task	Binary classification (hallucination / not)
Evaluation set	457 Arabic (question, LLM answer) pairs
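
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so the hallucination and non-hallucination classes count equally regardless of how many examples each has. The sketch below computes it for a binary task with the standard library only. The labels are toy values invented for illustration and are not drawn from the validation set.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for illustration only (1 = hallucination); NOT the validation data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
print(round(macro_f1(y_true, y_pred), 4))
```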

Output Format

The model is trained to output structured JSON:

{
  "is_hallucination": true,
  "category": "dialectal_confusion",
  "explanation_ar": "النموذج أجاب بالفصحى بينما طُلب منه اللهجة النجدية، وهذا يمثّل ارتباكاً لهجياً واضحاً.",
  "confidence": 0.9
}
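
Because the verdict is produced by free-form generation, the output may include extra text or malformed JSON. Below is a minimal sketch of a defensive parser. The `parse_verdict` helper is hypothetical, the field types are inferred from the single example above, and the 0 to 1 range for `confidence` is an assumption. The type checks may need loosening if the model emits, for example, a null category for non-hallucinated answers.

```python
import json

# Field types inferred from the single example in this card (an assumption).
REQUIRED = {
    "is_hallucination": bool,
    "category": str,
    "explanation_ar": str,
    "confidence": (int, float),
}

def parse_verdict(text):
    """Extract and type-check the first JSON object in `text`; return None if invalid."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode stops at the end of the first JSON value, ignoring trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for key, typ in REQUIRED.items():
        if key not in obj or not isinstance(obj[key], typ):
            return None
    # bool is a subclass of int, so reject True/False as a confidence value.
    if isinstance(obj["confidence"], bool) or not 0.0 <= obj["confidence"] <= 1.0:
        return None
    return obj

raw = 'Result: {"is_hallucination": true, "category": "dialectal_confusion", ' \
      '"explanation_ar": "...", "confidence": 0.9} (end)'
verdict = parse_verdict(raw)
print(verdict["is_hallucination"] if verdict else "unparseable output")
```

Returning `None` instead of raising lets an evaluation loop count unparseable generations separately from wrong predictions.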

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "HassanB4/sawb-qwen25"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

question = "اشرح مفهوم النموذج اللغوي باللهجة النجدية"
answer = "النموذج اللغوي هو نظام يستخدم الذكاء الاصطناعي لفهم اللغة..."

messages = [
    {"role": "system", "content": "أنت محكم متخصص في الكشف عن الهلوسة الثقافية. أخرج JSON فقط."},
    {"role": "user", "content": f"السؤال: {question}\n\nإجابة النموذج: {answer}"},
]

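# add_generation_prompt=True appends the assistant turn marker so the model starts a new reply
# instead of continuing the user message; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)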