🙏🏽 Lessons from Building a Sikh Scripture AI: Retrieval Outperforms Fine-Tuning for Sri Guru Granth Sahib Exegesis

Shanvir Dhinsa · EqualizeAI · May 2026

🔬 What Is This?

Granth Expert is a four-layer AI system for computational exegesis of the Sri Guru Granth Sahib Ji, the 1,430-page central scripture of Sikhism. The scripture is written in Gurmukhi script and draws on Punjabi, Sanskrit, Persian, Hindi, and Braj Bhasha.

This paper documents the complete engineering journey: what worked, what didn't, and why.

https://huggingface.co/datasets/ShanvirDhinsa/sggs-bench

📊 Headline Result

LoRA Baseline (v1): 64.8 ± 1.8 / 100 (pure fine-tuning, no retrieval)

➡️ v1 + RAG + Prompt Fix: 76.6 ± 0.7 / 100

✅ +11.8 composite uplift

For low-resource scriptural domains, retrieval engineering is the dominant lever — not parameter-efficient fine-tuning.

🏗️ Four Contributions

📚 1. SGGS-Exegesis-1430 Corpus

The first machine-readable trilingual scholarly annotation of the complete Sri Guru Granth Sahib Ji:

1,430 Angs (pages), 61,985 verses
Gurmukhi text + English translation + Punjabi steek
Per-Ang scholarly summaries, themes, mood, and reflection questions
Grounded in Dr. Sant Singh Khalsa's translation & Prof. Sahib Singh's Guru Granth Darpan
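To make the per-Ang annotation concrete, here is a minimal sketch of how one corpus record might be modeled in Python. The class and field names (`Verse`, `AngRecord`, `punjabi_steek`, `reflection_questions`, and so on) are hypothetical: they mirror the components listed above, not the dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Verse:
    """One verse with its parallel texts (hypothetical field names)."""
    gurmukhi: str       # original text in Gurmukhi script
    english: str        # English translation
    punjabi_steek: str  # Punjabi commentary (steek)


@dataclass
class AngRecord:
    """Annotations for one Ang (page) of the corpus (hypothetical schema)."""
    ang: int
    verses: list[Verse] = field(default_factory=list)
    summary: str = ""
    themes: list[str] = field(default_factory=list)
    mood: str = ""
    reflection_questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The corpus spans Angs 1 through 1,430 inclusive.
        if not 1 <= self.ang <= 1430:
            raise ValueError(f"Ang {self.ang} is outside 1..1430")
```

Keeping the three texts of a verse in one object keeps the Gurmukhi, the translation, and the steek aligned when verses are later filtered or retrieved.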

📏 2. SGGS-Bench v0.1

An 8-task, 115-question evaluation framework:

Factual recall · Verse retrieval · Scholarly exegesis
Life-struggle guidance · Hallucination resistance
Theological depth · Cross-reference synthesis · Safety
Hybrid scoring: automated + LLM-judge with 5-dim rubric

❌ 3. Negative Result: LoRA Hits a Ceiling

Four LoRA fine-tuning runs (v1 through v4). None of the three retrains beat the v1 baseline.

Run	Δ composite vs v1	What happened
v2	-11.7	All dimensions degraded
v3	-4.4	Safety +37, but Exegesis -36
v4	-1.4	Hallucination collapsed -27

Each retrain traded away dimensions that v1 already handled well: the alignment tax showed up on every attempt.

✅ 4. Positive Result: RAG Delivers

Three deterministic, training-free retrieval-layer techniques, plus a hardened system prompt:

Technique	Dimension	Δ
🔤 Vocabulary bridge	Retrieval	+7.5
🗺️ Ang cross-ref map	CrossRef	+6.5
📋 Fact card injection	Factual	+36.5
🛡️ Hardened prompt	Safety	+28.5
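The vocabulary bridge can be pictured as query expansion: modern words in a user's question are mapped to the Gurbani terms that actually occur in the translations and steek, so lexical retrieval has something to match. A minimal sketch, assuming a hand-curated dictionary; the `VOCAB_BRIDGE` entries and the `expand_query` helper are illustrative, not the system's actual table or code.

```python
import re

# Illustrative entries only (assumption): modern English words mapped to
# transliterated Gurbani terms that a keyword retriever can match on.
VOCAB_BRIDGE: dict[str, list[str]] = {
    "anxiety": ["chinta"],
    "worry": ["chinta"],
    "anger": ["krodh"],
    "ego": ["haumai"],
    "greed": ["lobh"],
    "attachment": ["moh"],
}


def expand_query(query: str, bridge: dict[str, list[str]] = VOCAB_BRIDGE) -> list[str]:
    """Return the query's lowercase tokens plus any bridged Gurbani terms.

    Each bridged term is placed right after the word that triggered it, and
    duplicates are dropped, so the list can feed a keyword or BM25-style retriever.
    """
    tokens = re.findall(r"[a-z']+", query.lower())
    expanded: list[str] = []
    for tok in tokens:
        for term in [tok, *bridge.get(tok, [])]:
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query("How do I deal with anxiety and anger?"))
# ['how', 'do', 'i', 'deal', 'with', 'anxiety', 'chinta', 'and', 'anger', 'krodh']
```

Because the expansion is a plain dictionary lookup, it is deterministic and adds no training cost, which matches how the table above characterizes the retrieval-layer gains.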

📊 Full Per-Dimension Results (n=2 mean)

Dimension	Weight	Pure v1	v1 + RAG + Fix	Δ	Source of Gain
🎯 Factual	15%	56.5	93.0	🟢 +36.5	SGGS_FACTS card
🔍 Retrieval	15%	80.0	87.5	🟢 +7.5	TOPIC_ANCHOR expansion
📖 Exegesis	15%	78.2	71.0	🔴 -7.2	Prompt trade-off (§8)
🧭 Guidance	20%	59.2	65.2	🟢 +6.0	Combined effect
🛑 Hallucination	10%	75.5	78.7	⚪ +3.2	Within noise
🕉️ Theology	5%	75.9	79.0	🟢 +3.1	Rule-7/13 fix
🔗 CrossRef	5%	48.6	55.1	🟢 +6.5	ANG_CROSS_REFS map
🛡️ Safety	15%	46.5	75.0	🟢 +28.5	Hardened system prompt
📈 Composite	100%	64.8 ± 1.8	76.6 ± 0.7	🟢 +11.8	Full layered system
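Because the Weight column sums to 100%, the composite can be checked as a weighted sum of the per-dimension means. Recomputing it from the table reproduces 64.8 and 76.6 to one decimal, which suggests (the card does not say so explicitly) that this is how the composite is formed. A short check with values copied from the table above; the function names are illustrative:

```python
# Weights and per-dimension means (n=2) copied from the results table.
WEIGHTS = {
    "Factual": 0.15, "Retrieval": 0.15, "Exegesis": 0.15, "Guidance": 0.20,
    "Hallucination": 0.10, "Theology": 0.05, "CrossRef": 0.05, "Safety": 0.15,
}
PURE_V1 = {
    "Factual": 56.5, "Retrieval": 80.0, "Exegesis": 78.2, "Guidance": 59.2,
    "Hallucination": 75.5, "Theology": 75.9, "CrossRef": 48.6, "Safety": 46.5,
}
RAG_FIX = {
    "Factual": 93.0, "Retrieval": 87.5, "Exegesis": 71.0, "Guidance": 65.2,
    "Hallucination": 78.7, "Theology": 79.0, "CrossRef": 55.1, "Safety": 75.0,
}


def composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each on a 0 to 100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[dim] * scores[dim] for dim in weights)


def regressions(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Dimensions whose score dropped, mapped to their (negative) delta."""
    return {d: round(after[d] - before[d], 1) for d in before if after[d] < before[d]}


print(f"{composite(PURE_V1, WEIGHTS):.1f}")  # 64.8
print(f"{composite(RAG_FIX, WEIGHTS):.1f}")  # 76.6
print(regressions(PURE_V1, RAG_FIX))         # {'Exegesis': -7.2}
```

The regression check makes the one trade-off in the table explicit: Exegesis is the only dimension that fell.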

🏛️ System Architecture

┌────────────────────────────────────────────────────────────┐
│  Layer 4: System prompt                                    │
│    13-rule hardened prompt (post-surgical-fix)             │
├────────────────────────────────────────────────────────────┤
│  Layer 3: RAG context construction                         │
│    • Vocabulary expansion (modern → Gurbani)               │
│    • Explicit-Ang routing                                  │
│    • Topic-anchor routing (life-struggle keywords)         │
│    • Cross-reference anchor routing (ANG_CROSS_REFS)       │
│    • Structural fact card injection (SGGS_FACTS)           │
│    • Per-verse relevance filtering                         │
├────────────────────────────────────────────────────────────┤
│  Layer 2: LoRA adapter (v1, frozen)                        │
│    rank 8, 16 layers, lr 1e-5, 1500 iters                  │
├────────────────────────────────────────────────────────────┤
│  Layer 1: Base model                                       │
│    Qwen3-14B (Alibaba, 4-bit quantized via MLX)            │
│    Apache 2.0 license; ~7.4 GB on disk                     │
└────────────────────────────────────────────────────────────┘

Key insight: Layer 2 is frozen. All gains since v1 come from Layers 3 and 4, that is, from retrieval and prompting. The lever lives in the context the model sees, not in its weights.

Runtime: Fully offline on Apple Silicon · ~15s model load · ~30–60s per query · ~12 GB memory
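The Layer 3 bullets in the diagram describe a pipeline that can be sketched as routers feeding a relevance filter, with the fact card placed first. This is a minimal illustration, not the system's code: the `Ang <n>` query syntax, the one-hop cross-reference expansion, the word-overlap filter, and the lookup tables passed in are all assumptions, and vocabulary expansion is omitted for brevity. `SGGS_FACTS` here holds only figures stated in this card.

```python
import re

# Structural facts taken from this card; the production SGGS_FACTS card may differ.
SGGS_FACTS = (
    "Sri Guru Granth Sahib Ji: 1,430 Angs; 31 raags; 35 contributors "
    "(6 Gurus, 15 Bhagats, 11 Bhatts, 3 others)."
)

# Assumed query syntax for explicit Ang references, e.g. "Ang 5".
ANG_PATTERN = re.compile(r"\bang\s+(\d{1,4})\b", re.IGNORECASE)
WORD = re.compile(r"[a-z']+")


def explicit_angs(query: str) -> list[int]:
    """Ang numbers named in the query, restricted to the valid range 1..1430."""
    return [n for n in map(int, ANG_PATTERN.findall(query)) if 1 <= n <= 1430]


def route_angs(query: str, topic_anchors: dict[str, list[int]],
               cross_refs: dict[int, list[int]]) -> list[int]:
    """Candidate Angs: explicit mentions, then topic anchors, then one cross-ref hop."""
    angs = explicit_angs(query)
    q = query.lower()
    for keyword, anchor_angs in topic_anchors.items():
        if keyword in q:
            angs.extend(anchor_angs)
    for ang in list(angs):  # iterate over a snapshot, so only one hop is added
        angs.extend(cross_refs.get(ang, []))
    return list(dict.fromkeys(angs))  # de-duplicate, keeping first-seen order


def build_context(query: str, verses_by_ang: dict[int, list[str]],
                  topic_anchors: dict[str, list[int]],
                  cross_refs: dict[int, list[int]], min_overlap: int = 1) -> str:
    """Fact card first, then routed verses that pass a crude word-overlap filter."""
    named = set(explicit_angs(query))
    terms = set(WORD.findall(query.lower()))
    lines = [SGGS_FACTS]
    for ang in route_angs(query, topic_anchors, cross_refs):
        for verse in verses_by_ang.get(ang, []):
            overlap = len(terms & set(WORD.findall(verse.lower())))
            # Explicitly named Angs are kept whole; routed ones must share query terms.
            if ang in named or overlap >= min_overlap:
                lines.append(f"[Ang {ang}] {verse}")
    return "\n".join(lines)
```

With toy tables such as `{"anxiety": [10]}` for topic anchors and `{10: [20]}` for cross-references, the query "What does Ang 5 say about anxiety?" routes to Angs 5, 10, and 20 in that order. Those Ang numbers are placeholders, not real anchors. Putting the fact card first means structural questions can be answered from it whatever retrieval returns.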

💡 Bonus Finding: Prompt Engineering Pitfall

Two specific rule wordings in our 13-rule hardened system prompt induced early-EOS termination on five scholarly questions, costing ~25 composite points until surgically reworded. The diagnostic methodology — write a reproducer before guessing fixes — generalizes beyond this domain.
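A reproducer in this spirit can be as small as a loop that runs the affected questions under two prompt variants and flags answers that stop suspiciously early. A minimal sketch, assuming a `generate(system_prompt, question)` callable that returns the model's text; `truncated`, `reproduce`, the 40-word threshold, and the stub model are hypothetical stand-ins, not the harness used in this work.

```python
from typing import Callable

# (system_prompt, question) -> answer text; wrap your own model call in this shape.
Generate = Callable[[str, str], str]


def truncated(answer: str, min_words: int = 40) -> bool:
    """Heuristic stand-in for early-EOS detection: too few words means truncated."""
    return len(answer.split()) < min_words


def reproduce(generate: Generate, prompts: dict[str, str],
              questions: list[str], min_words: int = 40) -> dict[str, list[str]]:
    """For each prompt variant, list the questions whose answers came back truncated."""
    return {
        name: [q for q in questions if truncated(generate(prompt, q), min_words)]
        for name, prompt in prompts.items()
    }


# Stub model for illustration only: it stops early whenever the system prompt
# contains the placeholder string "ORIGINAL WORDING".
def fake_generate(system_prompt: str, question: str) -> str:
    if "ORIGINAL WORDING" in system_prompt:
        return "The verse teaches"
    return "word " * 60


variants = {"original": "Rule: ORIGINAL WORDING", "reworded": "Rule: NEW WORDING"}
print(reproduce(fake_generate, variants, ["Q1", "Q2"]))
# {'original': ['Q1', 'Q2'], 'reworded': []}
```

Holding the model, questions, and decoding settings fixed while varying only the prompt text isolates the wording as the cause. If the inference stack reports why generation stopped, an EOS well before the token limit is a more direct signal than word count.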

📋 Corpus & Benchmark Details

📚 SGGS-Exegesis-1430 Corpus Statistics

Metric	Value
Total Angs	1,430
Total verses	61,985
Training examples	9,839 (8,400 train + 1,439 validation)
Source languages	Gurmukhi, Punjabi, English
Unique authors	35 (6 Gurus, 15 Bhagats, 11 Bhatts, 3 others)
Raag count	31 standard scholarly raags
Sources	GurbaniNow API, Dr. Sant Singh Khalsa (1996), Prof. Sahib Singh Guru Granth Darpan (1962–64)

📏 SGGS-Bench v0.1 Task Breakdown

Task	Questions	Weight	Scoring Method
Factual	20	15%	Automated (substring match)
Retrieval	20	15%	Automated (Ang-number match)
Exegesis	10	15%	LLM-judge (5-dim rubric)
Guidance	15	20%	Hybrid (anchor-Ang + rubric)
Hallucination	15	10%	Automated (T/F + content)
Theology	10	5%	LLM-judge (5-dim rubric)
Cross-reference	15	5%	Hybrid (Ang set + rubric)
Safety	10	15%	Automated (refusal/crisis)
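The two simplest automated methods in the table can be sketched in a few lines. This assumes factual answers are scored by case-insensitive substring match against a list of accepted strings, and retrieval answers by whether any Ang number cited in the response (written as `Ang <n>`) falls in the expected set. The function names, the citation format, and the binary 0/100 per-question scores are assumptions; SGGS-Bench's exact normalization may differ.

```python
import re

# Assumed citation format in model answers, e.g. "Ang 12".
ANG_CITATION = re.compile(r"\bang\s+(\d{1,4})\b", re.IGNORECASE)


def score_factual(response: str, accepted: list[str]) -> float:
    """100 if any accepted answer appears in the response (case-insensitive), else 0."""
    text = response.casefold()
    return 100.0 if any(a.casefold() in text for a in accepted) else 0.0


def cited_angs(response: str) -> set[int]:
    """All Ang numbers cited in the response, e.g. 'Ang 12' -> {12}."""
    return {int(n) for n in ANG_CITATION.findall(response)}


def score_retrieval(response: str, expected_angs: set[int]) -> float:
    """100 if the response cites at least one expected Ang, else 0."""
    return 100.0 if cited_angs(response) & expected_angs else 0.0


def task_score(per_question: list[float]) -> float:
    """Mean of per-question scores, on the same 0 to 100 scale."""
    return sum(per_question) / len(per_question) if per_question else 0.0
```

Substring matching is lenient (an accepted answer of "31" also matches "131"), so accepted answers need to be specific enough not to collide with unrelated numbers.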

📈 All 10 Bench Runs (Chronological)

#	Configuration	Composite
1	v1 baseline	66.6
2	v2 LoRA	54.8
3	v1 + Hardened Prompt	62.8
4	v3 LoRA	62.2
5	v4 LoRA	65.2
6	v1 + RAG (run 1)	73.3
7	v1 + RAG (run 2)	72.5
8	v1 + RAG + Fix	77.3
9	Pure v1 (variance check)	63.0
10	v1 + RAG + Fix (variance)	75.9
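The headline figures appear to summarize the two repeated runs per configuration: runs 1 and 9 for pure v1, and runs 8 and 10 for v1 + RAG + Fix. Taking the mean and half the gap between the two runs (for n = 2 this equals the population standard deviation) reproduces 64.8 ± 1.8 and 76.6 ± 0.7. A short check, assuming that is the intended convention:

```python
from statistics import mean, pstdev

# Composite scores copied from the chronological run table.
RUNS = {
    "Pure v1": [66.6, 63.0],         # runs 1 and 9
    "v1 + RAG + Fix": [77.3, 75.9],  # runs 8 and 10
}

for config, scores in RUNS.items():
    # For two values, pstdev equals half the absolute gap between them.
    print(f"{config}: {mean(scores):.1f} ± {pstdev(scores):.1f}")

uplift = mean(RUNS["v1 + RAG + Fix"]) - mean(RUNS["Pure v1"])
print(f"uplift: {uplift:+.1f}")
```

With only two runs per configuration, the ± spread is a rough indication of run-to-run variance rather than a confidence interval.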

🙏🏽 The Honest Scope

We do not claim this system is a substitute for traditional Sikh learning, Gurmat, or guidance from a Giani or Granthi. The Sri Guru Granth Sahib Ji is a living Guru in Sikh practice; any computational tool is at best a study companion.

We claim only that the system identifies relevant passages and explains them in a way that is, on our benchmark, closer to scholarly than to general-purpose — and that the architecture lessons generalize.

📝 Citation

@misc{dhinsa2026granth,
  title   = {Lessons from Building a Sikh Scripture {AI}:
             Retrieval Outperforms Fine-Tuning for
             {Sri Guru Granth Sahib} Exegesis},
  author  = {Dhinsa, Shanvir},
  year    = {2026},
  note    = {EqualizeAI}
}

Keywords: Religious NLP · Low-Resource Scripture · RAG · LoRA · Negative Results · Sri Guru Granth Sahib · Edge Deployment

ਵਾਹਿਗੁਰੂ ਜੀ ਕਾ ਖ਼ਾਲਸਾ ਵਾਹਿਗੁਰੂ ਜੀ ਕੀ ਫ਼ਤਹਿ

Waheguru Ji Ka Khalsa, Waheguru Ji Ki Fateh (The Khalsa belongs to Waheguru; victory belongs to Waheguru)
