🙏🏽 Lessons from Building a Sikh Scripture AI: Retrieval Outperforms Fine-Tuning for Sri Guru Granth Sahib Exegesis
Shanvir Dhinsa · EqualizeAI · May 2026
🔬 What Is This?
Granth Expert is a four-layer AI system for computational exegesis of the Sri Guru Granth Sahib Ji, the 1,430-page central scripture of Sikhism, composed in Gurmukhi script across Punjabi, Sanskrit, Persian, Hindi, and Braj Bhasha.
This paper documents the complete engineering journey: what worked, what didn't, and why.
https://huggingface.co/datasets/ShanvirDhinsa/sggs-bench
| LoRA Baseline (v1) | | v1 + RAG + Prompt Fix |
|---|---|---|
| Pure fine-tuning, no retrieval | ➡️ | +11.8 composite uplift |
For low-resource scriptural domains, retrieval engineering is the dominant lever, not parameter-efficient fine-tuning.
Four Contributions
Full Per-Dimension Results (n=2 mean)
| Dimension | Weight | Pure v1 | v1 + RAG + Fix | Δ | Source of Gain |
|---|---|---|---|---|---|
| Factual | 15% | 56.5 | 93.0 | 🟢 +36.5 | SGGS_FACTS card |
| Retrieval | 15% | 80.0 | 87.5 | 🟢 +7.5 | TOPIC_ANCHOR expansion |
| Exegesis | 15% | 78.2 | 71.0 | 🔴 -7.2 | Prompt trade-off (§8) |
| Guidance | 20% | 59.2 | 65.2 | 🟢 +6.0 | Combined effect |
| Hallucination | 10% | 75.5 | 78.7 | ⚪ +3.2 | Within noise |
| Theology | 5% | 75.9 | 79.0 | 🟢 +3.1 | Rule-7/13 fix |
| CrossRef | 5% | 48.6 | 55.1 | 🟢 +6.5 | ANG_CROSS_REFS map |
| Safety | 15% | 46.5 | 75.0 | 🟢 +28.5 | Hardened system prompt |
| Composite | 100% | 64.8 ± 1.8 | 76.6 ± 0.7 | 🟢 +11.8 | Full layered system |
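The composite row is consistent with a plain weighted sum of the per-dimension scores. The sketch below checks that arithmetic; the weighted-sum formula is an assumption inferred from the table, and the names `WEIGHTS`, `PURE_V1`, `V1_RAG_FIX`, and `composite` are illustrative, not taken from the benchmark code.

```python
# Weights and n=2 mean scores copied from the per-dimension table.
WEIGHTS = {
    "Factual": 0.15, "Retrieval": 0.15, "Exegesis": 0.15, "Guidance": 0.20,
    "Hallucination": 0.10, "Theology": 0.05, "CrossRef": 0.05, "Safety": 0.15,
}
PURE_V1 = {
    "Factual": 56.5, "Retrieval": 80.0, "Exegesis": 78.2, "Guidance": 59.2,
    "Hallucination": 75.5, "Theology": 75.9, "CrossRef": 48.6, "Safety": 46.5,
}
V1_RAG_FIX = {
    "Factual": 93.0, "Retrieval": 87.5, "Exegesis": 71.0, "Guidance": 65.2,
    "Hallucination": 78.7, "Theology": 79.0, "CrossRef": 55.1, "Safety": 75.0,
}


def composite(scores, weights=WEIGHTS):
    """Weighted sum of per-dimension scores (assumed composite formula)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[dim] * scores[dim] for dim in weights)


baseline = composite(PURE_V1)
layered = composite(V1_RAG_FIX)
print(f"baseline={baseline:.1f} layered={layered:.1f} uplift={layered - baseline:+.1f}")
```

Under that assumption the two composites round to the reported 64.8 and 76.6.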
System Architecture
┌──────────────────────────────────────────────────────────┐
│ Layer 4: System prompt                                   │
│   13-rule hardened prompt (post-surgical-fix)            │
├──────────────────────────────────────────────────────────┤
│ Layer 3: RAG context construction                        │
│   • Vocabulary expansion (modern → Gurbani)              │
│   • Explicit-Ang routing                                 │
│   • Topic-anchor routing (life-struggle keywords)        │
│   • Cross-reference anchor routing (ANG_CROSS_REFS)      │
│   • Structural fact card injection (SGGS_FACTS)          │
│   • Per-verse relevance filtering                        │
├──────────────────────────────────────────────────────────┤
│ Layer 2: LoRA adapter (v1, frozen)                       │
│   rank 8, 16 layers, lr 1e-5, 1500 iters                 │
├──────────────────────────────────────────────────────────┤
│ Layer 1: Base model                                      │
│   Qwen3-14B (Alibaba, 4-bit quantized via MLX)           │
│   Apache 2.0 license; ~7.4 GB on disk                    │
└──────────────────────────────────────────────────────────┘
Key insight: Layer 2 is frozen. All gains since v1 come from Layers 3 and 4 (retrieval and prompting). The lever lives in retrieval, not in model weights.
Runtime: Fully offline on Apple Silicon · ~15s model load · ~30–60s per query · ~12 GB memory
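Two of the Layer 3 steps, explicit-Ang routing and fact-card injection, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the regular expression, the `build_context` helper, the `verses_by_ang` lookup, and the wording of the fact card (whose numbers are taken from the corpus statistics in this document). The real `SGGS_FACTS` format and routing logic are not shown in this page.

```python
import re

TOTAL_ANGS = 1430  # page count stated in this document

# Hypothetical fact card; the real SGGS_FACTS content is not published here.
SGGS_FACTS = (
    "Sri Guru Granth Sahib Ji: 1,430 Angs; 31 raags; "
    "35 authors (6 Gurus, 15 Bhagats, 11 Bhatts, 3 others)."
)

# Matches phrases such as "Ang 917" or "page 262" (assumed query phrasing).
ANG_PATTERN = re.compile(r"\b(?:ang|page)\s*#?\s*(\d{1,4})\b", re.IGNORECASE)


def explicit_angs(query: str) -> list[int]:
    """Return Ang numbers named in the query, dropping out-of-range values."""
    found = (int(number) for number in ANG_PATTERN.findall(query))
    return [n for n in found if 1 <= n <= TOTAL_ANGS]


def build_context(query: str, verses_by_ang: dict[int, list[str]]) -> str:
    """Prepend the fact card, then add verses from each explicitly named Ang."""
    parts = [SGGS_FACTS]
    for ang in explicit_angs(query):
        for verse in verses_by_ang.get(ang, []):
            parts.append(f"[Ang {ang}] {verse}")
    return "\n".join(parts)
```

Routing on an explicit Ang number bypasses similarity search when the user already names the page, which is one plausible reason such a rule would help the Retrieval dimension.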
💡 Bonus Finding: Prompt Engineering Pitfall
Two specific rule wordings in our 13-rule hardened system prompt induced early-EOS termination on five scholarly questions, costing ~25 composite points until surgically reworded. The diagnostic methodology (write a reproducer before guessing fixes) generalizes beyond this domain.
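A reproducer for this kind of failure can be very small. The sketch below is one possible shape, not the harness used in the paper: `generate` stands in for whatever function calls the model, answer length is used as a rough proxy for early-EOS termination, and the `min_chars` threshold and the prompt strings are invented for the demonstration.

```python
from typing import Callable


def find_truncations(
    generate: Callable[[str, str], str],
    system_prompts: dict[str, str],
    questions: list[str],
    min_chars: int = 200,
) -> dict[str, list[str]]:
    """Map each prompt variant to the questions whose answers look truncated."""
    failures: dict[str, list[str]] = {}
    for name, system_prompt in system_prompts.items():
        failures[name] = [
            question
            for question in questions
            if len(generate(system_prompt, question).strip()) < min_chars
        ]
    return failures


def fake_generate(system_prompt: str, question: str) -> str:
    # Stand-in for the real model call: one made-up wording yields an empty reply.
    return "" if "never speculate" in system_prompt else "x" * 500


variants = {
    "old": "Example rule: never speculate.",
    "new": "Example rule: cite the Ang you rely on.",
}
report = find_truncations(fake_generate, variants, ["q1", "q2"])
print(report)
```

Running the same question set against each prompt variant isolates which wording triggers the short outputs before any fix is attempted.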
Corpus & Benchmark Details
SGGS-Exegesis-1430 Corpus Statistics
| Metric | Value |
|---|---|
| Total Angs | 1,430 |
| Total verses | 61,985 |
| Training examples | 9,839 (8,400 train + 1,439 validation) |
| Source languages | Gurmukhi, Punjabi, English |
| Unique authors | 35 (6 Gurus, 15 Bhagats, 11 Bhatts, 3 others) |
| Raag count | 31 standard scholarly raags |
| Sources | GurbaniNow API, Dr. Sant Singh Khalsa (1996), Prof. Sahib Singh Guru Granth Darpan (1962–64) |
SGGS-Bench v0.1 Task Breakdown
| Task | Questions | Weight | Scoring Method |
|---|---|---|---|
| Factual | 20 | 15% | Automated (substring match) |
| Retrieval | 20 | 15% | Automated (Ang-number match) |
| Exegesis | 10 | 15% | LLM-judge (5-dim rubric) |
| Guidance | 15 | 20% | Hybrid (anchor-Ang + rubric) |
| Hallucination | 15 | 10% | Automated (T/F + content) |
| Theology | 10 | 5% | LLM-judge (5-dim rubric) |
| Cross-reference | 15 | 5% | Hybrid (Ang set + rubric) |
| Safety | 10 | 15% | Automated (refusal/crisis) |
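The two simplest automated methods in the table, substring match and Ang-number match, might look like the sketch below. The binary 0/1 scoring, the case-insensitive comparison, the "any cited Ang in the gold set counts as a hit" rule, and the 0-100 normalization are all assumptions; the function names are hypothetical and the benchmark's actual scorers may differ.

```python
import re


def score_factual(answer: str, expected: str) -> float:
    """Case-insensitive substring match: 1.0 if the expected string appears."""
    return 1.0 if expected.casefold() in answer.casefold() else 0.0


def score_retrieval(answer: str, gold_angs: set[int]) -> float:
    """1.0 if any Ang number cited in the answer is in the gold set."""
    cited = {
        int(number)
        for number in re.findall(r"\bAng\s+(\d{1,4})\b", answer, re.IGNORECASE)
    }
    return 1.0 if cited & gold_angs else 0.0


def task_score(per_question: list[float]) -> float:
    """Mean per-question score, rescaled to 0-100."""
    return 100.0 * sum(per_question) / len(per_question)
```

For example, `task_score([1.0, 0.0, 1.0, 1.0])` gives 75.0 for a four-question task with one miss.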
All 10 Bench Runs (Chronological)
| # | Configuration | Composite |
|---|---|---|
| 1 | v1 baseline | 66.6 |
| 2 | v2 LoRA | 54.8 |
| 3 | v1 + Hardened Prompt | 62.8 |
| 4 | v3 LoRA | 62.2 |
| 5 | v4 LoRA | 65.2 |
| 6 | v1 + RAG (run 1) | 73.3 |
| 7 | v1 + RAG (run 2) | 72.5 |
| 8 | v1 + RAG + Fix | 77.3 |
| 9 | Pure v1 (variance check) | 63.0 |
| 10 | v1 + RAG + Fix (variance) | 75.9 |
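The n=2 figures in the per-dimension table can be recovered from these runs. The sketch below assumes that runs 1 and 9 are the two pure-v1 runs, that runs 8 and 10 are the two v1 + RAG + Fix runs, and that the reported ± is the population standard deviation (which equals half the range when n=2). The pairing and the choice of spread statistic are inferences from the numbers, not statements from the paper.

```python
from statistics import mean, pstdev

# Composite scores copied from the chronological run table.
RUNS = {
    "Pure v1": [66.6, 63.0],         # runs 1 and 9
    "v1 + RAG + Fix": [77.3, 75.9],  # runs 8 and 10
}


def summarize(scores):
    """Return (mean, population standard deviation) of repeated runs."""
    return mean(scores), pstdev(scores)


for name, scores in RUNS.items():
    run_mean, run_spread = summarize(scores)
    print(f"{name}: {run_mean:.1f} ± {run_spread:.1f}")
```

With only two runs per configuration, the spread is a rough indicator of run-to-run noise rather than a confidence interval.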
🙏🏽 The Honest Scope
We do not claim this system is a substitute for traditional Sikh learning, Gurmat, or guidance from a Giani or Granthi. The Sri Guru Granth Sahib Ji is a living Guru in Sikh practice; any computational tool is at best a study companion.
We claim only that the system identifies relevant passages and explains them in a way that is, on our benchmark, closer to scholarly exegesis than to a general-purpose answer, and that the architecture lessons generalize.
Citation
@article{dhinsa2026granth,
title = {Lessons from Building a Sikh Scripture {AI}:
Retrieval Outperforms Fine-Tuning for
{Sri Guru Granth Sahib} Exegesis},
author = {Dhinsa, Shanvir},
year = {2026},
note = {EqualizeAI}
}
Keywords: Religious NLP · Low-Resource Scripture · RAG · LoRA · Negative Results · Sri Guru Granth Sahib · Edge Deployment
ਵਾਹਿਗੁਰੂ ਜੀ ਕਾ ਖ਼ਾਲਸਾ ਵਾਹਿਗੁਰੂ ਜੀ ਕੀ ਫ਼ਤਹਿ
Waheguru Ji Ka Khalsa, Waheguru Ji Ki Fateh