Shouvik599 committed
Commit 22fd41f · 0 parent(s)

Initial clean commit
.dockerignore ADDED
@@ -0,0 +1,9 @@
+ .git
+ .gitignore
+ .venv
+ __pycache__
+ chroma_db
+ *.md
+ env.example
+ .dockerignore
+ Dockerfile
.gitattributes ADDED
@@ -0,0 +1 @@
+ books/*.pdf filter=lfs diff=lfs merge=lfs -text
.github/workflows/main.yml ADDED
@@ -0,0 +1,21 @@
+ name: Sync to Hugging Face hub
+ on:
+   push:
+     branches: [main]
+   # Allows you to run this workflow manually from the Actions tab
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+         with:
+           fetch-depth: 0
+           lfs: true
+
+       - name: Push to hub
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         # Replace <USERNAME> and <SPACE_NAME> with your actual HF details
+         run: git push --force https://Shouvik99:$HF_TOKEN@huggingface.co/spaces/Shouvik99/LifeGuide main
.gitignore ADDED
Binary file (2.08 kB)
Dockerfile ADDED
@@ -0,0 +1,18 @@
+ # Use an official Python runtime as a parent image
+ FROM python:3.11-slim
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Copy the requirements file and install dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the application code
+ COPY . .
+
+ # Run the data ingestion script
+ RUN python ingest.py
+
+ # Run the application in Hugging Face Space
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ title: Sacred Texts RAG
+ emoji: 🕊️
+ colorFrom: gold
+ colorTo: white
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ ---
+
+ # 🕊️ Sacred Texts RAG — Multi-Religion Knowledge Base
+
+ A Retrieval-Augmented Generation (RAG) application that answers spiritual queries using the Bhagavad Gita, the Quran, the Bible, and the Guru Granth Sahib as its sole knowledge sources.
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ sacred-texts-rag/
+ ├── README.md
+ ├── requirements.txt
+ ├── env.example
+ ├── ingest.py        # Step 1: Load PDFs → chunk → embed → store
+ ├── rag_chain.py     # Core RAG chain logic
+ ├── app.py           # FastAPI backend server
+ └── frontend/
+     └── index.html   # Chat UI (open in browser)
+ ```
+
+ ---
+
+ ## ⚙️ Setup Instructions
+
+ ### 1. Install Dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Configure Environment
+ ```bash
+ cp env.example .env
+ # Edit .env and add your GEMINI_API_KEY
+ ```
+
+ ### 3. Add Your PDF Books
+ Place your PDF files in a `books/` folder:
+ ```
+ books/
+ ├── bhagavad_gita.pdf
+ ├── quran.pdf
+ ├── bible.pdf
+ └── guru_granth_sahib.pdf
+ ```
+
+ ### 4. Ingest the Books (Run Once)
+ ```bash
+ python ingest.py
+ ```
+ This will:
+ - Load and parse all PDFs
+ - Split them into semantic chunks
+ - Create embeddings using NVIDIA's `llama-nemotron-embed-vl-1b-v2` model
+ - Store them in a local ChromaDB vector store (`./chroma_db/`)
+
+ ### 5. Start the Backend
+ ```bash
+ python app.py
+ ```
+ The server runs at `http://localhost:8000`.
+
+ ### 6. Open the Frontend
+ Open `frontend/index.html` in your browser — no server is needed for the UI.
+
+ ---
+
+ ## 🔑 Environment Variables
+
+ | Variable | Description |
+ |---|---|
+ | `GEMINI_API_KEY` | Your Google Gemini API key |
+ | `NVIDIA_API_KEY` | Your NVIDIA API key |
+ | `CHROMA_DB_PATH` | Path to ChromaDB storage (default: `./chroma_db`) |
+ | `CHUNKS_PER_BOOK` | Number of chunks to retrieve per book per query (default: `3`) |
+
+ ---
+
+ ## 🧠 How It Works
+
+ ```
+ User Query
+     │
+     ▼
+ [Embedding Model]  ←── NVIDIA llama-nemotron-embed-vl-1b-v2
+     │
+     ▼
+ [ChromaDB Vector Store]  ←── Semantic similarity search
+     │   (retrieves top-K chunks from the Gita, Quran, Bible, and Guru Granth Sahib)
+     │
+     ▼
+ [Prompt with Context]
+     │
+     ▼
+ [Gemini 2.5 Flash Lite]  ←── Answer grounded ONLY in retrieved texts
+     │
+     ▼
+ Response with source citations (book + chapter/verse)
+ ```
+
+ ---
+
+ ## 📝 Notes
+
+ - The LLM is instructed **never** to answer from outside the provided texts
+ - Each response includes **source citations** (which book the answer came from)
+ - Responses synthesize wisdom **across all books** when relevant
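Step 4's split-and-overlap behaviour can be illustrated with a minimal sketch. This is not the repo's actual splitter (`ingest.py` uses LangChain's `RecursiveCharacterTextSplitter`, which prefers paragraph and sentence boundaries); the fixed-window version below only shows how chunk size and overlap interact, and the sizes are illustrative:

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 150) -> list[str]:
    """Fixed-size character windows that share `overlap` characters with their neighbour."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Because neighbouring chunks share `overlap` characters, a verse cut at one chunk boundary still appears whole inside an adjacent chunk, which is what makes boundary-straddling passages retrievable.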
app.py ADDED
@@ -0,0 +1,133 @@
+ """
+ app.py — FastAPI backend server for the Sacred Texts RAG application.
+
+ Endpoints:
+     POST /ask    — Ask a question, get an answer with sources
+     GET  /health — Health check
+     GET  /books  — List books currently in the knowledge base
+
+ Run with:
+     python app.py
+ """
+
+ import os
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel, Field
+ from dotenv import load_dotenv
+
+ from rag_chain import query_sacred_texts, get_embeddings, get_vector_store
+
+ load_dotenv()
+
+ # ─── App Setup ────────────────────────────────────────────────────────────────
+
+ app = FastAPI(
+     title="Sacred Texts RAG API",
+     description="Ask questions answered exclusively from the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib",
+     version="1.0.0",
+ )
+
+ # Allow requests from the local frontend (index.html opened as file://)
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # Restrict in production
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+
+ # ─── Request / Response Models ────────────────────────────────────────────────
+
+ class AskRequest(BaseModel):
+     question: str = Field(..., min_length=3, max_length=1000,
+                           example="What do the scriptures say about compassion?")
+
+ class Source(BaseModel):
+     book: str
+     page: int | str
+     snippet: str
+
+ class AskResponse(BaseModel):
+     question: str
+     answer: str
+     sources: list[Source]
+
+ class HealthResponse(BaseModel):
+     status: str
+     message: str
+
+ class BooksResponse(BaseModel):
+     books: list[str]
+     total_chunks: int
+
+
+ # ─── Routes ───────────────────────────────────────────────────────────────────
+
+ @app.get("/health", response_model=HealthResponse, tags=["System"])
+ def health_check():
+     """Check that the API is running."""
+     return {"status": "ok", "message": "Sacred Texts RAG is running 🕊️"}
+
+
+ @app.get("/books", response_model=BooksResponse, tags=["Knowledge Base"])
+ def list_books():
+     """List all books currently indexed in the knowledge base."""
+     try:
+         embeddings = get_embeddings()
+         vector_store = get_vector_store(embeddings)
+         collection = vector_store._collection  # Chroma's underlying collection
+         results = collection.get(include=["metadatas"])
+         metadatas = results.get("metadatas", [])
+
+         books = sorted(set(
+             m.get("book", "Unknown")
+             for m in metadatas
+             if m  # guard against None
+         ))
+         return {"books": books, "total_chunks": len(metadatas)}
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Could not read knowledge base: {e}")
+
+
+ @app.post("/ask", response_model=AskResponse, tags=["Query"])
+ def ask(request: AskRequest):
+     """
+     Ask a spiritual or philosophical question.
+     The answer is grounded strictly in the sacred texts.
+     """
+     if not request.question.strip():
+         raise HTTPException(status_code=400, detail="Question cannot be empty.")
+
+     try:
+         result = query_sacred_texts(request.question)
+         return AskResponse(
+             question=request.question,
+             answer=result["answer"],
+             sources=[Source(**s) for s in result["sources"]],
+         )
+     except FileNotFoundError:
+         raise HTTPException(
+             status_code=503,
+             detail="Knowledge base not found. Run `python ingest.py` first.",
+         )
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ # ─── Entry Point ──────────────────────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     host = os.getenv("HOST", "0.0.0.0")
+     port = int(os.getenv("PORT", "8000"))
+
+     print("\n🕊️ Sacred Texts RAG — API Server")
+     print("─" * 40)
+     print(f"🌐 Running at : http://localhost:{port}")
+     print(f"📖 Docs at    : http://localhost:{port}/docs")
+     print("─" * 40 + "\n")
+
+     uvicorn.run("app:app", host=host, port=port, reload=True)
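As a sketch of how a client would talk to the `/ask` route in `app.py`, the helpers below build the request body and summarise a response. The field names mirror `AskRequest` and `AskResponse`; the helper functions themselves are hypothetical illustrations, not part of the repo:

```python
import json

def build_ask_payload(question: str) -> str:
    """Serialise a question for POST /ask, mirroring AskRequest's length bounds."""
    question = question.strip()
    if not (3 <= len(question) <= 1000):
        raise ValueError("question must be 3-1000 characters")
    return json.dumps({"question": question})

def cited_books(response: dict) -> list[str]:
    """Unique book names cited in an AskResponse-shaped dict."""
    return sorted({s["book"] for s in response.get("sources", [])})
```

Deduplicating the `sources` list by book is exactly what the frontend's source tags do visually; a duplicate book in the response only means two chunks came from it.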
books/A-Quran-Translation.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fa3d5c08e744166f064cbb63663737aa40025c2f582ee37aa3ceffe282aebcd
+ size 3894852
books/Bhagavad-gita-As-It-Is.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff112b0b056d303b792f6f2e68cbd73a89adf612fa9113f932446cdea7741583
+ size 66135830
books/CSB_Pew_Bible_2nd_Printing.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb7a72772507690b41ac1b08a36ea355422e1b6561e0438bfeeef73504c53ebd
+ size 16634733
books/Siri Guru Granth.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b03376ce26b6fc709500dec2a1a4a1bbfdde739716159deeb790892958c97cb6
+ size 7066831
env.example ADDED
@@ -0,0 +1,18 @@
+ # ─── Google Gemini (LLM for answer generation) ────────────────────────────────
+ GEMINI_API_KEY=your_gemini_api_key_here
+
+ # ─── NVIDIA (Embeddings) ──────────────────────────────────────────────────────
+ NVIDIA_API_KEY=nvapi-your_nvidia_api_key_here
+
+ # ─── Vector Store ─────────────────────────────────────────────────────────────
+ CHROMA_DB_PATH=./chroma_db
+ COLLECTION_NAME=sacred_texts
+
+ # ─── Retrieval Settings ───────────────────────────────────────────────────────
+ # Chunks retrieved PER BOOK — every scripture gets this many slots guaranteed
+ # Total context = CHUNKS_PER_BOOK x number of books (e.g. 3 x 4 = 12 chunks)
+ CHUNKS_PER_BOOK=3
+
+ # ─── Server ───────────────────────────────────────────────────────────────────
+ HOST=0.0.0.0
+ PORT=8000
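The per-book guarantee described in `env.example` (`CHUNKS_PER_BOOK` slots for every scripture, rather than a single global top-K) can be sketched as a round-robin merge of per-book retrieval results. The function is illustrative only, not the repo's `rag_chain.py` implementation:

```python
def interleave_per_book(per_book_chunks: dict[str, list[str]],
                        chunks_per_book: int = 3) -> list[tuple[str, str]]:
    """Merge per-book results so every book fills one slot before any gets a second."""
    context = []
    for rank in range(chunks_per_book):
        for book, chunks in per_book_chunks.items():
            if rank < len(chunks):  # a book may return fewer than its quota
                context.append((book, chunks[rank]))
    return context
```

With four books and the default of 3, the prompt context is 3 x 4 = 12 chunks, matching the arithmetic in the comment above.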
frontend/index.html ADDED
@@ -0,0 +1,626 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8" />
+   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+   <title>Sacred Texts — Divine Knowledge</title>
+   <link rel="preconnect" href="https://fonts.googleapis.com" />
+   <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+   <link href="https://fonts.googleapis.com/css2?family=Cinzel+Decorative:wght@400;700&family=Cormorant+Garamond:ital,wght@0,300;0,400;0,600;1,300;1,400&family=IM+Fell+English:ital@0;1&display=swap" rel="stylesheet" />
+
+   <style>
+     /* ── Reset & Base ─────────────────────────────────────────── */
+     *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+
+     :root {
+       --bg: #0d0b07;
+       --surface: #16130d;
+       --surface-2: #1e1a11;
+       --border: #3a2e1a;
+       --gold: #c9993a;
+       --gold-light: #e8c170;
+       --gold-pale: #f5e4b0;
+       --cream: #f0e6cc;
+       --muted: #7a6a4a;
+       --gita: #e07b3b;    /* saffron */
+       --quran: #3bba85;   /* green */
+       --bible: #5b8ce0;   /* blue */
+       --granth: #b07ce0;  /* violet — Sikh royal purple */
+     }
+
+     html, body {
+       height: 100%;
+       background: var(--bg);
+       color: var(--cream);
+       font-family: 'Cormorant Garamond', Georgia, serif;
+       font-size: 18px;
+       line-height: 1.7;
+       overflow: hidden;
+     }
+
+     /* ── Background texture ───────────────────────────────────── */
+     body::before {
+       content: '';
+       position: fixed; inset: 0;
+       background:
+         radial-gradient(ellipse 80% 60% at 20% 10%, rgba(201,153,58,.07) 0%, transparent 60%),
+         radial-gradient(ellipse 60% 80% at 80% 90%, rgba(91,140,224,.05) 0%, transparent 60%),
+         radial-gradient(ellipse 50% 50% at 50% 50%, rgba(176,124,224,.04) 0%, transparent 60%),
+         url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='400' height='400'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.75' numOctaves='4' stitchTiles='stitch'/%3E%3CfeColorMatrix type='saturate' values='0'/%3E%3C/filter%3E%3Crect width='400' height='400' filter='url(%23n)' opacity='0.04'/%3E%3C/svg%3E");
+       pointer-events: none;
+       z-index: 0;
+     }
+
+     /* ── Layout ───────────────────────────────────────────────── */
+     .app {
+       position: relative;
+       z-index: 1;
+       display: grid;
+       grid-template-rows: auto 1fr auto;
+       height: 100vh;
+       max-width: 860px;
+       margin: 0 auto;
+       padding: 0 16px;
+     }
+
+     /* ── Header ───────────────────────────────────────────────── */
+     header {
+       padding: 28px 0 18px;
+       text-align: center;
+       border-bottom: 1px solid var(--border);
+     }
+
+     .mandala {
+       font-size: 2rem;
+       letter-spacing: .5rem;
+       color: var(--gold);
+       opacity: .6;
+       margin-bottom: 8px;
+       animation: spin 60s linear infinite;
+       display: inline-block;
+     }
+     @keyframes spin { to { transform: rotate(360deg); } }
+
+     h1 {
+       font-family: 'Cinzel Decorative', serif;
+       font-size: clamp(1.2rem, 3vw, 1.9rem);
+       font-weight: 400;
+       color: var(--gold-pale);
+       letter-spacing: .12em;
+       text-shadow: 0 0 40px rgba(201,153,58,.3);
+     }
+
+     .subtitle {
+       font-family: 'IM Fell English', serif;
+       font-style: italic;
+       font-size: .95rem;
+       color: var(--muted);
+       margin-top: 4px;
+     }
+
+     .badges {
+       display: flex;
+       justify-content: center;
+       gap: 12px;
+       margin-top: 12px;
+       flex-wrap: wrap;
+     }
+
+     .badge {
+       font-size: .72rem;
+       letter-spacing: .1em;
+       text-transform: uppercase;
+       padding: 3px 10px;
+       border-radius: 20px;
+       border: 1px solid;
+       font-family: 'Cormorant Garamond', serif;
+       font-weight: 600;
+     }
+     .badge-gita { color: var(--gita); border-color: var(--gita); background: rgba(224,123,59,.1); }
+     .badge-quran { color: var(--quran); border-color: var(--quran); background: rgba(59,186,133,.1); }
+     .badge-bible { color: var(--bible); border-color: var(--bible); background: rgba(91,140,224,.1); }
+     .badge-granth { color: var(--granth); border-color: var(--granth); background: rgba(176,124,224,.1); }
+
+     /* ── Chat Window ──────────────────────────────────────────── */
+     .chat-window {
+       overflow-y: auto;
+       padding: 28px 0;
+       display: flex;
+       flex-direction: column;
+       gap: 24px;
+       scrollbar-width: thin;
+       scrollbar-color: var(--border) transparent;
+     }
+     .chat-window::-webkit-scrollbar { width: 4px; }
+     .chat-window::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }
+
+     /* ── Welcome State ────────────────────────────────────────── */
+     .welcome {
+       text-align: center;
+       margin: auto;
+       padding: 20px;
+       max-width: 500px;
+     }
+
+     .welcome-icon {
+       font-size: 3.5rem;
+       margin-bottom: 16px;
+       filter: drop-shadow(0 0 20px rgba(201,153,58,.4));
+     }
+
+     .welcome h2 {
+       font-family: 'IM Fell English', serif;
+       font-style: italic;
+       font-size: 1.5rem;
+       color: var(--gold-light);
+       margin-bottom: 10px;
+     }
+
+     .welcome p {
+       font-size: .95rem;
+       color: var(--muted);
+       line-height: 1.8;
+     }
+
+     .suggested-queries {
+       margin-top: 24px;
+       display: flex;
+       flex-direction: column;
+       gap: 8px;
+     }
+
+     .suggested-queries button {
+       background: var(--surface);
+       border: 1px solid var(--border);
+       color: var(--cream);
+       padding: 10px 16px;
+       border-radius: 8px;
+       font-family: 'Cormorant Garamond', serif;
+       font-size: .95rem;
+       font-style: italic;
+       cursor: pointer;
+       transition: all .2s;
+       text-align: left;
+     }
+     .suggested-queries button:hover {
+       border-color: var(--gold);
+       color: var(--gold-pale);
+       background: var(--surface-2);
+     }
+
+     /* ── Messages ─────────────────────────────────────────────── */
+     .message {
+       display: flex;
+       flex-direction: column;
+       gap: 8px;
+       animation: fadeUp .4s ease both;
+     }
+     @keyframes fadeUp {
+       from { opacity: 0; transform: translateY(12px); }
+       to { opacity: 1; transform: translateY(0); }
+     }
+
+     .message-user {
+       align-items: flex-end;
+     }
+     .message-assistant {
+       align-items: flex-start;
+     }
+
+     .msg-label {
+       font-size: .7rem;
+       letter-spacing: .15em;
+       text-transform: uppercase;
+       color: var(--muted);
+       font-weight: 600;
+       padding: 0 4px;
+     }
+
+     .msg-bubble {
+       max-width: 92%;
+       padding: 16px 20px;
+       border-radius: 12px;
+       line-height: 1.75;
+     }
+
+     .message-user .msg-bubble {
+       background: var(--surface-2);
+       border: 1px solid var(--border);
+       color: var(--cream);
+       font-style: italic;
+       font-size: 1rem;
+       border-bottom-right-radius: 4px;
+     }
+
+     .message-assistant .msg-bubble {
+       background: linear-gradient(135deg, var(--surface) 0%, rgba(30,26,17,.95) 100%);
+       border: 1px solid rgba(201,153,58,.2);
+       color: var(--cream);
+       font-size: 1rem;
+       border-bottom-left-radius: 4px;
+       box-shadow: 0 4px 24px rgba(0,0,0,.4), inset 0 1px 0 rgba(201,153,58,.1);
+     }
+
+     .msg-bubble p { margin-bottom: 1em; }
+     .msg-bubble p:last-child { margin-bottom: 0; }
+     .msg-bubble strong { color: var(--gold-light); font-weight: 600; }
+
+     /* ── Sources Panel ────────────────────────────────────────── */
+     .sources {
+       max-width: 92%;
+       margin-top: 4px;
+     }
+
+     .sources-label {
+       font-size: .72rem;
+       letter-spacing: .12em;
+       text-transform: uppercase;
+       color: var(--muted);
+       margin-bottom: 6px;
+       display: flex;
+       align-items: center;
+       gap: 6px;
+     }
+     .sources-label::before, .sources-label::after {
+       content: '';
+       flex: 1;
+       height: 1px;
+       background: var(--border);
+     }
+     .sources-label::before { max-width: 20px; }
+
+     .source-tags {
+       display: flex;
+       flex-wrap: wrap;
+       gap: 6px;
+     }
+
+     .source-tag {
+       font-size: .78rem;
+       padding: 4px 10px;
+       border-radius: 6px;
+       border: 1px solid;
+       font-family: 'Cormorant Garamond', serif;
+       cursor: default;
+       transition: all .2s;
+     }
+     .source-tag:hover { transform: translateY(-1px); filter: brightness(1.2); }
+     .source-gita { color: var(--gita); border-color: rgba(224,123,59,.4); background: rgba(224,123,59,.08); }
+     .source-quran { color: var(--quran); border-color: rgba(59,186,133,.4); background: rgba(59,186,133,.08); }
+     .source-bible { color: var(--bible); border-color: rgba(91,140,224,.4); background: rgba(91,140,224,.08); }
+     .source-granth { color: var(--granth); border-color: rgba(176,124,224,.4); background: rgba(176,124,224,.08); }
+     .source-other { color: var(--gold-light); border-color: rgba(201,153,58,.4); background: rgba(201,153,58,.08); }
+
+     /* ── Loading ──────────────────────────────────────────────── */
+     .loading {
+       display: flex;
+       align-items: center;
+       gap: 12px;
+       padding: 14px 18px;
+       border: 1px solid rgba(201,153,58,.15);
+       border-radius: 12px;
+       background: var(--surface);
+       width: fit-content;
+       max-width: 280px;
+     }
+
+     .loading-dots {
+       display: flex;
+       gap: 5px;
+     }
+     .loading-dots span {
+       width: 6px; height: 6px;
+       border-radius: 50%;
+       background: var(--gold);
+       animation: dot-pulse 1.4s ease-in-out infinite;
+     }
+     .loading-dots span:nth-child(2) { animation-delay: .2s; }
+     .loading-dots span:nth-child(3) { animation-delay: .4s; }
+     @keyframes dot-pulse {
+       0%, 80%, 100% { opacity: .2; transform: scale(.8); }
+       40% { opacity: 1; transform: scale(1.1); }
+     }
+
+     .loading-text {
+       font-size: .85rem;
+       font-style: italic;
+       color: var(--muted);
+     }
+
+     /* ── Error ────────────────────────────────────────────────── */
+     .error-bubble {
+       background: rgba(180, 60, 60, .1);
+       border: 1px solid rgba(180, 60, 60, .3);
+       color: #e08080;
+       padding: 12px 16px;
+       border-radius: 10px;
+       font-size: .9rem;
+       max-width: 92%;
+     }
+
+     /* ── Input Area ───────────────────────────────────────────── */
+     .input-area {
+       padding: 16px 0 24px;
+       border-top: 1px solid var(--border);
+     }
+
+     .input-row {
+       display: flex;
+       gap: 10px;
+       align-items: flex-end;
+     }
+
+     textarea {
+       flex: 1;
+       background: var(--surface);
+       border: 1px solid var(--border);
+       color: var(--cream);
+       padding: 14px 16px;
+       border-radius: 12px;
+       font-family: 'Cormorant Garamond', serif;
+       font-size: 1rem;
+       line-height: 1.6;
+       resize: none;
+       min-height: 52px;
+       max-height: 140px;
+       outline: none;
+       transition: border-color .2s, box-shadow .2s;
+     }
+     textarea::placeholder { color: var(--muted); font-style: italic; }
+     textarea:focus {
+       border-color: rgba(201,153,58,.5);
+       box-shadow: 0 0 0 3px rgba(201,153,58,.08);
+     }
+
+     .send-btn {
+       width: 52px; height: 52px;
+       border-radius: 12px;
+       border: 1px solid rgba(201,153,58,.4);
+       background: linear-gradient(135deg, rgba(201,153,58,.2), rgba(201,153,58,.05));
+       color: var(--gold);
+       font-size: 1.3rem;
+       cursor: pointer;
+       transition: all .2s;
+       display: flex;
+       align-items: center;
+       justify-content: center;
+       flex-shrink: 0;
+     }
+     .send-btn:hover:not(:disabled) {
+       background: linear-gradient(135deg, rgba(201,153,58,.35), rgba(201,153,58,.15));
+       border-color: var(--gold);
+       transform: translateY(-1px);
+       box-shadow: 0 4px 16px rgba(201,153,58,.2);
+     }
+     .send-btn:disabled { opacity: .3; cursor: not-allowed; transform: none; }
+
+     .input-hint {
+       font-size: .72rem;
+       color: var(--muted);
+       margin-top: 8px;
+       text-align: center;
+       font-style: italic;
+     }
+
+     /* ── Divider line ─────────────────────────────────────────── */
+     .ornament {
+       text-align: center;
+       color: var(--border);
+       font-size: .8rem;
+       letter-spacing: .4em;
+       margin: 4px 0;
+     }
+   </style>
+ </head>
+ <body>
+   <div class="app">
+
+     <!-- Header -->
+     <header>
+       <div class="mandala">✦</div>
+       <h1>Life Guide</h1>
+       <p class="subtitle">Wisdom from the Bhagavad Gita, Quran, Bible &amp; Guru Granth Sahib</p>
+       <div class="badges">
+         <span class="badge badge-gita">Bhagavad Gita</span>
+         <span class="badge badge-quran">Quran</span>
+         <span class="badge badge-bible">Bible</span>
+         <span class="badge badge-granth">Guru Granth Sahib</span>
+       </div>
+     </header>
+
+     <!-- Chat Window -->
+     <div class="chat-window" id="chatWindow">
+       <div class="welcome" id="welcomePane">
+         <div class="welcome-icon">🕊️</div>
+         <h2>"Seek, and it shall be given unto you"</h2>
+         <p>Ask any spiritual or philosophical question. Answers are drawn exclusively from the Bhagavad Gita, Quran, Bible, and Guru Granth Sahib.</p>
+         <div class="suggested-queries">
+           <button onclick="askSuggested(this)">What do the scriptures say about forgiveness?</button>
+           <button onclick="askSuggested(this)">How should one face fear and death?</button>
+           <button onclick="askSuggested(this)">What is the purpose of prayer and worship?</button>
+           <button onclick="askSuggested(this)">What is the nature of the soul according to each religion?</button>
+           <button onclick="askSuggested(this)">What do the scriptures teach about humility and selfless service?</button>
+         </div>
+       </div>
+     </div>
+
+     <!-- Input -->
+     <div class="input-area">
+       <div class="input-row">
+         <textarea
+           id="questionInput"
+           placeholder="Ask a question from the sacred texts…"
+           rows="1"
+           onkeydown="handleKey(event)"
+           oninput="autoResize(this)"
+         ></textarea>
+         <button class="send-btn" id="sendBtn" onclick="sendQuestion()" title="Ask (Enter)">
+           ✦
+         </button>
+       </div>
+       <p class="input-hint">Press Enter to ask · Shift+Enter for new line · Answers grounded strictly in the sacred texts</p>
+     </div>
+
+   </div>
+
+   <script>
+     const API_BASE = "http://localhost:8000";
+     let isLoading = false;
+
+     // ── Helpers ────────────────────────────────────────────────
+     function getSourceClass(book) {
+       const b = book.toLowerCase();
+       if (b.includes("gita")) return "source-gita";
+       if (b.includes("quran") || b.includes("koran")) return "source-quran";
+       if (b.includes("bible") || b.includes("testament")) return "source-bible";
+       if (b.includes("granth") || b.includes("guru")) return "source-granth";
+       return "source-other";
+     }
+
+     function hideWelcome() {
+       const w = document.getElementById("welcomePane");
+       if (w) w.remove();
+     }
+
+     function scrollToBottom() {
+       const w = document.getElementById("chatWindow");
+       w.scrollTop = w.scrollHeight;
+     }
+
+     function autoResize(el) {
+       el.style.height = "auto";
+       el.style.height = Math.min(el.scrollHeight, 140) + "px";
+     }
+
+     function formatAnswer(text) {
+       // Convert markdown-ish bold (**text**) to <strong>
+       text = text.replace(/\*\*(.*?)\*\*/g, "<strong>$1</strong>");
+       // Wrap paragraphs
+       return text.split(/\n\n+/).filter(p => p.trim()).map(p => `<p>${p.trim()}</p>`).join("");
+     }
+
+     // ── Append message to chat ─────────────────────────────────
+     function appendUserMessage(question) {
+       const w = document.getElementById("chatWindow");
+       const div = document.createElement("div");
+       div.className = "message message-user";
+       div.innerHTML = `
+         <span class="msg-label">You</span>
+         <div class="msg-bubble">${escapeHtml(question)}</div>
+       `;
+       w.appendChild(div);
+       scrollToBottom();
+     }
+
+     function appendLoading() {
+       const w = document.getElementById("chatWindow");
+       const div = document.createElement("div");
+       div.className = "message message-assistant";
+       div.id = "loadingMsg";
+       div.innerHTML = `
+         <span class="msg-label">Sacred Texts</span>
+         <div class="loading">
+           <div class="loading-dots"><span></span><span></span><span></span></div>
+           <span class="loading-text">Consulting the scriptures…</span>
+         </div>
+       `;
+       w.appendChild(div);
+       scrollToBottom();
+       return div;
+     }
+
+     function replaceLoadingWithAnswer(loadingEl, data) {
+       // Build source tags
+       const sourceTags = (data.sources || []).map(s => {
+         const cls = getSourceClass(s.book);
+         return `<span class="source-tag ${cls}" title="Page ${s.page}">📖 ${s.book}</span>`;
+       }).join("");
+
+       const sourcesHtml = sourceTags ? `
+         <div class="sources">
+           <div class="sources-label">References</div>
+           <div class="source-tags">${sourceTags}</div>
+         </div>
+       ` : "";
+
+       loadingEl.innerHTML = `
+         <span class="msg-label">Sacred Texts</span>
+         <div class="msg-bubble">${formatAnswer(data.answer)}</div>
+         ${sourcesHtml}
+       `;
+       scrollToBottom();
+     }
+
+     function replaceLoadingWithError(loadingEl, msg) {
+       loadingEl.innerHTML = `
+         <span class="msg-label">Error</span>
+         <div class="error-bubble">⚠️ ${escapeHtml(msg)}</div>
+       `;
+       scrollToBottom();
+     }
+
+     function escapeHtml(str) {
+       return str.replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;");
+     }
+
+     // ── Send question ──────────────────────────────────────────
+     async function sendQuestion() {
+       if (isLoading) return;
+       const input = document.getElementById("questionInput");
+       const question = input.value.trim();
+       if (!question) return;
+
+       hideWelcome();
+       isLoading = true;
+       document.getElementById("sendBtn").disabled = true;
+       input.value = "";
+       input.style.height = "auto";
+
+       appendUserMessage(question);
+       const loadingEl = appendLoading();
+
+       try {
+         const res = await fetch(`${API_BASE}/ask`, {
+           method: "POST",
+           headers: { "Content-Type": "application/json" },
+           body: JSON.stringify({ question }),
+         });
+
+         if (!res.ok) {
+           const err = await res.json().catch(() => ({ detail: res.statusText }));
+           throw new Error(err.detail || "Server error");
+         }
+
+         const data = await res.json();
+         replaceLoadingWithAnswer(loadingEl, data);
+       } catch (err) {
+         let msg = err.message;
+         if (msg.includes("fetch") || msg.includes("NetworkError") || msg.includes("Failed")) {
+           msg = "Cannot reach the server. Make sure `python app.py` is running on localhost:8000.";
+         }
+         replaceLoadingWithError(loadingEl, msg);
+       } finally {
+         isLoading = false;
+         document.getElementById("sendBtn").disabled = false;
+         input.focus();
+       }
+     }
+
+     function askSuggested(btn) {
+       const input = document.getElementById("questionInput");
+       input.value = btn.textContent;
+       autoResize(input);
+       sendQuestion();
+     }
+
+     function handleKey(e) {
+       if (e.key === "Enter" && !e.shiftKey) {
+         e.preventDefault();
+         sendQuestion();
+       }
+     }
+   </script>
+ </body>
+ </html>
ingest.py ADDED
@@ -0,0 +1,179 @@
+ """
+ ingest.py — Step 1: Build the vector knowledge base from religious PDFs.
+
+ Run this ONCE before starting the app:
+     python ingest.py
+
+ It will:
+     1. Load all PDFs from the ./books/ directory
+     2. Split them into overlapping semantic chunks
+     3. Embed each chunk using NVIDIA's llama-nemotron embedding model
+     4. Persist everything into a local ChromaDB vector store
+ """
+
+ import os
+ import sys
+ from pathlib import Path
+ from dotenv import load_dotenv
+
+ from langchain_community.document_loaders import PyPDFLoader, PyMuPDFLoader
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
+ from langchain_chroma import Chroma
+
+ load_dotenv()
+
+ # ─── Configuration ────────────────────────────────────────────────────────────
+
+ BOOKS_DIR = Path("./books")
+ CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
+ COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")
+ NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
+
+ # Mapping of filename keywords → friendly book name stored in metadata
+ BOOK_NAME_MAP = {
+     "gita": "Bhagavad Gita",
+     "bhagavad": "Bhagavad Gita",
+     "quran": "Quran",
+     "koran": "Quran",
+     "bible": "Bible",
+     "testament": "Bible",
+     "granth": "Guru Granth Sahib",
+     "guru": "Guru Granth Sahib",
+ }
+ # Chunk settings — tuned for religious texts (verses are short)
+ CHUNK_SIZE = 800       # characters per chunk
+ CHUNK_OVERLAP = 150    # overlap to preserve verse context across boundaries
+
+
+ # ─── Helpers ──────────────────────────────────────────────────────────────────
+
+ def detect_book_name(filename: str) -> str:
+     """Infer the book's display name from its filename."""
+     name_lower = filename.lower()
+     for keyword, book_name in BOOK_NAME_MAP.items():
+         if keyword in name_lower:
+             return book_name
+     # Fallback: use the filename stem, title-cased
+     return Path(filename).stem.replace("_", " ").title()
+
+
+ def load_pdf(pdf_path: Path) -> list:
+     """
+     Load a PDF using PyMuPDF (preferred), falling back to PyPDF on failure.
+     Returns a list of LangChain Document objects.
+     """
+     try:
+         loader = PyMuPDFLoader(str(pdf_path))
+         docs = loader.load()
+         print(f"  📖 Loaded with PyMuPDF: {pdf_path.name}")
+     except Exception:
+         # Constructing PyMuPDFLoader rarely raises — the load() call itself
+         # must be inside the try block for the fallback to ever trigger.
+         loader = PyPDFLoader(str(pdf_path))
+         docs = loader.load()
+         print(f"  📖 Loaded with PyPDF: {pdf_path.name}")
+     print(f"     → {len(docs)} pages loaded")
+     return docs
+
+
+ def tag_documents(docs: list, book_name: str, source_file: str) -> list:
+     """
+     Enrich each document's metadata with:
+       - book: display name (e.g. "Bhagavad Gita")
+       - source_file: original filename
+     """
+     for doc in docs:
+         doc.metadata["book"] = book_name
+         doc.metadata["source_file"] = source_file
+         # Keep the page number if already present from the loader
+         if "page" not in doc.metadata:
+             doc.metadata["page"] = 0
+     return docs
+
+
+ # ─── Main Ingestion ───────────────────────────────────────────────────────────
+
+ def ingest():
+     if not NVIDIA_API_KEY:
+         print("❌ NVIDIA_API_KEY not set. Add it to your .env file.")
+         sys.exit(1)
+
+     if not BOOKS_DIR.exists():
+         print(f"❌ Books directory not found: {BOOKS_DIR.resolve()}")
+         print("   Create a ./books/ folder and add your PDFs there.")
+         sys.exit(1)
+
+     pdf_files = list(BOOKS_DIR.glob("*.pdf"))
+     if not pdf_files:
+         print(f"❌ No PDF files found in {BOOKS_DIR.resolve()}")
+         sys.exit(1)
+
+     print(f"\n🕊️ Sacred Texts RAG — Ingestion Pipeline")
+     print(f"{'─' * 50}")
+     print(f"📂 Books directory : {BOOKS_DIR.resolve()}")
+     print(f"💾 ChromaDB path   : {Path(CHROMA_DB_PATH).resolve()}")
+     print(f"📚 PDFs found      : {len(pdf_files)}")
+     print(f"{'─' * 50}\n")
+
+     # ── Step 1: Load all PDFs ────────────────────────────────────────────────
+     all_docs = []
+     for pdf_path in pdf_files:
+         book_name = detect_book_name(pdf_path.name)
+         print(f"📕 {book_name}")
+         raw_docs = load_pdf(pdf_path)
+         tagged_docs = tag_documents(raw_docs, book_name, pdf_path.name)
+         all_docs.extend(tagged_docs)
+         print(f"  ✅ Tagged as '{book_name}'\n")
+
+     print(f"📄 Total pages loaded: {len(all_docs)}")
+
+     # ── Step 2: Split into chunks ────────────────────────────────────────────
+     print(f"\n✂️ Splitting into chunks (size={CHUNK_SIZE}, overlap={CHUNK_OVERLAP})...")
+     splitter = RecursiveCharacterTextSplitter(
+         chunk_size=CHUNK_SIZE,
+         chunk_overlap=CHUNK_OVERLAP,
+         separators=["\n\n", "\n", ". ", " ", ""],  # Respect paragraph/verse boundaries
+     )
+     chunks = splitter.split_documents(all_docs)
+     print(f"  → {len(chunks)} chunks created")
+
+     # ── Step 3: Embed & store ────────────────────────────────────────────────
+     print(f"\n🔢 Initialising NVIDIA embedding model (llama-nemotron-embed-vl-1b-v2)...")
+     embeddings = NVIDIAEmbeddings(
+         model="nvidia/llama-nemotron-embed-vl-1b-v2",
+         api_key=NVIDIA_API_KEY,
+         truncate="NONE",
+     )
+
+     print(f"💾 Building ChromaDB vector store — this may take a few minutes...")
+     print(f"   (Embedding {len(chunks)} chunks...)\n")
+
+     # Process in batches to avoid rate limits
+     BATCH_SIZE = 100
+     vector_store = None
+
+     for i in range(0, len(chunks), BATCH_SIZE):
+         batch = chunks[i : i + BATCH_SIZE]
+         batch_num = i // BATCH_SIZE + 1
+         total_batches = (len(chunks) + BATCH_SIZE - 1) // BATCH_SIZE
+         print(f"  Batch {batch_num}/{total_batches}: embedding {len(batch)} chunks...")
+
+         if vector_store is None:
+             vector_store = Chroma.from_documents(
+                 documents=batch,
+                 embedding=embeddings,
+                 persist_directory=CHROMA_DB_PATH,
+                 collection_name=COLLECTION_NAME,
+             )
+         else:
+             vector_store.add_documents(batch)
+
+     print(f"\n{'─' * 50}")
+     print(f"✅ Ingestion complete!")
+     print(f"  📦 {len(chunks)} chunks stored in ChromaDB")
+     print(f"  📂 Location: {Path(CHROMA_DB_PATH).resolve()}")
+     print(f"\n👉 Now run: python app.py")
+     print(f"{'─' * 50}\n")
+
+
+ if __name__ == "__main__":
+     ingest()
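A quick note on the filename → book-name mapping in `ingest.py`: files matching no keyword fall through to the title-cased filename stem, so new PDFs still get a readable metadata label without editing `BOOK_NAME_MAP`. A minimal standalone sketch (the map and function mirror the ones above; the filenames are invented for illustration):

```python
from pathlib import Path

# Keyword → display-name map, as used for metadata tagging in ingest.py
BOOK_NAME_MAP = {
    "gita": "Bhagavad Gita",
    "quran": "Quran",
    "bible": "Bible",
    "granth": "Guru Granth Sahib",
}

def detect_book_name(filename: str) -> str:
    """Return the mapped display name, or the title-cased stem as a fallback."""
    name_lower = filename.lower()
    for keyword, book_name in BOOK_NAME_MAP.items():
        if keyword in name_lower:
            return book_name
    return Path(filename).stem.replace("_", " ").title()

print(detect_book_name("The-Quran-Translation.pdf"))  # → Quran
print(detect_book_name("tao_te_ching.pdf"))           # → Tao Te Ching
```

The fallback matters because every chunk's `book` metadata later drives the per-book retrieval filter in `rag_chain.py`.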
rag_chain.py ADDED
@@ -0,0 +1,240 @@
+ """
+ rag_chain.py — Core RAG chain using LangChain + Gemini.
+
+ KEY FIX: Uses per-book retrieval (guaranteed slots per scripture) instead of
+ a single similarity search — so no book gets starved from the context window
+ when the query is semantically closer to another book's language.
+
+ This module exposes a single function:
+     answer = query_sacred_texts(user_question)
+
+ Returns a dict with:
+     {
+         "answer": "...",
+         "sources": [
+             {"book": "Bhagavad Gita", "page": 42, "snippet": "..."},
+             ...
+         ]
+     }
+ """
+
+ import os
+ from dotenv import load_dotenv
+ from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
+ from langchain_google_genai import ChatGoogleGenerativeAI
+ from langchain_chroma import Chroma
+ from langchain_core.prompts import ChatPromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ load_dotenv()
+
+ GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
+ NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
+ CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")
+ COLLECTION_NAME = os.getenv("COLLECTION_NAME", "sacred_texts")
+
+ # Chunks retrieved PER BOOK — guarantees every scripture contributes to the answer
+ CHUNKS_PER_BOOK = int(os.getenv("CHUNKS_PER_BOOK", "3"))
+
+ # All books currently in the knowledge base — add new books here as you ingest them
+ KNOWN_BOOKS = [
+     "Bhagavad Gita",
+     "Quran",
+     "Bible",
+     "Guru Granth Sahib",
+ ]
+
+
+ # ─── System Prompt ────────────────────────────────────────────────────────────
+
+ SYSTEM_PROMPT = """You are a scholarly and compassionate guide to sacred scriptures.
+ You have deep knowledge of the Bhagavad Gita, the Quran, the Bible, and the Guru Granth Sahib.
+
+ STRICT RULES you must ALWAYS follow:
+ 1. Answer ONLY using the provided context passages. Do NOT use any external knowledge.
+ 2. If a specific book's passages are provided but not relevant to the question, skip that book.
+ 3. If NONE of the context is relevant, say: "The provided texts do not directly address this question."
+ 4. Always cite which book(s) your answer draws from.
+ 5. When the question asks to COMPARE books (e.g. "what do Quran and Gita say"), you MUST
+    address EACH of those books separately, then synthesise the common thread.
+ 6. Be respectful and neutral toward all faiths — treat each text with equal reverence.
+ 7. Do NOT speculate, invent verses, or add information beyond the context.
+
+ FORMAT your response as:
+ - A clear, thoughtful answer (2–4 paragraphs)
+ - A "📚 Sources" section listing each book referenced with the key insight drawn from it
+
+ Context passages from the sacred texts (guaranteed passages from each book):
+ ────────────────────────────────────────
+ {context}
+ ────────────────────────────────────────
+ """
+
+ HUMAN_PROMPT = "Question: {question}"
+
+
+ # ─── Embeddings & Vector Store ────────────────────────────────────────────────
+
+ def get_embeddings():
+     return NVIDIAEmbeddings(
+         model="nvidia/llama-nemotron-embed-vl-1b-v2",
+         api_key=NVIDIA_API_KEY,
+         truncate="NONE",
+     )
+
+
+ def get_vector_store(embeddings):
+     return Chroma(
+         persist_directory=CHROMA_DB_PATH,
+         embedding_function=embeddings,
+         collection_name=COLLECTION_NAME,
+     )
+
+
+ # ─── Per-Book Retrieval ───────────────────────────────────────────────────────
94
+
95
+ def retrieve_per_book(question: str, vector_store: Chroma) -> list:
96
+ """
97
+ Retrieve CHUNKS_PER_BOOK chunks from EACH known book independently,
98
+ using a metadata filter. This guarantees every scripture is represented
99
+ in the context β€” no book can be crowded out by higher-scoring chunks
100
+ from another book.
101
+ """
102
+ all_docs = []
103
+ for book in KNOWN_BOOKS:
104
+ try:
105
+ results = vector_store.similarity_search(
106
+ query=question,
107
+ k=CHUNKS_PER_BOOK,
108
+ filter={"book": book}, # ← metadata filter: only this book
109
+ )
110
+ if results:
111
+ print(f" πŸ“– {book}: {len(results)} chunk(s) retrieved")
112
+ else:
113
+ print(f" ⚠️ {book}: 0 chunks found (not ingested?)")
114
+ all_docs.extend(results)
115
+ except Exception as e:
116
+ print(f" ❌ {book}: retrieval error β€” {e}")
117
+
118
+ return all_docs
119
+
120
+
+ # ─── Format Retrieved Docs ────────────────────────────────────────────────────
+
+ def format_docs(docs: list) -> str:
+     """
+     Format retrieved documents grouped by book for clarity.
+     Each chunk is labelled with book and page number.
+     """
+     # Group by book to keep context readable
+     by_book: dict[str, list] = {}
+     for doc in docs:
+         book = doc.metadata.get("book", "Unknown")
+         by_book.setdefault(book, []).append(doc)
+
+     sections = []
+     for book, book_docs in by_book.items():
+         header = f"═══ {book} ═══"
+         chunks = []
+         for i, doc in enumerate(book_docs, 1):
+             page = doc.metadata.get("page", "?")
+             chunks.append(f"  [{i}] (Page {page}): {doc.page_content.strip()}")
+         sections.append(header + "\n" + "\n\n".join(chunks))
+
+     return "\n\n".join(sections)
+
+
+ # ─── Build the RAG Chain ──────────────────────────────────────────────────────
+
+ def build_chain():
+     """Build and return the LLM chain and vector store."""
+     embeddings = get_embeddings()
+     vector_store = get_vector_store(embeddings)
+
+     llm = ChatGoogleGenerativeAI(
+         model="gemini-2.5-flash-lite",
+         google_api_key=GEMINI_API_KEY,
+         temperature=0.2,
+         max_output_tokens=1500,
+     )
+
+     prompt = ChatPromptTemplate.from_messages([
+         ("system", SYSTEM_PROMPT),
+         ("human", HUMAN_PROMPT),
+     ])
+
+     # Chain: prompt → LLM → string output
+     # (retrieval is handled manually in query_sacred_texts for per-book control)
+     llm_chain = prompt | llm | StrOutputParser()
+
+     return llm_chain, vector_store
+
+
+ # ─── Public API ───────────────────────────────────────────────────────────────
+
+ _llm_chain = None
+ _vector_store = None
+
+
+ def query_sacred_texts(question: str) -> dict:
+     """
+     Query the sacred texts knowledge base with guaranteed per-book retrieval.
+
+     Args:
+         question: The user's spiritual/philosophical question.
+
+     Returns:
+         {
+             "answer": str,
+             "sources": list[dict]  # [{book, page, snippet}, ...]
+         }
+     """
+     global _llm_chain, _vector_store
+
+     if _llm_chain is None:
+         print("🔧 Initialising RAG chain (first call)...")
+         _llm_chain, _vector_store = build_chain()
+
+     # Step 1: Retrieve per-book (guaranteed slots for every scripture)
+     print(f"\n🔍 Retrieving {CHUNKS_PER_BOOK} chunks per book for: '{question}'")
+     source_docs = retrieve_per_book(question, _vector_store)
+
+     if not source_docs:
+         return {
+             "answer": "No content found in the knowledge base. Please run ingest.py first.",
+             "sources": [],
+         }
+
+     # Step 2: Format context grouped by book
+     context = format_docs(source_docs)
+
+     # Step 3: Generate answer
+     answer = _llm_chain.invoke({"context": context, "question": question})
+
+     # Step 4: Build deduplicated source list for the UI
+     seen_books = set()
+     sources = []
+     for doc in source_docs:
+         book = doc.metadata.get("book", "Unknown")
+         page = doc.metadata.get("page", "?")
+         snippet = doc.page_content[:200].strip() + "..."
+         if book not in seen_books:
+             seen_books.add(book)
+             sources.append({"book": book, "page": page, "snippet": snippet})
+
+     return {
+         "answer": answer,
+         "sources": sources,
+     }
+
+
+ # ─── Quick CLI Test ───────────────────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     test_q = "In what aspects do the Quran and Gita teach the same thing?"
+     print(f"\n🔍 Test query: {test_q}\n")
+     result = query_sacred_texts(test_q)
+     print("📝 Answer:\n")
+     print(result["answer"])
+     print("\n📚 Sources retrieved:")
+     for s in result["sources"]:
+         print(f"  - {s['book']} (page {s['page']})")
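To make the post-retrieval bookkeeping in `rag_chain.py` concrete, here is a minimal, dependency-free sketch of the grouping done in `format_docs` and the per-book deduplication from step 4 of `query_sacred_texts`. Plain dicts stand in for LangChain `Document` objects, and the sample passages and page numbers are invented:

```python
# Stand-ins for retrieved Documents: book/page metadata plus page content
docs = [
    {"book": "Quran",         "page": 3,  "text": "Verily, with hardship comes ease."},
    {"book": "Bhagavad Gita", "page": 12, "text": "You have a right to action alone."},
    {"book": "Quran",         "page": 7,  "text": "And He found you lost and guided you."},
]

# Group chunks by book, preserving retrieval order (as format_docs does)
by_book: dict[str, list[dict]] = {}
for d in docs:
    by_book.setdefault(d["book"], []).append(d)

for book, group in by_book.items():
    print(f"═══ {book} ═══ ({len(group)} chunk(s))")

# Deduplicate to one UI source entry per book (as in query_sacred_texts, step 4)
seen, sources = set(), []
for d in docs:
    if d["book"] not in seen:
        seen.add(d["book"])
        sources.append({"book": d["book"], "page": d["page"], "snippet": d["text"][:200]})

print([s["book"] for s in sources])  # → ['Quran', 'Bhagavad Gita']
```

The dedup keeps only the first (highest-ranked) chunk per book, which is why the UI shows one snippet per scripture even though `CHUNKS_PER_BOOK` chunks feed the prompt.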
requirements.txt ADDED
@@ -0,0 +1,26 @@
+ # Core LangChain
+ langchain
+ langchain-google-genai
+ langchain-community
+ langchain-chroma
+ langchain-nvidia-ai-endpoints
+ langchain-text-splitters
+
+ # Vector Store
+ chromadb
+
+ # PDF Loading
+ pypdf
+ pymupdf  # Better PDF parsing (optional but recommended)
+
+ # Google Gemini
+ google-generativeai
+
+ # API Server
+ fastapi
+ uvicorn[standard]
+ python-multipart
+
+ # Utilities
+ python-dotenv
+ pydantic