
title: EnggSS RAG ChatBot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
license: other

EnggSS RAG ChatBot

This is a serving-only HuggingFace Space: it reads a pre-built private dataset and does no PDF processing at runtime. Build the dataset locally with preprocessing/create_dataset.py, then deploy this Space to answer questions.

How it works

Local machine (once)
  PDFs  →  create_dataset.py  →  BAAI/bge-large-en-v1.5 embeddings
                                        │
                                        ▼
                            Private HuggingFace Dataset
                                        │
                  ┌─────────────────────┘
                  ▼  (Space startup)
         Load dataset → NumPy float32 matrix (L2-normalised)
                  │
                  ▼  (each query, ~20 ms)
         Embed query → cosine scores → MMR top-3
                  │
                  ▼
         Qwen2.5-7B-Instruct (HF Inference API) → answer
                  │
                  ▼
              Gradio UI
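
The retrieval step in the diagram (cosine scores over an L2-normalised matrix, then MMR top-3) can be sketched in NumPy as follows. This is a minimal illustration, not the code in app.py: the function names, the `lambda_mult` trade-off weight, the `pool_size` candidate limit, and the toy 2-D vectors are all assumptions made for the example.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def mmr_top_k(query_vec, doc_matrix, k=3, lambda_mult=0.7, pool_size=20):
    """Return indices of k chunks chosen by Maximal Marginal Relevance.

    query_vec:  (d,) unit-length query embedding
    doc_matrix: (n, d) matrix of unit-length chunk embeddings
    lambda_mult and pool_size are illustrative defaults, not values from app.py.
    """
    scores = doc_matrix @ query_vec                  # cosine similarity to the query
    pool = np.argsort(-scores)[:pool_size].tolist()  # most relevant candidates first
    selected = []
    while pool and len(selected) < k:
        if not selected:
            best = pool[0]  # first pick is simply the most relevant chunk
        else:
            # Highest similarity of each candidate to any chunk already selected.
            redundancy = (doc_matrix[pool] @ doc_matrix[selected].T).max(axis=1)
            mmr = lambda_mult * scores[pool] - (1.0 - lambda_mult) * redundancy
            best = pool[int(np.argmax(mmr))]
        selected.append(best)
        pool.remove(best)
    return selected


# Toy data: chunks 0 and 1 are near-duplicates, chunk 2 is unrelated to the query.
docs = l2_normalise(np.array([[1, 0], [1, 0.01], [0, 1], [0.7, 0.7]], dtype=np.float32))
query = l2_normalise(np.array([1.0, 0.2], dtype=np.float32))
top = mmr_top_k(query, docs, k=3)
print(top)
```

Because every row is unit length, a single matrix-vector product gives all cosine scores at once. Setting `lambda_mult=1.0` reduces MMR to plain top-k by cosine score; lower values penalise chunks that are similar to ones already picked.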

Tabs

| Tab | Purpose |
|---|---|
| 💬 Q&A | Ask questions; see the top-3 retrieved contexts and the generated answer |
| 📊 Analytics | Total chunks, documents processed, per-file breakdown |
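
The three Analytics figures can all be derived from per-chunk metadata. The sketch below assumes each chunk record carries a `source` field naming its PDF. That field name, the `summarise_chunks` helper, and the sample file names are hypothetical and may not match the real dataset schema.

```python
from collections import Counter


def summarise_chunks(chunks):
    """Compute total chunks, documents processed, and a per-file breakdown.

    `chunks` is any iterable of dicts; the "source" key is an assumed field name.
    """
    per_file = Counter(chunk["source"] for chunk in chunks)
    return {
        "total_chunks": sum(per_file.values()),
        "documents_processed": len(per_file),
        "per_file": dict(per_file),
    }


# Hypothetical records standing in for rows of the pre-built dataset.
example = [
    {"source": "spec_a.pdf", "text": "first chunk"},
    {"source": "spec_a.pdf", "text": "second chunk"},
    {"source": "spec_b.pdf", "text": "only chunk"},
]
stats = summarise_chunks(example)
print(stats)
```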

Required Space Secrets

Set in Settings → Variables and Secrets:

| Secret | Description |
|---|---|
| `HF_TOKEN` | HuggingFace token with read access to the dataset repo |
| `HF_DATASET_REPO` | Dataset repo ID, e.g. `your-org/enggss-rag-dataset` (created by the preprocessing script) |
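
Space secrets are exposed to the running app as environment variables, so a startup check along these lines can fail fast when one is missing. This is a sketch under assumptions: `load_settings` and the returned key names are hypothetical, and app.py may read the values differently.

```python
import os


def load_settings(env=None):
    """Read the two required secrets, raising a clear error if either is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("HF_TOKEN", "HF_DATASET_REPO") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Space secret(s): " + ", ".join(missing))
    return {"token": env["HF_TOKEN"], "dataset_repo": env["HF_DATASET_REPO"]}


# Example with placeholder values; in the Space, call load_settings() with no
# argument so the values come from os.environ.
settings = load_settings(
    {"HF_TOKEN": "hf_placeholder", "HF_DATASET_REPO": "your-org/enggss-rag-dataset"}
)
print(settings["dataset_repo"])
```

The resulting values could then be passed to a loader such as `datasets.load_dataset(repo, token=token)`, assuming the dataset was pushed in a format that library can read.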

Setup order

1. Run preprocessing locally (once, or whenever you add new PDFs):

cd preprocessing
pip install -r requirements.txt
python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset

2. Deploy this Space by uploading app.py, requirements.txt, and README.md.
3. Set the two secrets above in Settings → Variables and Secrets.
4. The Space restarts, loads the dataset, and is ready to answer questions.

To add new PDFs later without rebuilding everything:

python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset --update
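
The internals of `--update` are not described here. One plausible way an incremental run could decide what to process is to skip PDFs whose file names already appear in the dataset, sketched below. The `pdfs_to_process` helper and the name-based comparison are assumptions for illustration, not the script's confirmed behaviour.

```python
from pathlib import Path


def pdfs_to_process(pdf_dir, already_indexed):
    """Return PDF paths in `pdf_dir` whose file names are not in `already_indexed`.

    Hypothetical helper: compares by file name only, so a PDF that was edited
    but kept its name would be skipped.
    """
    known = set(already_indexed)
    return sorted(p for p in Path(pdf_dir).glob("*.pdf") if p.name not in known)
```

For example, `pdfs_to_process("./pdfs", ["spec_a.pdf"])` would return every PDF in `./pdfs` except `spec_a.pdf`.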

Local development

git clone https://huggingface.co/spaces/your-org/enggss-rag-chatbot
cd enggss-rag-chatbot
pip install -r requirements.txt
# create .env with HF_TOKEN and HF_DATASET_REPO
python app.py
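
For local runs the two values need to reach the process environment. How app.py reads `.env` is not shown here (a library such as python-dotenv is a common choice), so the standard-library loader below is only an illustrative stand-in. `load_env_file` is a hypothetical name and the parser handles simple `KEY=VALUE` lines only.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Values already set in the environment are left untouched. Blank lines,
    comment lines, and lines without "=" are ignored. Returns the parsed pairs.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

A matching `.env` would contain one line for `HF_TOKEN` and one for `HF_DATASET_REPO`. Keep that file out of version control, since it holds a token.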