metadata
title: EnggSS RAG ChatBot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
license: other
EnggSS RAG ChatBot
A serving-only HuggingFace Space: it reads a pre-built private dataset and does
no PDF processing at runtime. Build the dataset locally with
preprocessing/create_dataset.py, then deploy this Space to answer questions.
How it works
Local machine (once)
PDFs → create_dataset.py → BAAI/bge-large-en-v1.5 embeddings
│
▼
Private HuggingFace Dataset
│
▼ (Space startup)
Load dataset → NumPy float32 matrix (L2-normalised)
│
▼ (each query, ~20 ms)
Embed query → cosine scores → MMR top-3
│
▼
Qwen2.5-7B-Instruct (HF Inference API) → answer
│
▼
Gradio UI
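The retrieval step in the diagram (cosine scores followed by MMR top-3) can be sketched as below. This is a minimal, dependency-free illustration, not the code in app.py: the Space keeps its embeddings in a NumPy matrix, while the sketch uses plain lists so it runs anywhere. The function names and the trade-off weight `lam` are assumptions; the source does not state which weight the app uses.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mmr_top_k(query, chunks, k=3, lam=0.7):
    """Return the indices of k chunks chosen by Maximal Marginal Relevance.

    lam is an assumed trade-off weight: 1.0 ranks by relevance only,
    lower values penalise chunks that resemble ones already selected.
    """
    q = l2_normalise(query)
    docs = [l2_normalise(c) for c in chunks]
    relevance = [dot(q, d) for d in docs]  # cosine score of each chunk vs. query

    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Highest similarity to any chunk that was already picked.
            redundancy = max((dot(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the function degenerates to plain top-k by cosine score; lowering `lam` makes near-duplicate chunks less likely to occupy more than one of the three context slots.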
Tabs
| Tab | Purpose |
|---|---|
| 💬 Q&A | Ask questions; see top-3 retrieved contexts + generated answer |
| 📊 Analytics | Total chunks, documents processed, per-file breakdown |
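The three figures on the Analytics tab could be derived from the chunk records along these lines. This is only a sketch: the `source` field name and the `summarise_chunks` helper are hypothetical, since the source does not describe the dataset schema.

```python
from collections import Counter


def summarise_chunks(chunks):
    """Compute the Analytics figures from a list of chunk records.

    Each record is assumed to be a dict whose "source" key names its PDF.
    """
    per_file = Counter(chunk["source"] for chunk in chunks)
    return {
        "total_chunks": sum(per_file.values()),
        "documents_processed": len(per_file),
        "per_file": dict(per_file),
    }
```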
Required Space Secrets
Set in Settings → Variables and Secrets:
| Secret | Description |
|---|---|
| HF_TOKEN | HuggingFace token; needs read access to the dataset repo |
| HF_DATASET_REPO | e.g. your-org/enggss-rag-dataset (created by preprocessing script) |
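Space secrets are exposed to the app as environment variables, so a startup check like the following would fail fast when one is missing. The `load_settings` helper and its return shape are assumptions for illustration, not taken from app.py.

```python
import os


def load_settings(env=None):
    """Read the two required secrets, raising a clear error if any is unset."""
    env = os.environ if env is None else env
    required = ("HF_TOKEN", "HF_DATASET_REPO")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    return {"token": env["HF_TOKEN"], "dataset_repo": env["HF_DATASET_REPO"]}
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.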
Setup order
- Run preprocessing locally (once, or when you add new PDFs):
  cd preprocessing
  pip install -r requirements.txt
  python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset
- Deploy this Space: upload app.py + requirements.txt + README.md
- Set the two secrets above in Space Settings → Secrets
- Space restarts, loads the dataset, and is ready to answer questions
To add new PDFs later without rebuilding everything:
python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset --update
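The source does not say how --update decides which files are new. One plausible approach, shown here purely as an assumption, is to compare the PDF file names on disk with the names already recorded in the dataset and embed only the difference. The `pending_pdfs` helper is hypothetical.

```python
from pathlib import Path


def pending_pdfs(pdf_dir, already_indexed):
    """Return PDF paths in pdf_dir whose file names are not yet in the dataset."""
    indexed = set(already_indexed)
    return sorted(p for p in Path(pdf_dir).glob("*.pdf") if p.name not in indexed)
```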
Local development
git clone https://huggingface.co/spaces/your-org/enggss-rag-chatbot
cd enggss-rag-chatbot
pip install -r requirements.txt
# create .env with HF_TOKEN and HF_DATASET_REPO
python app.py
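For local runs, the .env file only needs the two KEY=VALUE lines. If app.py does not already load it (the source does not say), a small standard-library loader such as this hypothetical `load_env_file` would do; existing environment variables take precedence over the file.

```python
import os


def load_env_file(path=".env", env=None):
    """Copy KEY=VALUE pairs from a .env file into env, keeping existing keys."""
    env = os.environ if env is None else env
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return env
```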