---
license: mit
sdk: gradio
emoji: π
colorFrom: gray
sdk_version: 5.34.0
---
# 🔥 Hybrid Search RAGtim Bot
A hybrid search system that combines semantic vector search with BM25 keyword matching to retrieve information from a markdown knowledge base.
## Features
- **Hybrid Search**: Combines transformer-based semantic similarity with BM25 keyword ranking
- **Multiple Search Modes**: Vector search, BM25 search, and weighted fusion of the two
- **Real-time API**: RESTful endpoints for integration
- **Interactive UI**: Three interfaces - Chat, Advanced Search, and Statistics
- **Knowledge Base**: Comprehensive markdown-based knowledge system
## 🔧 Technology Stack
- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
- **Search**: Custom BM25 implementation + Vector similarity
- **Framework**: Gradio (the Space metadata pins `sdk_version: 5.34.0`)
- **ML**: Transformers, PyTorch, NumPy
- **Deployment**: Hugging Face Spaces
## Knowledge Base Structure
The system processes markdown files from the `knowledge_base/` directory:
- `about.md` - Personal information and professional summary
- `research_details.md` - Research projects and methodologies
- `publications_detailed.md` - Publications with technical details
- `skills_expertise.md` - Technical skills and expertise
- `experience_detailed.md` - Professional experience
- `statistics.md` - Statistical methods and biostatistics
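As a rough illustration of how markdown files in `knowledge_base/` could be turned into searchable chunks, here is a minimal sketch. The function name `load_knowledge_base`, the heading-based splitting rule, and the chunk dictionary layout are assumptions made for this example, not the project's actual loader.

```python
from pathlib import Path


def load_knowledge_base(directory="knowledge_base"):
    """Read every .md file in `directory` and split it at markdown headings.

    Hypothetical helper: the real project may chunk its files differently.
    """
    chunks = []

    def flush(source, section, lines):
        # Store a chunk only if it contains some non-blank text.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"source": source, "section": section, "text": text})

    for path in sorted(Path(directory).glob("*.md")):
        section, lines = path.stem, []
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("#"):
                # A heading closes the current chunk and names the next one.
                flush(path.name, section, lines)
                section, lines = line.lstrip("#").strip(), []
            else:
                lines.append(line)
        flush(path.name, section, lines)
    return chunks
```

Calling `load_knowledge_base("knowledge_base")` returns a list of dictionaries with `source`, `section`, and `text` keys. This simple rule treats any line starting with `#` as a heading, so it would also split on comment lines inside fenced code blocks.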
## Search Methods
### Hybrid Search (Recommended)
Combines semantic and keyword search with configurable weights:
- Default: 60% vector + 40% BM25
- Optimal for most queries
- Balances meaning and exact term matching
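The weighting described above can be sketched as a weighted sum of the two score lists. Cosine similarities and raw BM25 scores live on different scales, so this example min-max normalizes each list first. That normalization step, and the function names `min_max_normalize` and `hybrid_scores`, are assumptions for illustration; the README does not say how the project fuses scores.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a list of identical scores maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(vector_scores, bm25_scores, vector_weight=0.6, bm25_weight=0.4):
    """Weighted sum of normalized vector and BM25 scores, one value per document."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(bm25_scores)
    return [vector_weight * a + bm25_weight * b for a, b in zip(v, k)]


# Toy scores for three documents (made up for the example).
vector_scores = [0.82, 0.40, 0.10]  # e.g. cosine similarities
bm25_scores = [1.2, 7.5, 0.0]       # raw BM25 scores
fused = hybrid_scores(vector_scores, bm25_scores)

# Document indices ordered from best to worst fused score.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this toy data the second document has by far the best keyword score, yet the first document still ranks on top because the default weights favor the vector score.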
### Vector Search
Pure semantic similarity using transformer embeddings:
- Best for conceptual questions
- Finds semantically related content
- Matches paraphrases and synonyms, not only exact wording
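To make the idea of semantic similarity concrete, the sketch below ranks documents by cosine similarity between a query vector and document vectors. It uses toy 3-dimensional vectors in place of real 384-dimensional embeddings, and the function names `cosine_similarity` and `vector_search` are illustrative assumptions rather than the project's own API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_vec, doc_vecs, top_k=5):
    """Return (document index, similarity) pairs, best match first."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings produced by the embedding model.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
results = vector_search([1.0, 0.1, 0.0], docs, top_k=2)
```

In a real pipeline the query and documents would both be encoded by the same embedding model before this comparison.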
### BM25 Search
Traditional keyword-based ranking:
- Excellent for specific terms
- TF-IDF-style scoring with term-frequency saturation and document length normalization
- Fast and interpretable
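For readers unfamiliar with BM25, here is a compact sketch of the standard scoring formula. It is not the project's custom implementation: the class name `BM25`, the whitespace tokenizer, and the non-negative IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))` are choices made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real tokenizers vary)."""
    return text.lower().split()


class BM25:
    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in documents]
        self.doc_len = [len(d) for d in self.docs]
        self.avg_len = sum(self.doc_len) / len(self.docs)
        self.term_freqs = [Counter(d) for d in self.docs]
        # Document frequency: in how many documents each term appears.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        """BM25 score of document `index` for `query`."""
        tf = self.term_freqs[index]
        # Length normalization: longer-than-average documents are penalized.
        norm = 1 - self.b + self.b * self.doc_len[index] / self.avg_len
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                total += self.idf[term] * tf[term] * (self.k1 + 1) / (tf[term] + self.k1 * norm)
        return total

    def search(self, query, top_k=5):
        """Return (document index, score) pairs, best match first."""
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:top_k]


corpus = [
    "bayesian statistics for clinical trials",
    "deep learning for images",
    "statistics and biostatistics methods",
]
bm25 = BM25(corpus)
hits = bm25.search("biostatistics", top_k=3)
```

Only the third document contains the query term, so it is the only one with a non-zero score. The sketch assumes a non-empty corpus.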
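## 🛠️ API Endpoints
### Search API
- `GET /api/search` - search endpoint; takes `query`, `top_k`, and `search_type` as query parameters
- `GET /api/stats` - statistics endpoint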
## Configuration
Key parameters in `config.py`:
- `BM25_K1 = 1.5` - Term frequency saturation
- `BM25_B = 0.75` - Document length normalization
- `DEFAULT_VECTOR_WEIGHT = 0.6` - Hybrid search weighting
- `DEFAULT_BM25_WEIGHT = 0.4` - Hybrid search weighting
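To show what the two BM25 parameters control, the sketch below evaluates only the term-frequency factor of the standard BM25 formula, leaving out IDF. The helper `tf_component` is hypothetical and written for this illustration; the constants mirror the values listed above.

```python
BM25_K1 = 1.5
BM25_B = 0.75


def tf_component(tf, doc_len, avg_len, k1=BM25_K1, b=BM25_B):
    """Term-frequency factor of BM25 for one term in one document."""
    length_norm = 1 - b + b * doc_len / avg_len
    return tf * (k1 + 1) / (tf + k1 * length_norm)


# Saturation (k1): each extra occurrence adds less, and the factor
# stays below k1 + 1 no matter how often the term repeats.
for tf in (1, 2, 5, 10):
    print(tf, round(tf_component(tf, doc_len=100, avg_len=100), 3))

# Length normalization (b): the same count is worth more in a short
# document than in a long one.
short_doc = tf_component(2, doc_len=50, avg_len=100)
long_doc = tf_component(2, doc_len=200, avg_len=100)
```

Raising `BM25_K1` lets repeated terms keep contributing for longer. Setting `BM25_B` to 0 disables length normalization entirely.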
## Deployment
1. Clone to Hugging Face Spaces
2. Ensure all markdown files are in `knowledge_base/`
3. The system auto-initializes on startup
4. Access via the provided Space URL
## 💡 Usage Examples
**Chat Interface:**
- "What is Raktim's LLM research?"
- "Tell me about statistical methods"
- "Describe multimodal AI capabilities"
**Advanced Search:**
- Adjust vector/BM25 weights
- Compare search methods
- Fine-tune result count
**API Integration:**
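```python
import requests

# Replace the placeholder host with your own Space URL.
response = requests.get(
    "https://your-space.hf.space/api/search",
    params={
        "query": "machine learning research",
        "top_k": 5,
        "search_type": "hybrid",
    },
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())       # assumes the endpoint returns JSON
```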