---
title: RAG System with PDF Documents
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
app_port: 8501
---

# 🤖 Conversational AI RAG System

A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.

## 🚀 Features

- **Hybrid Search**: Combines dense (FAISS) and sparse (BM25) retrieval for better recall and keyword precision
- **Advanced Guard Rails**: Comprehensive safety and security measures
- **Multiple Models**: Qwen 2.5 1.5B for generation, with distilgpt2 as a fallback
- **PDF Processing**: Intelligent document chunking and processing
- **Real-time Monitoring**: Performance metrics and system health checks
- **Docker Support**: Containerized deployment with Docker Compose
- **Hugging Face Spaces Ready**: Optimized for HF Spaces deployment

πŸ—οΈ Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Streamlit UI  │───▶│   RAG System    │───▶│  Guard Rails    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  PDF Processor  │    │   FAISS Index   │    │  Language Model │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

πŸ› οΈ Technology Stack

### Core Technologies

  • πŸ” Vector Database: FAISS for efficient similarity search
  • πŸ“ Sparse Retrieval: BM25 for keyword-based search
  • 🧠 Embedding Model: all-MiniLM-L6-v2 for document embeddings
  • πŸ€– Generative Model: Qwen 2.5 1.5B for answer generation
  • 🌐 UI Framework: Streamlit for interactive interface
  • 🐳 Containerization: Docker for deployment
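
To make the generation step concrete, here is a minimal sketch using the `transformers` pipeline. The prompt template is an illustrative assumption, not the project's actual prompting code:

```python
# Minimal answer-generation sketch; runs on CPU. The prompt format below
# is illustrative, not the project's actual template.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",  # or "distilgpt2" as the fallback
    device="cpu",
)

context = "FAISS is a library for efficient similarity search over dense vectors."
question = "What is FAISS used for?"
prompt = (
    "Answer the question using only the context.\n"
    f"Context: {context}\nQuestion: {question}\nAnswer:"
)
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```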

### Supporting Libraries

- 📊 **Data Processing**: Pandas, NumPy for data manipulation
- 📄 **PDF Handling**: PyPDF for document processing
- 🔧 **ML Utilities**: Scikit-learn for preprocessing
- 📝 **Logging**: Loguru for structured logging
- ⚡ **Optimization**: Accelerate for model optimization

## 🚀 Quick Start

### Local Development

1. **Clone and set up:**

   ```bash
   git clone <repository-url>
   cd convAI
   pip install -r requirements.txt
   ```

2. **Run the application:**

   ```bash
   streamlit run app.py
   ```

3. **Upload PDFs and start chatting!**

### Docker Deployment

1. **Build and run:**

   ```bash
   docker-compose up --build
   ```

2. **Access at:** http://localhost:8501

## 🌟 Hugging Face Spaces Deployment

This application is optimized for deployment on Hugging Face Spaces. The system automatically:

- Uses /tmp directories for cache storage (writable in HF Spaces)
- Configures environment variables for HF Spaces compatibility
- Handles permission issues automatically
- Optimizes model loading for the HF Spaces environment

### HF Spaces Configuration

The application includes:

- **Cache Management**: All model caches stored in /tmp directories
- **Permission Handling**: Automatic fallback to writable directories
- **Environment Detection**: Adapts to the HF Spaces runtime environment
- **Resource Optimization**: Efficient memory and CPU usage

### Deploy to HF Spaces

1. Create a new Space on Hugging Face
2. Choose **Docker** as the SDK
3. Upload all files from this repository
4. The system will automatically:
   - Set up cache directories in /tmp
   - Download and cache models
   - Initialize the RAG system with guard rails
   - Start the Streamlit interface

### HF Spaces Environment Variables

The system automatically configures:

```bash
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
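
These are the standard Hugging Face and PyTorch cache variables. A minimal sketch of applying such defaults in code, assuming it runs before any `transformers` import (the loop below is illustrative, not the app's actual startup code):

```python
import os

# Cache locations must be set before transformers/torch are imported,
# or they fall back to default paths that are read-only in HF Spaces.
CACHE_DEFAULTS = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface/transformers",
    "TORCH_HOME": "/tmp/torch",
    "XDG_CACHE_HOME": "/tmp",
    "HF_HUB_CACHE": "/tmp/huggingface/hub",
}

for key, path in CACHE_DEFAULTS.items():
    os.environ.setdefault(key, path)             # keep any value already set
    os.makedirs(os.environ[key], exist_ok=True)  # ensure the directory exists
```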

## 📖 Usage Guide

### Document Upload

- **Automatic Loading**: PDF documents bundled in the container are loaded automatically
- **Manual Upload**: Use the sidebar to upload additional PDF documents
- **Supported Formats**: PDF files with extractable text content (see the extraction sketch below)
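
A minimal sketch of text extraction with the PyPDF library named in the stack above; the function name and the sample file are illustrative, not the actual pdf_processor.py implementation:

```python
from pypdf import PdfReader

def extract_pdf_text(path: str) -> str:
    """Return the concatenated text of all pages in a PDF."""
    reader = PdfReader(path)
    # extract_text() returns None for pages with no text layer (e.g., scans)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

text = extract_pdf_text("example.pdf")  # hypothetical input file
print(f"Extracted {len(text)} characters")
```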

### Search Methods

- 🔀 **Hybrid**: Combines vector similarity and keyword matching (recommended; see the sketch below)
- 🎯 **Dense**: Uses only vector similarity search
- 📝 **Sparse**: Uses only keyword-based BM25 search

### Query Interface

- **Natural Language**: Ask questions in plain English
- **Context Awareness**: The system uses retrieved documents for context
- **Confidence Scores**: See how confident the system is in its answers
- **Source Citations**: View which documents were used for the answer

βš™οΈ Configuration

### Environment Variables

```bash
# Model Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATIVE_MODEL=Qwen/Qwen2.5-1.5B-Instruct

# Chunk Sizes
CHUNK_SIZES=100,400

# Vector Store Path
VECTOR_STORE_PATH=./vector_store

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
```
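
A minimal sketch of reading these settings in Python; the `Config` dataclass is an illustrative pattern, not the app's actual configuration code:

```python
import os
from dataclasses import dataclass, field

def _chunk_sizes() -> list[int]:
    # CHUNK_SIZES is a comma-separated list, e.g. "100,400"
    return [int(s) for s in os.getenv("CHUNK_SIZES", "100,400").split(",")]

@dataclass
class Config:
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2")
    generative_model: str = os.getenv("GENERATIVE_MODEL", "Qwen/Qwen2.5-1.5B-Instruct")
    vector_store_path: str = os.getenv("VECTOR_STORE_PATH", "./vector_store")
    chunk_sizes: list[int] = field(default_factory=_chunk_sizes)

config = Config()
print(config.chunk_sizes)  # [100, 400] unless CHUNK_SIZES overrides it
```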

### Performance Tuning

- **Chunk Sizes**: Adjust for different document types (smaller for technical docs, larger for narratives); see the chunking sketch below
- **Top-k Results**: Increase for more comprehensive answers, decrease for faster responses
- **Model Selection**: Choose between Qwen 2.5 1.5B and distilgpt2 based on performance needs
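
A minimal word-count chunker, to make the tradeoff concrete. This is an illustrative sketch; the actual splitter in pdf_processor.py may differ (overlap, sentence boundaries, etc.):

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of roughly chunk_size words each."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

sample = "word " * 1000              # a 1,000-word toy document
small = chunk_text(sample, 100)      # finer chunks suit technical docs
large = chunk_text(sample, 400)      # coarser chunks suit narratives
print(len(small), len(large))        # 10 3
```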

## 📊 Performance

### Optimization Features

- **Parallel Processing**: Documents are loaded concurrently for faster initialization
- **Optimized Search**: Hybrid retrieval combines the strengths of vector and keyword search
- **Memory Efficient**: Uses CPU-optimized models for deployment compatibility
- **Caching**: The FAISS index and metadata are cached for faster subsequent queries (see the sketch below)
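
As a sketch of the caching idea, assuming faiss-cpu; the `vector_store/index.faiss` path and the helper are illustrative, not the project's actual on-disk format:

```python
import os
import faiss
import numpy as np

INDEX_PATH = "./vector_store/index.faiss"  # illustrative location

def load_or_build_index(embeddings: np.ndarray) -> faiss.Index:
    """Reuse a cached FAISS index when present; otherwise build and persist it."""
    if os.path.exists(INDEX_PATH):
        return faiss.read_index(INDEX_PATH)          # fast path: reuse cache
    index = faiss.IndexFlatIP(embeddings.shape[1])   # build from scratch
    index.add(embeddings)
    os.makedirs(os.path.dirname(INDEX_PATH), exist_ok=True)
    faiss.write_index(index, INDEX_PATH)             # persist for next run
    return index

index = load_or_build_index(np.random.rand(8, 384).astype("float32"))
print(index.ntotal)  # 8
```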

### Expected Performance

- **Document Loading**: ~2-5 seconds per PDF, depending on size
- **Query Response**: ~1-3 seconds for typical questions
- **Memory Usage**: ~2-4 GB RAM for typical document collections
- **Storage**: ~100 MB per 1,000 document chunks

## 🔧 Development

### Project Structure

```
convAI/
├── app.py                 # Main Streamlit application
├── rag_system.py          # Core RAG system implementation
├── pdf_processor.py       # PDF processing utilities
├── requirements.txt       # Python dependencies
├── Dockerfile             # Container configuration
├── docker-compose.yml     # Multi-container setup
├── README.md              # This file
├── DEPLOYMENT_GUIDE.md    # Detailed deployment instructions
├── test_deployment.py     # Deployment testing script
├── test_docker.py         # Docker testing script
└── src/
    └── streamlit_app.py   # Sample Streamlit app
```

### Testing

```bash
# Test deployment readiness
python test_deployment.py

# Test Docker configuration
python test_docker.py

# Run local tests
streamlit run app.py
```

πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Check internet connectivity for model downloads
   - Verify sufficient disk space
   - Try the fallback model (distilgpt2)

2. **Memory Issues**
   - Reduce chunk sizes
   - Use smaller embedding models
   - Limit the number of documents

3. **Performance Issues**
   - Adjust the top-k parameter
   - Use sparse search for keyword-heavy queries
   - Consider hardware upgrades

4. **Docker Issues**
   - Check the Docker installation
   - Verify port availability
   - Check container logs

### Getting Help

- Check the logs in your Space's "Logs" tab
- Review the deployment guide (DEPLOYMENT_GUIDE.md) for common solutions
- Create an issue in the project repository

## 🤝 Contributing

We welcome contributions! Please see our contributing guidelines for:

- Code style and standards
- Testing requirements
- Documentation updates
- Feature requests and bug reports

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

- Hugging Face for providing the platform and models
- The FAISS team for the efficient vector search library
- The Streamlit team for the excellent web framework
- OpenAI for inspiring the RAG architecture

Built with ❤️ for efficient document question-answering.

Ready to explore your documents? Start asking questions! 🚀