title: PDF QA Chatbot
emoji: 📄🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: true

PDF-Based Q&A Chatbot System

An end-to-end Q&A chatbot for PDF documents: it processes uploaded PDFs, indexes their content, and answers natural-language questions with context-aware responses drawn from the documents.

Features

  • PDF Processing: Extract text and metadata from uploaded PDF documents
  • Vector Storage: Store document embeddings in ChromaDB for efficient retrieval
  • AI-Powered Q&A: Use OpenAI/Claude for intelligent question answering
  • Modern UI: Clean, responsive interface built with Next.js and Tailwind CSS
  • Real-time Chat: Interactive chat interface with conversation history
  • File Management: Upload, view, and manage multiple PDF documents
  • Context Awareness: Maintain conversation context and document references

Tech Stack

Backend

  • FastAPI: High-performance web framework
  • PyPDF2: PDF text extraction
  • ChromaDB: Vector database for embeddings
  • OpenAI/Claude: AI language models for Q&A
  • SQLAlchemy: Database ORM
  • Pydantic: Data validation

Frontend

  • Next.js 14: React framework with App Router
  • TypeScript: Type-safe development
  • Tailwind CSS: Utility-first styling
  • Shadcn/ui: Modern UI components
  • React Hook Form: Form handling
  • Zustand: State management

Project Structure

ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md

Quick Start

Option 1: Automated Setup (Recommended)

For Linux/macOS:

chmod +x setup.sh
./setup.sh

For Windows:

.\setup.ps1

Option 2: Manual Setup

  1. Clone and Setup

    cd ChatbotCursor
    
  2. Backend Setup

    cd backend
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    cp .env.example .env
    # Edit .env and add your API keys
    
  3. Frontend Setup

    cd ../frontend  # step 2 left the shell in backend/
    npm install
    cp .env.example .env
    
  4. Environment Variables

    • Edit backend/.env and add your API keys:
      • OPENAI_API_KEY or ANTHROPIC_API_KEY
    • The frontend .env should work with defaults
  5. Run the Application

    # Backend (Terminal 1)
    cd backend
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    uvicorn main:app --reload
    
    # Frontend (Terminal 2)
    cd frontend
    npm run dev
    

Option 3: Docker Setup

# Build and run with Docker Compose
docker-compose up --build

# Or run services individually
docker-compose up backend
docker-compose up frontend
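  1. Access the Application

    • Frontend: http://localhost:3000
    • Backend API: http://localhost:8000
    • API documentation: http://localhost:8000/docs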

Usage

Getting Started

  1. Upload Documents

    • Navigate to the "Documents" tab
    • Drag and drop PDF files or click to select
    • Wait for processing (text extraction and vector embedding)
    • View upload status and document statistics
  2. Start Chatting

    • Switch to the "Chat" tab
    • Ask questions about your uploaded documents
    • Get AI-powered answers with source references
    • View conversation history
  3. Document Management

    • View all uploaded documents with metadata
    • Delete documents when no longer needed
    • Monitor processing status and file sizes
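
The processing step above (text extraction followed by vector embedding) typically splits the extracted text into overlapping chunks before embedding. The following is a minimal sketch of that idea, not the project's actual implementation: the function name chunk_text and the default chunk_size and overlap values are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.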

Features

  • Smart Document Processing: Automatic text extraction and chunking
  • Vector Search: Semantic similarity search for relevant content
  • AI-Powered Q&A: Context-aware answers using OpenAI or Claude
  • Source Citations: See which documents and sections were referenced
  • Conversation History: Persistent chat sessions
  • File Management: Upload, view, and delete documents
  • Real-time Processing: Live status updates during uploads
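
To illustrate what "semantic similarity search" means in the list above, here is a small pure-Python sketch that ranks stored chunk embeddings by cosine similarity to a query embedding. It is a stand-in for the lookup that ChromaDB performs, not the project's code; the names cosine_similarity and top_k are hypothetical, and the distance metric the project actually configures is not stated in this README.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk ids whose embeddings are most similar to the query."""
    scored = [(chunk_id, cosine_similarity(query, vector)) for chunk_id, vector in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:k]
```

In a real deployment the vectors would come from an embedding model and the ranking would be delegated to the vector database; the sketch only shows the scoring logic.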

Supported File Types

  • PDF Documents: All standard PDF files
  • Maximum Size: 10 MB per file (10485760 bytes, the default MAX_FILE_SIZE)
  • Processing: Automatic text extraction and metadata parsing
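
The limits above can be checked before a file is sent to the backend. The sketch below is an assumption about how such a check might look, not the backend's own validation: validate_pdf_upload is a hypothetical helper, and it only checks the size limit and the %PDF- header that PDF files begin with.

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10485760 bytes, matching the documented default


def validate_pdf_upload(data: bytes, max_size: int = MAX_FILE_SIZE) -> None:
    """Raise ValueError if the payload is too large or does not look like a PDF."""
    if len(data) > max_size:
        raise ValueError(f"file is {len(data)} bytes; the limit is {max_size}")
    # Strict check: the header must be the very first bytes of the file.
    if not data.startswith(b"%PDF-"):
        raise ValueError("file does not start with the %PDF- header")
```

A header check does not prove the file is well formed; a corrupt PDF with a valid header would still fail later during text extraction.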

API Endpoints

Document Management

  • POST /api/v1/documents/upload: Upload PDF documents
  • GET /api/v1/documents/: List all documents
  • GET /api/v1/documents/{id}: Get specific document
  • DELETE /api/v1/documents/{id}: Delete a document
  • GET /api/v1/documents/stats/summary: Get document statistics
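
The read and delete endpoints above can be called from Python with the standard library alone. This is a hedged sketch: the helper names are hypothetical, the base URL assumes the backend runs on its default local port, and the responses are assumed to be JSON (the README does not document the response schema). Upload is omitted because the expected form field name is not documented here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: backend on the default local port


def build_request(method: str, path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Create a Request for the given HTTP method and API path."""
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


def list_documents_request() -> urllib.request.Request:
    return build_request("GET", "/api/v1/documents/")


def get_document_request(document_id) -> urllib.request.Request:
    # Percent-encode the id so unusual characters cannot alter the path.
    doc = urllib.parse.quote(str(document_id), safe="")
    return build_request("GET", f"/api/v1/documents/{doc}")


def delete_document_request(document_id) -> urllib.request.Request:
    doc = urllib.parse.quote(str(document_id), safe="")
    return build_request("DELETE", f"/api/v1/documents/{doc}")


def send(request: urllib.request.Request):
    """Send the request; decode a JSON body if one is returned (assumed format)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    return json.loads(body) if body else None
```

With the backend running, send(list_documents_request()) would return whatever the list endpoint responds with.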

Chat & Q&A

  • POST /api/v1/chat/: Send questions and get answers
  • GET /api/v1/chat/history/{session_id}: Get chat history
  • POST /api/v1/chat/session/new: Create new chat session
  • GET /api/v1/chat/sessions: List all sessions
  • DELETE /api/v1/chat/session/{session_id}: Delete session
  • GET /api/v1/chat/models/available: Get available AI models
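
A question can be posted to the chat endpoint in a similar way. The request body below is an assumption: this README does not document the payload, so the field names question and session_id are hypothetical and should be checked against the Swagger UI at /docs before use.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    question: str,
    session_id: Optional[str] = None,
    base_url: str = "http://localhost:8000",  # assumption: default local backend
) -> urllib.request.Request:
    """Build a POST request for /api/v1/chat/ with a JSON body."""
    payload = {"question": question}  # hypothetical field name
    if session_id is not None:
        payload["session_id"] = session_id  # hypothetical field name
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the question; the shape of the answer is likewise undocumented here.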

System

  • GET /health: Health check
  • GET /docs: Interactive API documentation (Swagger UI)
  • GET /redoc: Alternative API documentation
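
The health endpoint is useful for waiting until the backend has finished starting, for example in a script that runs after docker-compose up. The following sketch polls it with a fixed delay; probe_health and wait_until_healthy are hypothetical names, and the only assumption about the endpoint is that it answers with HTTP 200 when the service is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(url: str = "http://localhost:8000/health") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or an HTTP error status


def wait_until_healthy(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The probe is injectable so the retry logic can be exercised without a running server.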

Configuration

Environment Variables

Backend (.env):

# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key

# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760

Frontend (.env):

NEXT_PUBLIC_API_URL=http://localhost:8000
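
The backend variables above could be read in Python roughly as follows. This is an illustrative sketch, not the project's configuration module: the Settings class and load_settings function are hypothetical, and the defaults simply mirror the values listed in the backend .env example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    database_url: str
    chroma_persist_directory: str
    upload_dir: str
    max_file_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    openai_key = env.get("OPENAI_API_KEY") or None
    anthropic_key = env.get("ANTHROPIC_API_KEY") or None
    if not (openai_key or anthropic_key):
        # At least one AI provider key is required.
        raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return Settings(
        openai_api_key=openai_key,
        anthropic_api_key=anthropic_key,
        database_url=env.get("DATABASE_URL", "sqlite:///./pdf_chatbot.db"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
    )
```

Failing at startup when no provider key is set surfaces the "API Key Not Configured" problem immediately rather than on the first chat request.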

AI Provider Setup

  1. OpenAI: Get API key from OpenAI Platform
  2. Anthropic: Get API key from Anthropic Console

Development

Backend Development

cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000

Frontend Development

cd frontend
npm run dev

Testing

# Backend tests
cd backend
pytest

# Frontend tests
cd ../frontend  # from backend/, where the previous commands ran
npm test

Troubleshooting

Common Issues

  1. API Key Not Configured

    • Ensure you've added your API key to backend/.env
    • Restart the backend server after changing environment variables
  2. Upload Fails

    • Check file size (max 10MB)
    • Ensure file is a valid PDF
    • Check backend logs for detailed error messages
  3. Chat Not Working

    • Verify AI service is configured and working
    • Check if documents are properly processed
    • Review browser console for frontend errors
  4. Docker Issues

    • Ensure Docker and Docker Compose are installed
    • Check if ports 3000 and 8000 are available
    • Use docker-compose logs to view service logs
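
For the port check in the Docker item above, a short Python script can report whether something is already listening on ports 3000 and 8000. This is a convenience sketch with a hypothetical helper name (port_in_use); it only tests the local loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (3000, 8000):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

If either port is reported as in use before the services start, stop the conflicting process or change the port mapping in docker-compose.yml.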

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

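The metadata block at the top of this README (title, emoji, sdk, and so on) configures the Hugging Face Space. See the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference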