metadata
title: PDF QA Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: true
PDF-Based Q&A Chatbot System
An end-to-end PDF-based Q&A chatbot that processes uploaded PDF documents and lets users retrieve accurate, context-aware answers through natural-language queries.
Features
- PDF Processing: Extract text and metadata from uploaded PDF documents
- Vector Storage: Store document embeddings in ChromaDB for efficient retrieval
- AI-Powered Q&A: Use OpenAI/Claude for intelligent question answering
- Modern UI: Clean, responsive interface built with Next.js and Tailwind CSS
- Real-time Chat: Interactive chat interface with conversation history
- File Management: Upload, view, and manage multiple PDF documents
- Context Awareness: Maintain conversation context and document references
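To illustrate the retrieval idea behind the Vector Storage feature, here is a minimal sketch of nearest-neighbor search by cosine similarity in plain Python. It is not the project's code and does not use ChromaDB; the function names `cosine_similarity` and `top_k` and the toy three-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        store,
        key=lambda chunk_id: cosine_similarity(query, store[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical values; real embeddings have many more dimensions).
store = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.05, 0.0], store, k=2)
```

A vector database performs the same ranking at scale, typically with an approximate index instead of the full scan shown here.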
Tech Stack
Backend
- FastAPI: High-performance web framework
- PyPDF2: PDF text extraction
- ChromaDB: Vector database for embeddings
- OpenAI/Claude: AI language models for Q&A
- SQLAlchemy: Database ORM
- Pydantic: Data validation
Frontend
- Next.js 14: React framework with App Router
- TypeScript: Type-safe development
- Tailwind CSS: Utility-first styling
- Shadcn/ui: Modern UI components
- React Hook Form: Form handling
- Zustand: State management
Project Structure
ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md
Quick Start
Option 1: Automated Setup (Recommended)
For Linux/macOS:
chmod +x setup.sh
./setup.sh
For Windows:
.\setup.ps1
Option 2: Manual Setup
Clone and Setup
cd ChatbotCursor
Backend Setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys
Frontend Setup
cd frontend
npm install
cp .env.example .env
Environment Variables
- Edit backend/.env and add your API keys: OPENAI_API_KEY or ANTHROPIC_API_KEY
- The frontend .env should work with defaults
Run the Application
# Backend (Terminal 1)
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn main:app --reload
# Frontend (Terminal 2)
cd frontend
npm run dev
Option 3: Docker Setup
# Build and run with Docker Compose
docker-compose up --build
# Or run services individually
docker-compose up backend
docker-compose up frontend
Access the Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
Usage
Getting Started
Upload Documents
- Navigate to the "Documents" tab
- Drag and drop PDF files or click to select
- Wait for processing (text extraction and vector embedding)
- View upload status and document statistics
Start Chatting
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get AI-powered answers with source references
- View conversation history
Document Management
- View all uploaded documents with metadata
- Delete documents when no longer needed
- Monitor processing status and file sizes
Features
- Smart Document Processing: Automatic text extraction and chunking
- Vector Search: Semantic similarity search for relevant content
- AI-Powered Q&A: Context-aware answers using OpenAI or Claude
- Source Citations: See which documents and sections were referenced
- Conversation History: Persistent chat sessions
- File Management: Upload, view, and delete documents
- Real-time Processing: Live status updates during uploads
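The chunking step mentioned under Smart Document Processing can be pictured with the sketch below. It splits text into fixed-size character windows with overlap, which is one common approach; the project's actual chunking strategy and sizes are not documented here, so `chunk_text` and its default parameters are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last chunk already reaches the end of the text.
    return chunks


# Small sizes make the overlap easy to see.
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk would then be embedded and stored, so a question can be matched against passages instead of whole documents.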
Supported File Types
- PDF Documents: All standard PDF files
- Maximum Size: 10MB per file
- Processing: Automatic text extraction and metadata parsing
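The limits above can be checked before a file is processed. The following is a minimal sketch, assuming validation on raw bytes; `validate_pdf` is a hypothetical helper, not a function from this project. It tests the 10 MB limit and the `%PDF-` header that standard PDF files start with, which is stricter than some PDF readers require.

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10485760 bytes, matching MAX_FILE_SIZE in .env
PDF_MAGIC = b"%PDF-"  # Standard PDF files begin with this header.


def validate_pdf(data: bytes, max_size: int = MAX_FILE_SIZE) -> tuple[bool, str]:
    """Return (ok, reason) for an uploaded file's raw bytes."""
    if len(data) == 0:
        return False, "file is empty"
    if len(data) > max_size:
        return False, f"file exceeds {max_size} bytes"
    if not data.startswith(PDF_MAGIC):
        return False, "missing %PDF- header"
    return True, "ok"


ok, reason = validate_pdf(b"%PDF-1.7\nexample bytes")
```

A header check only confirms that the file claims to be a PDF; a damaged file can still fail later during text extraction.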
API Endpoints
Document Management
- POST /api/v1/documents/upload: Upload PDF documents
- GET /api/v1/documents/: List all documents
- GET /api/v1/documents/{id}: Get specific document
- DELETE /api/v1/documents/{id}: Delete a document
- GET /api/v1/documents/stats/summary: Get document statistics
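As a rough sketch of calling the upload endpoint from a script, the example below builds a multipart/form-data request with the Python standard library. The form field name `file` and the helper `build_upload_request` are assumptions; check the interactive docs at /docs for the actual field name the backend expects.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"


def build_upload_request(filename: str, data: bytes, field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST for the upload endpoint (not sent here)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/documents/upload",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


upload_request = build_upload_request("report.pdf", b"%PDF-1.7 example bytes")
# With the backend running, send it with: urllib.request.urlopen(upload_request)
```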
Chat & Q&A
- POST /api/v1/chat/: Send questions and get answers
- GET /api/v1/chat/history/{session_id}: Get chat history
- POST /api/v1/chat/session/new: Create new chat session
- GET /api/v1/chat/sessions: List all sessions
- DELETE /api/v1/chat/session/{session_id}: Delete session
- GET /api/v1/chat/models/available: Get available AI models
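A question can be sent to the chat endpoint with a JSON POST. The sketch below only constructs the request. The payload keys `question` and `session_id` are assumptions, since the request schema is not listed here; the Swagger UI at /docs shows the real one.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"


def build_chat_request(question: str, session_id: Optional[str] = None) -> urllib.request.Request:
    """Build a JSON POST for /api/v1/chat/ (payload keys are hypothetical)."""
    payload = {"question": question}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


chat_request = build_chat_request("What does section 2 conclude?", session_id="demo")
# With the backend running, send it with: urllib.request.urlopen(chat_request)
```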
System
- GET /health: Health check
- GET /docs: Interactive API documentation (Swagger UI)
- GET /redoc: Alternative API documentation
Configuration
Environment Variables
Backend (.env):
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
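One way a backend might read these variables is sketched below with the standard library only. The `Settings` class and `load_settings` function are hypothetical, and treating the example values above as fallbacks is an assumption; the project itself may load configuration differently, for instance through Pydantic.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Backend settings read from environment variables."""
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    database_url: str
    chroma_persist_directory: str
    upload_dir: str
    max_file_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to the documented example values."""
    settings = Settings(
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        database_url=env.get("DATABASE_URL", "sqlite:///./pdf_chatbot.db"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
    )
    # At least one AI provider key is required.
    if settings.openai_api_key is None and settings.anthropic_api_key is None:
        raise ValueError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return settings


example = load_settings({"OPENAI_API_KEY": "your-openai-api-key"})
```

Failing at startup when no provider key is set surfaces the "API Key Not Configured" problem before the first chat request.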
Frontend (.env):
NEXT_PUBLIC_API_URL=http://localhost:8000
AI Provider Setup
- OpenAI: Get API key from OpenAI Platform
- Anthropic: Get API key from Anthropic Console
Development
Backend Development
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
Frontend Development
cd frontend
npm run dev
Testing
# Backend tests
cd backend
pytest
# Frontend tests
cd frontend
npm test
Troubleshooting
Common Issues
API Key Not Configured
- Ensure you've added your API key to backend/.env
- Restart the backend server after changing environment variables
Upload Fails
- Check file size (max 10MB)
- Ensure file is a valid PDF
- Check backend logs for detailed error messages
Chat Not Working
- Verify AI service is configured and working
- Check if documents are properly processed
- Review browser console for frontend errors
Docker Issues
- Ensure Docker and Docker Compose are installed
- Check if ports 3000 and 8000 are available
- Use docker-compose logs to view service logs
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with FastAPI and Next.js
- Vector storage powered by ChromaDB
- AI capabilities provided by OpenAI and Anthropic
- UI components from Tailwind CSS and Lucide React
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference