metadata

title: PDF QA Chatbot
emoji: 📄🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: true

PDF-Based Q&A Chatbot System

An end-to-end Q&A chatbot for PDF documents. It processes uploaded PDFs and lets users retrieve context-aware answers through natural-language queries.

Features

PDF Processing: Extract text and metadata from uploaded PDF documents
Vector Storage: Store document embeddings in ChromaDB for efficient retrieval
AI-Powered Q&A: Answer questions with OpenAI or Anthropic Claude models
Modern UI: Clean, responsive interface built with Next.js and Tailwind CSS
Real-time Chat: Interactive chat interface with conversation history
File Management: Upload, view, and manage multiple PDF documents
Context Awareness: Maintain conversation context and document references

Tech Stack

Backend

FastAPI: High-performance web framework
PyPDF2: PDF text extraction
ChromaDB: Vector database for embeddings
OpenAI / Anthropic Claude: Language models for Q&A
SQLAlchemy: Database ORM
Pydantic: Data validation

Frontend

Next.js 14: React framework with App Router
TypeScript: Type-safe development
Tailwind CSS: Utility-first styling
Shadcn/ui: Modern UI components
React Hook Form: Form handling
Zustand: State management

Project Structure

ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md

Quick Start

Option 1: Automated Setup (Recommended)

For Linux/macOS:

chmod +x setup.sh
./setup.sh

For Windows:

.\setup.ps1

Option 2: Manual Setup

Clone and Set Up
```
# Clone the repository first, then enter the project directory
cd ChatbotCursor
```

Backend Setup

cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys

Frontend Setup

cd frontend  # run from the project root (use cd ../frontend if you are still in backend/)
npm install
cp .env.example .env

Environment Variables
- Edit backend/.env and add your API keys:
  - OPENAI_API_KEY or ANTHROPIC_API_KEY
- The frontend .env should work with defaults

Run the Application

# Backend (Terminal 1)
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn main:app --reload

# Frontend (Terminal 2)
cd frontend
npm run dev

Option 3: Docker Setup

# Build and run with Docker Compose
docker-compose up --build

# Or run services individually
docker-compose up backend
docker-compose up frontend

Access the Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs

Usage

Getting Started

Upload Documents
- Navigate to the "Documents" tab
- Drag and drop PDF files or click to select
- Wait for processing (text extraction and vector embedding)
- View upload status and document statistics

Start Chatting
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get AI-powered answers with source references
- View conversation history

Document Management
- View all uploaded documents with metadata
- Delete documents when no longer needed
- Monitor processing status and file sizes

Features

Smart Document Processing: Automatic text extraction and chunking
Vector Search: Semantic similarity search for relevant content
AI-Powered Q&A: Context-aware answers using OpenAI or Claude
Source Citations: See which documents and sections were referenced
Conversation History: Persistent chat sessions
File Management: Upload, view, and delete documents
Real-time Processing: Live status updates during uploads
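
The chunking and vector search steps listed above can be pictured with a small, dependency-free sketch. This is not the project's implementation: the application stores embeddings in ChromaDB, whereas the code below uses plain Python lists, and the function names, the 500-character chunk size, and the 50-character overlap are all assumptions chosen for illustration.

```python
import math


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]
```

In a real pipeline each chunk would be turned into a vector by an embedding model, and the chunks whose vectors rank highest for the question's vector would be passed to the language model as context.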

Supported File Types

PDF Documents: All standard PDF files
Maximum Size: 10 MB per file (the default MAX_FILE_SIZE of 10485760 bytes)
Processing: Automatic text extraction and metadata parsing
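
The limits above can be checked before a file is sent to the backend. The sketch below is one possible pre-flight check, not the validation the backend actually performs: the function name validate_upload is hypothetical, and the header test is deliberately strict (it expects the file to begin with the %PDF- marker).

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10485760 bytes, the default MAX_FILE_SIZE in this README
PDF_MAGIC = b"%PDF-"  # marker that opens a well-formed PDF file


def validate_upload(filename: str, data: bytes, max_size: int = MAX_FILE_SIZE) -> list[str]:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if not filename.lower().endswith(".pdf"):
        problems.append("file name does not end in .pdf")
    if len(data) > max_size:
        problems.append(f"file is {len(data)} bytes; the limit is {max_size}")
    if not data.startswith(PDF_MAGIC):
        problems.append("content does not start with the %PDF- header")
    return problems
```

Returning a list of problems rather than raising on the first one lets a UI show every reason a file was rejected at once.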

API Endpoints

Document Management

POST /api/v1/documents/upload: Upload PDF documents
GET /api/v1/documents/: List all documents
GET /api/v1/documents/{id}: Get specific document
DELETE /api/v1/documents/{id}: Delete a document
GET /api/v1/documents/stats/summary: Get document statistics

Chat & Q&A

POST /api/v1/chat/: Send questions and get answers
GET /api/v1/chat/history/{session_id}: Get chat history
POST /api/v1/chat/session/new: Create new chat session
GET /api/v1/chat/sessions: List all sessions
DELETE /api/v1/chat/session/{session_id}: Delete session
GET /api/v1/chat/models/available: Get available AI models
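
As an illustration of calling the chat endpoint from a script, the sketch below builds a POST request for /api/v1/chat/ with the standard library. The JSON field names question and session_id are assumptions, since this README does not document the request schema; check the interactive documentation at /docs for the real field names.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost:8000"  # backend address listed under "Access the Application"


def build_chat_request(
    question: str,
    session_id: Optional[str] = None,
    base_url: str = API_BASE,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/chat/.

    The payload keys "question" and "session_id" are assumed, not confirmed.
    """
    payload = {"question": question}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the request can be sent by passing it to urllib.request.urlopen and decoding the response body with json.loads.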

System

GET /health: Health check
GET /docs: Interactive API documentation (Swagger UI)
GET /redoc: Alternative API documentation
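
The health endpoint is handy for scripts that need to wait until the backend has started. The following sketch assumes only that GET /health returns HTTP 200 when the service is up; the response body is not inspected, and both function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def backend_is_healthy(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, and timeouts
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = backend_is_healthy,
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop testable without a running server.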

Configuration

Environment Variables

Backend (.env):

# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key

# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
# Maximum upload size in bytes (10 MB)
MAX_FILE_SIZE=10485760
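
To show how the backend variables above fit together, here is a minimal settings loader written against the standard library. It is a sketch, not the project's configuration code (which may well use Pydantic): the Settings class and load_settings function are hypothetical, and the loader reads variables already present in the process environment rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    database_url: str
    chroma_persist_directory: str
    upload_dir: str
    max_file_size: int  # bytes


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the backend variables, falling back to the defaults listed in this README."""
    openai_key = env.get("OPENAI_API_KEY") or None
    anthropic_key = env.get("ANTHROPIC_API_KEY") or None
    if not (openai_key or anthropic_key):
        # At least one AI provider key is required.
        raise ValueError("set OPENAI_API_KEY or ANTHROPIC_API_KEY in backend/.env")
    return Settings(
        openai_api_key=openai_key,
        anthropic_api_key=anthropic_key,
        database_url=env.get("DATABASE_URL", "sqlite:///./pdf_chatbot.db"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
    )
```

Failing fast when neither key is set surfaces the "API Key Not Configured" problem at startup instead of on the first chat request.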

Frontend (.env):

NEXT_PUBLIC_API_URL=http://localhost:8000

AI Provider Setup

OpenAI: Get an API key from the OpenAI Platform
Anthropic: Get an API key from the Anthropic Console

Development

Backend Development

cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000

Frontend Development

cd frontend
npm run dev

Testing

# Backend tests
cd backend
pytest

# Frontend tests
cd ../frontend
npm test

Troubleshooting

Common Issues

API Key Not Configured
- Ensure you've added your API key to backend/.env
- Restart the backend server after changing environment variables

Upload Fails
- Check file size (max 10 MB)
- Ensure file is a valid PDF
- Check backend logs for detailed error messages

Chat Not Working
- Verify AI service is configured and working
- Check if documents are properly processed
- Review browser console for frontend errors

Docker Issues
- Ensure Docker and Docker Compose are installed
- Check if ports 3000 and 8000 are available
- Use docker-compose logs to view service logs
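
For the port check mentioned above, a short script can report whether something is already listening on ports 3000 and 8000. This is a rough sketch with hypothetical function names; it only tests TCP connections to 127.0.0.1, so a service bound to another interface would not be detected.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def report_ports(ports: tuple = (3000, 8000)) -> dict:
    """Map each port to True (in use) or False (free)."""
    return {port: port_in_use(port) for port in ports}
```

If report_ports() shows a port as in use before the application is started, another process is occupying it and the corresponding service will fail to bind.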

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with FastAPI and Next.js
Vector storage powered by ChromaDB
AI capabilities provided by OpenAI and Anthropic
UI components from Tailwind CSS and Lucide React

For the Hugging Face Spaces metadata fields at the top of this file, see the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference