---
title: PDF QA Chatbot
emoji: "📄🤖"
colorFrom: "blue"
colorTo: "purple"
sdk: docker
app_file: app.py
pinned: true
---
# PDF-Based Q&A Chatbot System
An end-to-end Q&A chatbot system that processes uploaded PDF documents and lets users ask natural-language questions, returning context-aware answers with source references.
## Features
- **PDF Processing**: Extract text and metadata from uploaded PDF documents
- **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
- **AI-Powered Q&A**: Use OpenAI/Claude for intelligent question answering
- **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
- **Real-time Chat**: Interactive chat interface with conversation history
- **File Management**: Upload, view, and manage multiple PDF documents
- **Context Awareness**: Maintain conversation context and document references
## Tech Stack
### Backend
- **FastAPI**: High-performance web framework
- **PyPDF2**: PDF text extraction
- **ChromaDB**: Vector database for embeddings
- **OpenAI/Claude**: AI language models for Q&A
- **SQLAlchemy**: Database ORM
- **Pydantic**: Data validation
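The PDF-processing and vector-storage pieces listed above fit together roughly as sketched below. This is a minimal illustration, not the project's actual service code: the collection name, chunk size, and overlap are assumptions, and ChromaDB's default embedding function is used for brevity.

```python
# Minimal ingest sketch: extract text with PyPDF2, chunk it, and store the
# chunks in a persistent ChromaDB collection. Collection name, chunk size,
# and overlap are illustrative assumptions, not the project's settings.
from PyPDF2 import PdfReader
import chromadb

def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows."""
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

client = chromadb.PersistentClient(path="./chroma_db")  # matches CHROMA_PERSIST_DIRECTORY
collection = client.get_or_create_collection("pdf_chunks")

chunks = chunk_text(extract_text("example.pdf"))
collection.add(
    documents=chunks,
    ids=[f"example.pdf-{i}" for i in range(len(chunks))],
    metadatas=[{"source": "example.pdf", "chunk": i} for i in range(len(chunks))],
)
```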
### Frontend
- **Next.js 14**: React framework with App Router
- **TypeScript**: Type-safe development
- **Tailwind CSS**: Utility-first styling
- **Shadcn/ui**: Modern UI components
- **React Hook Form**: Form handling
- **Zustand**: State management
## Project Structure
```
ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md
```
## Quick Start
### Option 1: Automated Setup (Recommended)
**For Linux/macOS:**
```bash
chmod +x setup.sh
./setup.sh
```
**For Windows:**
```powershell
.\setup.ps1
```
### Option 2: Manual Setup
1. **Clone and Setup**
```bash
git clone <repository-url>
cd ChatbotCursor
```
2. **Backend Setup**
```bash
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys
```
3. **Frontend Setup**
```bash
cd frontend
npm install
cp .env.example .env
```
4. **Environment Variables**
- Edit `backend/.env` and add your API keys:
- `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
- The frontend `.env` should work with defaults
5. **Run the Application**
```bash
# Backend (Terminal 1)
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
uvicorn main:app --reload
# Frontend (Terminal 2)
cd frontend
npm run dev
```
### Option 3: Docker Setup
```bash
# Build and run with Docker Compose
docker-compose up --build
# Or run services individually
docker-compose up backend
docker-compose up frontend
```
### Access the Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
## Usage
### Getting Started
1. **Upload Documents**
- Navigate to the "Documents" tab
- Drag and drop PDF files or click to select
- Wait for processing (text extraction and vector embedding)
- View upload status and document statistics
2. **Start Chatting**
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get AI-powered answers with source references
- View conversation history
3. **Document Management**
- View all uploaded documents with metadata
- Delete documents when no longer needed
- Monitor processing status and file sizes
### Features
- **Smart Document Processing**: Automatic text extraction and chunking
- **Vector Search**: Semantic similarity search for relevant content
- **AI-Powered Q&A**: Context-aware answers using OpenAI or Claude
- **Source Citations**: See which documents and sections were referenced
- **Conversation History**: Persistent chat sessions
- **File Management**: Upload, view, and delete documents
- **Real-time Processing**: Live status updates during uploads
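At answer time the flow is retrieve-then-generate: the question is matched against the stored chunks in ChromaDB, and the closest chunks are passed to the language model as context along with the question. The sketch below illustrates that flow with the OpenAI SDK; the model name, prompt wording, and number of retrieved chunks are assumptions rather than the project's exact configuration.

```python
# Illustrative retrieve-then-generate step (not the project's exact code).
# Assumes the "pdf_chunks" collection from the ingest sketch above.
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("pdf_chunks")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, n_results: int = 4) -> str:
    # Semantic similarity search over the stored chunks.
    hits = collection.query(query_texts=[question], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])
    sources = [m["source"] for m in hits["metadatas"][0]]
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return f"{response.choices[0].message.content}\n\nSources: {sorted(set(sources))}"
```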
### Supported File Types
- **PDF Documents**: Standard PDF files; text is extracted with PyPDF2, so scanned or image-only PDFs without a text layer will yield no content
- **Maximum Size**: 10MB per file
- **Processing**: Automatic text extraction and metadata parsing
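A server-side check for these limits could look like the snippet below. The 10 MB value matches `MAX_FILE_SIZE=10485760` in the backend `.env`; the function itself is only illustrative, not the project's actual validator.

```python
# Hypothetical upload validation matching the limits above.
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, same value as MAX_FILE_SIZE=10485760

def validate_upload(filename: str, data: bytes) -> None:
    """Reject non-PDF files and files over the configured size limit."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("Only PDF files are supported")
    if len(data) > MAX_FILE_SIZE:
        raise ValueError("File exceeds the 10 MB limit")
```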
## API Endpoints
### Document Management
- `POST /api/v1/documents/upload`: Upload PDF documents
- `GET /api/v1/documents/`: List all documents
- `GET /api/v1/documents/{id}`: Get specific document
- `DELETE /api/v1/documents/{id}`: Delete a document
- `GET /api/v1/documents/stats/summary`: Get document statistics
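For scripted use, the document endpoints can be called directly. The sketch below uses the `requests` library; the multipart field name `file` is an assumption, so confirm the exact request schema against the interactive docs at `/docs`.

```python
# Hypothetical client calls for the document endpoints; the multipart field
# name ("file") is an assumption -- confirm it against /docs.
import requests

with open("example.pdf", "rb") as fh:
    resp = requests.post(
        "http://localhost:8000/api/v1/documents/upload",
        files={"file": ("example.pdf", fh, "application/pdf")},
    )
resp.raise_for_status()
print(resp.json())  # e.g. document id and processing status

# List the documents uploaded so far.
print(requests.get("http://localhost:8000/api/v1/documents/").json())
```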
### Chat & Q&A
- `POST /api/v1/chat/`: Send questions and get answers
- `GET /api/v1/chat/history/{session_id}`: Get chat history
- `POST /api/v1/chat/session/new`: Create new chat session
- `GET /api/v1/chat/sessions`: List all sessions
- `DELETE /api/v1/chat/session/{session_id}`: Delete session
- `GET /api/v1/chat/models/available`: Get available AI models
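Similarly, questions can be posted to the chat endpoint from a script. The JSON field names below (`question`, `session_id`) are assumptions; the exact request and response schemas are documented at `/docs`.

```python
# Hypothetical chat client; JSON field names are assumptions -- see /docs.
import requests

BASE = "http://localhost:8000/api/v1/chat"

session = requests.post(f"{BASE}/session/new").json()  # create a new session
session_id = session.get("session_id")

answer = requests.post(
    f"{BASE}/",
    json={"question": "What does the uploaded document say about refunds?",
          "session_id": session_id},
).json()
print(answer)

# Retrieve the conversation history for the session.
print(requests.get(f"{BASE}/history/{session_id}").json())
```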
### System
- `GET /health`: Health check
- `GET /docs`: Interactive API documentation (Swagger UI)
- `GET /redoc`: Alternative API documentation
## Configuration
### Environment Variables
**Backend (.env):**
```env
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
```
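Since Pydantic is part of the backend stack, these variables are typically loaded into a typed settings object. The sketch below shows one way to do that with `pydantic-settings`; the class and module are illustrative assumptions, not the project's actual code.

```python
# Illustrative settings loader using pydantic-settings; field names mirror
# the .env keys above, but this class is not the project's actual module.
from typing import Optional
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    database_url: str = "sqlite:///./pdf_chatbot.db"
    chroma_persist_directory: str = "./chroma_db"
    upload_dir: str = "./uploads"
    max_file_size: int = 10_485_760  # 10 MB

    class Config:
        env_file = ".env"  # values in .env override the defaults above

settings = Settings()
```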
**Frontend (.env):**
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```
### AI Provider Setup
1. **OpenAI**: Get API key from [OpenAI Platform](https://platform.openai.com/)
2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)
## Development
### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
```
### Frontend Development
```bash
cd frontend
npm run dev
```
### Testing
```bash
# Backend tests
cd backend
pytest
# Frontend tests
cd frontend
npm test
```
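If no backend tests exist yet, a minimal smoke test against the health endpoint could look like the sketch below. It assumes the FastAPI app object is importable as `main.app`, which matches the `uvicorn main:app` command used above; the test file location is an assumption.

```python
# backend/tests/test_health.py -- minimal smoke test (assumed file location).
from fastapi.testclient import TestClient

from main import app  # the same app object that `uvicorn main:app` serves

client = TestClient(app)

def test_health():
    # The /health endpoint should respond with HTTP 200 when the app is up.
    response = client.get("/health")
    assert response.status_code == 200
```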
## Troubleshooting
### Common Issues
1. **API Key Not Configured**
- Ensure you've added your API key to `backend/.env`
- Restart the backend server after changing environment variables
2. **Upload Fails**
- Check file size (max 10MB)
- Ensure file is a valid PDF
- Check backend logs for detailed error messages
3. **Chat Not Working**
- Verify AI service is configured and working
- Check if documents are properly processed
- Review browser console for frontend errors
4. **Docker Issues**
- Ensure Docker and Docker Compose are installed
- Check if ports 3000 and 8000 are available
- Use `docker-compose logs` to view service logs
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
- Vector storage powered by [ChromaDB](https://www.trychroma.com/)
- AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
- UI components from [Tailwind CSS](https://tailwindcss.com/) and [Lucide React](https://lucide.dev/)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference