---
title: PDF QA Chatbot
emoji: "📄🤖"
colorFrom: "blue"
colorTo: "purple"
sdk: docker
app_file: app.py
pinned: true
---
# PDF-Based Q&A Chatbot System
An end-to-end Q&A chatbot that processes uploaded PDF documents and answers natural-language questions about them, using the document content as context.
## Features
- **PDF Processing**: Extract text and metadata from uploaded PDF documents
- **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
- **AI-Powered Q&A**: Use OpenAI/Claude for intelligent question answering
- **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
- **Real-time Chat**: Interactive chat interface with conversation history
- **File Management**: Upload, view, and manage multiple PDF documents
- **Context Awareness**: Maintain conversation context and document references
## Tech Stack
### Backend
- **FastAPI**: High-performance web framework
- **PyPDF2**: PDF text extraction
- **ChromaDB**: Vector database for embeddings
- **OpenAI/Claude**: AI language models for Q&A
- **SQLAlchemy**: Database ORM
- **Pydantic**: Data validation
### Frontend
- **Next.js 14**: React framework with App Router
- **TypeScript**: Type-safe development
- **Tailwind CSS**: Utility-first styling
- **Shadcn/ui**: Modern UI components
- **React Hook Form**: Form handling
- **Zustand**: State management
## Project Structure
```
ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md
```
## Quick Start
### Option 1: Automated Setup (Recommended)
**For Linux/macOS:**
```bash
chmod +x setup.sh
./setup.sh
```
**For Windows:**
```powershell
.\setup.ps1
```
### Option 2: Manual Setup
1. **Clone and Setup**
   ```bash
   cd ChatbotCursor
   ```
2. **Backend Setup**
   ```bash
   cd backend
   python -m venv venv
   source venv/bin/activate # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   cp .env.example .env
   # Edit .env and add your API keys
   ```
3. **Frontend Setup**
   ```bash
   cd frontend # from the project root
   npm install
   cp .env.example .env
   ```
4. **Environment Variables**
   - Edit `backend/.env` and add your API keys:
     - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
   - The frontend `.env` should work with defaults
5. **Run the Application**
   ```bash
   # Backend (Terminal 1)
   cd backend
   source venv/bin/activate # On Windows: venv\Scripts\activate
   uvicorn main:app --reload
   # Frontend (Terminal 2)
   cd frontend
   npm run dev
   ```
### Option 3: Docker Setup
```bash
# Build and run with Docker Compose
docker-compose up --build
# Or run services individually
docker-compose up backend
docker-compose up frontend
```
### Access the Application
Whichever option you used, the services are available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
## Usage
### Getting Started
1. **Upload Documents**
   - Navigate to the "Documents" tab
   - Drag and drop PDF files or click to select
   - Wait for processing (text extraction and vector embedding)
   - View upload status and document statistics
2. **Start Chatting**
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get AI-powered answers with source references
   - View conversation history
3. **Document Management**
   - View all uploaded documents with metadata
   - Delete documents when no longer needed
   - Monitor processing status and file sizes
### Features
- **Smart Document Processing**: Automatic text extraction and chunking
- **Vector Search**: Semantic similarity search for relevant content
- **AI-Powered Q&A**: Context-aware answers using OpenAI or Claude
- **Source Citations**: See which documents and sections were referenced
- **Conversation History**: Persistent chat sessions
- **File Management**: Upload, view, and delete documents
- **Real-time Processing**: Live status updates during uploads
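
The chunking and similarity-search features listed above can be illustrated with a small, dependency-free sketch. This is not the project's code: in the running system ChromaDB stores the embeddings and performs the search. The names `chunk_text`, `cosine_similarity`, and `top_k`, the character-window strategy, and the default sizes are all assumptions chosen for illustration.

```python
import math


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. The chunks returned by `top_k` would then be passed to the language model as context for the answer.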
### Supported File Types
- **PDF Documents**: All standard PDF files
- **Maximum Size**: 10MB per file
- **Processing**: Automatic text extraction and metadata parsing
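
A pre-upload check along these lines can catch most rejected files before they reach the backend. This is a minimal sketch, not the backend's actual validation: `validate_pdf_upload` is a hypothetical helper, and it only checks the file name, the 10MB limit, and the `%PDF-` header that standard PDF files start with.

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB limit stated above (10485760 bytes)


def validate_pdf_upload(filename: str, data: bytes, max_size: int = MAX_FILE_SIZE) -> None:
    """Raise ValueError if the upload does not look like an acceptable PDF."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("file name must end with .pdf")
    if len(data) > max_size:
        raise ValueError(f"file is {len(data)} bytes; the limit is {max_size}")
    # Standard PDF files begin with the header "%PDF-" followed by a version.
    if not data.startswith(b"%PDF-"):
        raise ValueError("file does not start with a PDF header")
```

A header check does not prove the file is well formed; a PDF that passes it can still fail during text extraction.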
## API Endpoints
### Document Management
- `POST /api/v1/documents/upload`: Upload PDF documents
- `GET /api/v1/documents/`: List all documents
- `GET /api/v1/documents/{id}`: Get specific document
- `DELETE /api/v1/documents/{id}`: Delete a document
- `GET /api/v1/documents/stats/summary`: Get document statistics
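
For scripting against the document endpoints, a small standard-library client might look like the sketch below. The helper names are hypothetical, the base URL assumes the local backend address from Quick Start, and the shape of the JSON responses is not documented here, so the functions return whatever the server sends.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local backend address


def api_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def list_documents(base_url: str = BASE_URL):
    """GET /api/v1/documents/ and return the decoded JSON body."""
    with urllib.request.urlopen(api_url("/api/v1/documents/", base_url), timeout=10) as resp:
        return json.load(resp)


def delete_document(doc_id: str, base_url: str = BASE_URL) -> int:
    """DELETE /api/v1/documents/{id} and return the HTTP status code."""
    req = urllib.request.Request(
        api_url(f"/api/v1/documents/{doc_id}", base_url), method="DELETE"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Both calls raise `urllib.error.URLError` if the backend is not running, which makes them a quick connectivity check as well.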
### Chat & Q&A
- `POST /api/v1/chat/`: Send questions and get answers
- `GET /api/v1/chat/history/{session_id}`: Get chat history
- `POST /api/v1/chat/session/new`: Create new chat session
- `GET /api/v1/chat/sessions`: List all sessions
- `DELETE /api/v1/chat/session/{session_id}`: Delete session
- `GET /api/v1/chat/models/available`: Get available AI models
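
Sending a question from a script could be sketched as follows. The request schema is not documented in this README, so the JSON field names `message` and `session_id` are assumptions; check the interactive documentation at `/docs` for the real ones. `build_chat_request` is a hypothetical helper that only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    question: str,
    session_id: Optional[str] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/chat/ request."""
    # Field names below are assumed, not taken from the backend's schema.
    payload = {"message": question}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; the response body can then be decoded with `json.load`.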
### System
- `GET /health`: Health check
- `GET /docs`: Interactive API documentation (Swagger UI)
- `GET /redoc`: Alternative API documentation
## Configuration
### Environment Variables
**Backend (.env):**
```env
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
```
**Frontend (.env):**
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```
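
To make the required/optional split concrete, here is a minimal sketch of how the backend variables could be read in Python. It is not the project's actual settings module: `Settings` and `load_settings` are hypothetical names, and treating the optional values shown above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    database_url: str
    chroma_persist_directory: str
    upload_dir: str
    max_file_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings from the environment, falling back to assumed defaults."""
    settings = Settings(
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        database_url=env.get("DATABASE_URL", "sqlite:///./pdf_chatbot.db"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
    )
    # At least one AI provider key is required.
    if not (settings.openai_api_key or settings.anthropic_api_key):
        raise RuntimeError("set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return settings
```

Failing at startup when neither key is set surfaces the "API Key Not Configured" problem immediately instead of on the first chat request.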
### AI Provider Setup
1. **OpenAI**: Get API key from [OpenAI Platform](https://platform.openai.com/)
2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)
## Development
### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
```
### Frontend Development
```bash
cd frontend
npm run dev
```
### Testing
```bash
# Backend tests
cd backend
pytest
# Frontend tests
cd frontend
npm test
```
## Troubleshooting
### Common Issues
1. **API Key Not Configured**
   - Ensure you've added your API key to `backend/.env`
   - Restart the backend server after changing environment variables
2. **Upload Fails**
   - Check file size (max 10MB)
   - Ensure file is a valid PDF
   - Check backend logs for detailed error messages
3. **Chat Not Working**
   - Verify AI service is configured and working
   - Check if documents are properly processed
   - Review browser console for frontend errors
4. **Docker Issues**
   - Ensure Docker and Docker Compose are installed
   - Check if ports 3000 and 8000 are available
   - Use `docker-compose logs` to view service logs
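
One way to check whether ports 3000 and 8000 are available is a short script like the sketch below. `port_is_free` is a hypothetical helper; it treats a port as free when nothing accepts a TCP connection on it, which is a reasonable approximation for a local development machine.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, meaning the port is in use.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    for port in (3000, 8000):
        print(port, "free" if port_is_free(port) else "in use")
```

If a port is reported as in use, stop the process that holds it or change the port mapping in `docker-compose.yml` before starting the services.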
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
- Vector storage powered by [ChromaDB](https://www.trychroma.com/)
- AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
- UI components from [Tailwind CSS](https://tailwindcss.com/) and [Lucide React](https://lucide.dev/)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |