---
title: PDF QA Chatbot
emoji: "📄🤖"
colorFrom: "blue"
colorTo: "purple"
sdk: docker
app_file: app.py
pinned: true
---
# PDF-Based Q&A Chatbot System
An end-to-end Q&A chatbot that processes uploaded PDF documents and answers natural-language questions about them with accurate, context-aware responses.
## Features
- **PDF Processing**: Extract text and metadata from uploaded PDF documents
- **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
- **AI-Powered Q&A**: Use OpenAI/Claude for intelligent question answering
- **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
- **Real-time Chat**: Interactive chat interface with conversation history
- **File Management**: Upload, view, and manage multiple PDF documents
- **Context Awareness**: Maintain conversation context and document references
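
To make the retrieval idea behind the vector storage feature concrete, here is a minimal sketch of similarity search over embeddings. It is an illustration in plain Python, not ChromaDB's API or this project's code: the names `cosine_similarity` and `top_k`, the toy 3-dimensional vectors, and the chunk IDs are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (chunk_id, score) pairs for the k chunks most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
chunks = {
    "intro": [1.0, 0.0, 0.0],
    "pricing": [0.0, 1.0, 0.0],
    "pricing-faq": [0.1, 0.9, 0.0],
}
results = top_k([0.0, 1.0, 0.0], chunks, k=2)
print(results)
```

A vector database performs the same kind of ranking, typically with an index so that it does not have to compare the query against every stored vector.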
## Tech Stack
### Backend
- **FastAPI**: High-performance web framework
- **PyPDF2**: PDF text extraction
- **ChromaDB**: Vector database for embeddings
- **OpenAI/Claude**: AI language models for Q&A
- **SQLAlchemy**: Database ORM
- **Pydantic**: Data validation
### Frontend
- **Next.js 14**: React framework with App Router
- **TypeScript**: Type-safe development
- **Tailwind CSS**: Utility-first styling
- **Shadcn/ui**: Modern UI components
- **React Hook Form**: Form handling
- **Zustand**: State management
## Project Structure
```
ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md
```
## Quick Start
### Option 1: Automated Setup (Recommended)
**For Linux/macOS:**
```bash
chmod +x setup.sh
./setup.sh
```
**For Windows:**
```powershell
.\setup.ps1
```
### Option 2: Manual Setup
1. **Clone and Setup**
```bash
# After cloning the repository, enter the project root
cd ChatbotCursor
```
2. **Backend Setup**
```bash
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys
```
3. **Frontend Setup**
```bash
cd ../frontend  # run from backend/; use "cd frontend" if you are in the project root
npm install
cp .env.example .env
```
4. **Environment Variables**
- Edit `backend/.env` and add at least one API key: `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
- The frontend `.env` should work with defaults
5. **Run the Application**
```bash
# Backend (Terminal 1)
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
uvicorn main:app --reload
# Frontend (Terminal 2)
cd frontend
npm run dev
```
### Option 3: Docker Setup
```bash
# Build and run with Docker Compose
docker-compose up --build
# Or run services individually
docker-compose up backend
docker-compose up frontend
```
### Access the Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
## Usage
### Getting Started
1. **Upload Documents**
- Navigate to the "Documents" tab
- Drag and drop PDF files or click to select
- Wait for processing (text extraction and vector embedding)
- View upload status and document statistics
2. **Start Chatting**
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get AI-powered answers with source references
- View conversation history
3. **Document Management**
- View all uploaded documents with metadata
- Delete documents when no longer needed
- Monitor processing status and file sizes
### Features
- **Smart Document Processing**: Automatic text extraction and chunking
- **Vector Search**: Semantic similarity search for relevant content
- **AI-Powered Q&A**: Context-aware answers using OpenAI or Claude
- **Source Citations**: See which documents and sections were referenced
- **Conversation History**: Persistent chat sessions
- **File Management**: Upload, view, and delete documents
- **Real-time Processing**: Live status updates during uploads
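
The "automatic text extraction and chunking" step can be pictured with a small sketch. This is not the project's actual chunker: the function name `chunk_text`, the character-based splitting, and the default sizes are assumptions chosen for illustration. Overlapping the chunks is a common way to avoid cutting a sentence off from its context at a chunk boundary.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(demo_chunks)
```

Each chunk would then be embedded and stored alongside a reference to its source document, which is what makes source citations possible at answer time.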
### Supported File Types
- **PDF Documents**: All standard PDF files
- **Maximum Size**: 10MB per file
- **Processing**: Automatic text extraction and metadata parsing
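
The limits above can be checked on the client before uploading. The sketch below is an assumption about how such a check might look, not the backend's validation code: `validate_upload` is a hypothetical helper, and it treats "10MB" as 10 × 1024 × 1024 bytes and relies on PDF files normally starting with the bytes `%PDF-`.

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10485760 bytes (assumed meaning of "10MB")
PDF_MAGIC = b"%PDF-"  # standard PDF files begin with this header


def validate_upload(filename, data, max_size=MAX_FILE_SIZE):
    """Return a list of problems with the upload; an empty list means it looks acceptable."""
    problems = []
    if not filename.lower().endswith(".pdf"):
        problems.append("file name does not end in .pdf")
    if len(data) > max_size:
        problems.append(f"file is {len(data)} bytes, limit is {max_size}")
    if not data.startswith(PDF_MAGIC):
        problems.append("content does not start with the %PDF- header")
    return problems


print(validate_upload("report.pdf", b"%PDF-1.7\n..."))
print(validate_upload("notes.txt", b"hello"))
```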
## API Endpoints
### Document Management
- `POST /api/v1/documents/upload`: Upload PDF documents
- `GET /api/v1/documents/`: List all documents
- `GET /api/v1/documents/{id}`: Get specific document
- `DELETE /api/v1/documents/{id}`: Delete a document
- `GET /api/v1/documents/stats/summary`: Get document statistics
### Chat & Q&A
- `POST /api/v1/chat/`: Send questions and get answers
- `GET /api/v1/chat/history/{session_id}`: Get chat history
- `POST /api/v1/chat/session/new`: Create new chat session
- `GET /api/v1/chat/sessions`: List all sessions
- `DELETE /api/v1/chat/session/{session_id}`: Delete session
- `GET /api/v1/chat/models/available`: Get available AI models
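
As a rough illustration of calling the chat endpoint from a script, the sketch below builds a `POST /api/v1/chat/` request with the standard library. The JSON field names `question` and `session_id` are assumptions, since the request schema is not listed here; check the interactive docs at `/docs` for the real one. The helper `build_chat_request` is hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_chat_request(question, session_id=None, base_url=API_BASE):
    """Build (but do not send) a JSON POST request for the chat endpoint."""
    payload = {"question": question}  # assumed field name
    if session_id is not None:
        payload["session_id"] = session_id  # assumed field name
    return urllib.request.Request(
        url=f"{base_url}/api/v1/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("What does section 2 say?", session_id="demo")
print(req.get_method(), req.full_url)

# Sending it requires a running backend:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```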
### System
- `GET /health`: Health check
- `GET /docs`: Interactive API documentation (Swagger UI)
- `GET /redoc`: Alternative API documentation
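
The health endpoint is handy for scripts that need to wait until the backend is up, for example after `docker-compose up`. The following is a minimal sketch under the assumption that `GET /health` returns HTTP 200 when the service is ready; the helper names `is_healthy` and `wait_until_healthy` are made up for this example.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(url, attempts=10, delay=1.0, check=is_healthy):
    """Poll `check(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example (needs a running backend):
# wait_until_healthy("http://localhost:8000/health")
```

The `check` parameter exists so the polling loop can be exercised without a network, by passing a stub in place of `is_healthy`.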
## Configuration
### Environment Variables
**Backend (.env):**
```env
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
```
**Frontend (.env):**
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```
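
For orientation, here is one way the backend variables above could be read and validated at startup. It is a standard-library sketch, not the project's settings module (which may well use Pydantic instead); the `Settings` class and `load_settings` function are hypothetical, while the variable names and default values are the ones shown in the backend `.env` example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    database_url: str
    chroma_persist_directory: str
    upload_dir: str
    max_file_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, requiring at least one AI provider key."""
    openai_key = env.get("OPENAI_API_KEY") or None
    anthropic_key = env.get("ANTHROPIC_API_KEY") or None
    if not (openai_key or anthropic_key):
        raise ValueError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return Settings(
        openai_api_key=openai_key,
        anthropic_api_key=anthropic_key,
        database_url=env.get("DATABASE_URL", "sqlite:///./pdf_chatbot.db"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
    )


settings = load_settings({"OPENAI_API_KEY": "your-openai-api-key"})
print(settings.max_file_size)
```

Failing fast when neither key is set surfaces the "API Key Not Configured" problem at startup rather than on the first chat request.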
### AI Provider Setup
1. **OpenAI**: Get API key from [OpenAI Platform](https://platform.openai.com/)
2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)
## Development
### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
```
### Frontend Development
```bash
cd frontend
npm run dev
```
### Testing
```bash
# Backend tests
cd backend
pytest
# Frontend tests
cd ../frontend  # run from backend/; use "cd frontend" if you are in the project root
npm test
```
## Troubleshooting
### Common Issues
1. **API Key Not Configured**
- Ensure you've added your API key to `backend/.env`
- Restart the backend server after changing environment variables
2. **Upload Fails**
- Check file size (max 10MB)
- Ensure file is a valid PDF
- Check backend logs for detailed error messages
3. **Chat Not Working**
- Verify AI service is configured and working
- Check if documents are properly processed
- Review browser console for frontend errors
4. **Docker Issues**
- Ensure Docker and Docker Compose are installed
- Check if ports 3000 and 8000 are available
- Use `docker-compose logs` to view service logs
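
To check whether ports 3000 and 8000 are already taken, a short script like the one below can help. It is a sketch that assumes the services listen on `127.0.0.1`; `port_in_use` is a hypothetical helper that simply tries to open a TCP connection to the port.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for port in (3000, 8000):
    state = "in use" if port_in_use(port) else "free"
    print(f"port {port}: {state}")
```

A port reported as "in use" before you start the application means another process holds it; stop that process or change the port mapping.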
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
- Vector storage powered by [ChromaDB](https://www.trychroma.com/)
- AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
- UI components from [Tailwind CSS](https://tailwindcss.com/) and [Lucide React](https://lucide.dev/)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference