---
title: PDF QA Chatbot
emoji: "📄🤖"
colorFrom: "blue"
colorTo: "purple"
sdk: docker
app_file: app.py
pinned: true
---


# PDF-Based Q&A Chatbot System

An end-to-end Q&A chatbot that processes uploaded PDF documents and lets users ask natural-language questions, returning context-aware answers drawn from the content of those documents.

## Features

- **PDF Processing**: Extract text and metadata from uploaded PDF documents
- **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
- **AI-Powered Q&A**: Use OpenAI/Claude for intelligent question answering
- **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
- **Real-time Chat**: Interactive chat interface with conversation history
- **File Management**: Upload, view, and manage multiple PDF documents
- **Context Awareness**: Maintain conversation context and document references

## Tech Stack

### Backend
- **FastAPI**: High-performance web framework
- **PyPDF2**: PDF text extraction
- **ChromaDB**: Vector database for embeddings
- **OpenAI/Claude**: AI language models for Q&A
- **SQLAlchemy**: Database ORM
- **Pydantic**: Data validation

### Frontend
- **Next.js 14**: React framework with App Router
- **TypeScript**: Type-safe development
- **Tailwind CSS**: Utility-first styling
- **shadcn/ui**: Modern UI components
- **React Hook Form**: Form handling
- **Zustand**: State management

## Project Structure

```
ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md
```

## Quick Start

### Option 1: Automated Setup (Recommended)

**For Linux/macOS:**
```bash
chmod +x setup.sh
./setup.sh
```

**For Windows:**
```powershell
.\setup.ps1
```

### Option 2: Manual Setup

1. **Enter the Project Directory**
   ```bash
   # After cloning the repository:
   cd ChatbotCursor
   ```

2. **Backend Setup**
   ```bash
   cd backend
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. **Frontend Setup**
   ```bash
   cd ../frontend  # from backend/ (use "cd frontend" if you start from the project root)
   npm install
   cp .env.example .env
   ```

4. **Environment Variables**
   - Edit `backend/.env` and add your API keys:
     - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
   - The default frontend `.env` (which points to `http://localhost:8000`) should work without changes

5. **Run the Application**
   ```bash
   # Backend (Terminal 1)
   cd backend
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   uvicorn main:app --reload
   
   # Frontend (Terminal 2)
   cd frontend
   npm run dev
   ```

### Option 3: Docker Setup

```bash
# Build and run with Docker Compose
docker-compose up --build

# Or run services individually
docker-compose up backend
docker-compose up frontend
```

### Accessing the Application

Once the services are running (via any of the options above), open:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
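To confirm the backend is up before opening the frontend, you can poll the `GET /health` endpoint. This is a minimal sketch that assumes only that a healthy backend answers with HTTP 200; the response body is not documented here, so it is ignored. The function name `wait_for_backend` and the retry and delay values are illustrative, not part of the project.

```python
import time
import urllib.request


def wait_for_backend(base_url: str = "http://localhost:8000",
                     attempts: int = 10,
                     delay: float = 1.0) -> bool:
    """Poll GET /health until it returns HTTP 200 or the attempts run out.

    Hypothetical helper: assumes a 200 status means the backend is ready.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, timeouts and refused connections all subclass OSError
            pass
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next attempt
    return False
```

For example, `wait_for_backend()` returns `True` once uvicorn or Docker Compose has the API listening on port 8000, and `False` if it never responds within the retry window.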

## Usage

### Getting Started

1. **Upload Documents**
   - Navigate to the "Documents" tab
   - Drag and drop PDF files or click to select
   - Wait for processing (text extraction and vector embedding)
   - View upload status and document statistics

2. **Start Chatting**
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get AI-powered answers with source references
   - View conversation history

3. **Document Management**
   - View all uploaded documents with metadata
   - Delete documents when no longer needed
   - Monitor processing status and file sizes

### Features

- **Smart Document Processing**: Automatic text extraction and chunking
- **Vector Search**: Semantic similarity search for relevant content
- **AI-Powered Q&A**: Context-aware answers using OpenAI or Claude
- **Source Citations**: See which documents and sections were referenced
- **Conversation History**: Persistent chat sessions
- **File Management**: Upload, view, and delete documents
- **Real-time Processing**: Live status updates during uploads
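The document processing and vector search features above follow the usual retrieval pattern: split the extracted text into overlapping chunks, embed each chunk, and rank chunks by similarity to the embedded question. The sketch below shows only the chunking and ranking steps in plain Python. The chunk size, overlap, and cosine-similarity ranking are assumptions rather than this project's actual settings; in this project, storage and similarity search are delegated to ChromaDB.

```python
import math
from typing import List, Sequence, Tuple


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    chunk_size and overlap are illustrative defaults, not project settings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, at the cost of storing some text twice.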

### Supported File Types

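- **PDF Documents**: Text-based PDF files (PyPDF2 extracts the embedded text layer and does not perform OCR, so scanned, image-only PDFs may yield little or no text)
- **Maximum Size**: 10 MB per file (10,485,760 bytes by default, configurable via `MAX_FILE_SIZE`)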
- **Processing**: Automatic text extraction and metadata parsing
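Files that violate these limits can be rejected before any processing. This sketch checks the size against the 10,485,760-byte default of `MAX_FILE_SIZE` and looks for the `%PDF-` header that well-formed PDF files normally begin with. The function name `validate_pdf_upload` is hypothetical, and the header check is only a first filter: a file can start with `%PDF-` and still be malformed.

```python
import os

MAX_FILE_SIZE = 10_485_760  # 10 MiB, matching the MAX_FILE_SIZE example in backend/.env
PDF_MAGIC = b"%PDF-"        # header that a well-formed PDF normally starts with


def validate_pdf_upload(path: str, max_size: int = MAX_FILE_SIZE) -> None:
    """Raise ValueError if the file is empty, too large, or lacks a PDF header.

    Hypothetical helper; the backend's real validation may differ.
    """
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("file is empty")
    if size > max_size:
        raise ValueError(f"file is {size} bytes; the limit is {max_size} bytes")
    with open(path, "rb") as fh:
        if fh.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise ValueError("file does not start with a PDF header")
```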

## API Endpoints

### Document Management
- `POST /api/v1/documents/upload`: Upload PDF documents
- `GET /api/v1/documents/`: List all documents
- `GET /api/v1/documents/{id}`: Get specific document
- `DELETE /api/v1/documents/{id}`: Delete a document
- `GET /api/v1/documents/stats/summary`: Get document statistics

### Chat & Q&A
- `POST /api/v1/chat/`: Send questions and get answers
- `GET /api/v1/chat/history/{session_id}`: Get chat history
- `POST /api/v1/chat/session/new`: Create new chat session
- `GET /api/v1/chat/sessions`: List all sessions
- `DELETE /api/v1/chat/session/{session_id}`: Delete session
- `GET /api/v1/chat/models/available`: Get available AI models

### System
- `GET /health`: Health check
- `GET /docs`: Interactive API documentation (Swagger UI)
- `GET /redoc`: Alternative API documentation
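As an illustration of calling the chat endpoint from Python, the sketch below builds a `POST /api/v1/chat/` request with the standard library. The JSON field names `question` and `session_id` are assumptions, not the documented schema, and the helper names `build_chat_request` and `ask` are hypothetical; check the Swagger UI at `/docs` for the actual request and response models.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(question: str,
                       session_id: Optional[str] = None,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the chat endpoint.

    The payload keys "question" and "session_id" are hypothetical; confirm
    the real field names on the backend's /docs page.
    """
    payload = {"question": question}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str,
        session_id: Optional[str] = None,
        base_url: str = "http://localhost:8000") -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    req = build_chat_request(question, session_id, base_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to adjust the payload once you have confirmed the schema, without touching the network code.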

## Configuration

### Environment Variables

**Backend (.env):**
```env
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key

# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
```
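The backend's own settings code is not reproduced in this README. As a stdlib stand-in, the sketch below reads the variables listed above, applies the example values as defaults, and enforces the "at least one AI provider" rule. The `Settings` and `load_settings` names are hypothetical, treating the example values as the real defaults is an assumption, and the sketch expects the `.env` values to already be in the process environment (for example, loaded by python-dotenv or by Docker Compose's `env_file`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    database_url: str
    chroma_persist_directory: str
    upload_dir: str
    max_file_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings from environment variables (hypothetical helper)."""
    settings = Settings(
        # Treat empty strings as "not set" so a blank KEY= line does not count.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        database_url=env.get("DATABASE_URL", "sqlite:///./pdf_chatbot.db"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
    )
    if not (settings.openai_api_key or settings.anthropic_api_key):
        raise RuntimeError("set OPENAI_API_KEY or ANTHROPIC_API_KEY in backend/.env")
    return settings
```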

**Frontend (.env):**
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```

### AI Provider Setup

1. **OpenAI**: Get API key from [OpenAI Platform](https://platform.openai.com/)
2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)

## Development

### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
```

### Frontend Development
```bash
cd frontend
npm run dev
```

### Testing
```bash
# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test
```

## Troubleshooting

### Common Issues

1. **API Key Not Configured**
   - Ensure you've added your API key to `backend/.env`
   - Restart the backend server after changing environment variables

2. **Upload Fails**
   - Check the file size (max 10 MB by default)
   - Ensure file is a valid PDF
   - Check backend logs for detailed error messages

3. **Chat Not Working**
   - Verify that your AI provider's API key is configured and valid
   - Check that your documents finished processing successfully
   - Review browser console for frontend errors

4. **Docker Issues**
   - Ensure Docker and Docker Compose are installed
   - Check if ports 3000 and 8000 are available
   - Use `docker-compose logs` to view service logs
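To check whether ports 3000 and 8000 are already taken before starting Docker Compose, a quick TCP connection test works on any OS. This sketch treats a successful connection on localhost as "in use"; it will not detect a process bound only to a different network interface, so treat a "free" result as a hint rather than a guarantee. The `port_in_use` name is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (3000, 8000):  # frontend and backend ports used by this project
        print(f"port {port}: {'in use' if port_in_use(port) else 'free'}")
```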

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
- Vector storage powered by [ChromaDB](https://www.trychroma.com/)
- AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
- Styling with [Tailwind CSS](https://tailwindcss.com/) and icons from [Lucide React](https://lucide.dev/)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference