---
title: PDF QA Chatbot
emoji: "πŸ“„πŸ€–"
colorFrom: "blue"
colorTo: "purple"
sdk: docker
app_file: app.py
pinned: true
---


# PDF-Based Q&A Chatbot System

An end-to-end Q&A chatbot system that processes uploaded PDF documents and lets users retrieve accurate, context-aware answers through natural language queries.

## Features

- **PDF Processing**: Extract text and metadata from uploaded PDF documents
- **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
- **AI-Powered Q&A**: Use OpenAI/Claude for intelligent question answering
- **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
- **Real-time Chat**: Interactive chat interface with conversation history
- **File Management**: Upload, view, and manage multiple PDF documents
- **Context Awareness**: Maintain conversation context and document references

## Tech Stack

### Backend
- **FastAPI**: High-performance web framework
- **PyPDF2**: PDF text extraction
- **ChromaDB**: Vector database for embeddings
- **OpenAI/Claude**: AI language models for Q&A
- **SQLAlchemy**: Database ORM
- **Pydantic**: Data validation
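
To show how these backend pieces can fit together, here is a minimal ingestion sketch, assuming naive fixed-size chunking and ChromaDB's default embedding function; the collection name, chunk parameters, and helper functions are illustrative rather than the project's actual service code.

```python
# Illustrative ingestion flow: extract PDF text with PyPDF2, chunk it,
# and persist the chunks in a local ChromaDB collection.
from PyPDF2 import PdfReader
import chromadb


def extract_text(pdf_path: str) -> str:
    """Concatenate the extractable text of every page in the PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap; the real service may differ."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def ingest(pdf_path: str, doc_id: str) -> int:
    """Store a document's chunks and return how many were added."""
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection("documents")  # name is assumed
    chunks = chunk_text(extract_text(pdf_path))
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"source": pdf_path, "chunk": i} for i in range(len(chunks))],
    )
    return len(chunks)
```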

### Frontend
- **Next.js 14**: React framework with App Router
- **TypeScript**: Type-safe development
- **Tailwind CSS**: Utility-first styling
- **Shadcn/ui**: Modern UI components
- **React Hook Form**: Form handling
- **Zustand**: State management

## Project Structure

```
ChatbotCursor/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ api/
β”‚   β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ models/
β”‚   β”‚   β”œβ”€β”€ services/
β”‚   β”‚   └── utils/
β”‚   β”œβ”€β”€ requirements.txt
β”‚   └── main.py
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ components/
β”‚   β”œβ”€β”€ lib/
β”‚   └── package.json
β”œβ”€β”€ docker-compose.yml
└── README.md
```

## Quick Start

### Option 1: Automated Setup (Recommended)

**For Linux/macOS:**
```bash
chmod +x setup.sh
./setup.sh
```

**For Windows:**
```powershell
.\setup.ps1
```

### Option 2: Manual Setup

1. **Clone and Setup**
   ```bash
   cd ChatbotCursor
   ```

2. **Backend Setup**
   ```bash
   cd backend
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. **Frontend Setup**
   ```bash
   cd frontend
   npm install
   cp .env.example .env
   ```

4. **Environment Variables**
   - Edit `backend/.env` and add your API keys:
     - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
   - The frontend `.env` should work with defaults

5. **Run the Application**
   ```bash
   # Backend (Terminal 1)
   cd backend
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   uvicorn main:app --reload
   
   # Frontend (Terminal 2)
   cd frontend
   npm run dev
   ```

### Option 3: Docker Setup

```bash
# Build and run with Docker Compose
docker-compose up --build

# Or run services individually
docker-compose up backend
docker-compose up frontend
```

### Access the Application

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs

## Usage

### Getting Started

1. **Upload Documents**
   - Navigate to the "Documents" tab
   - Drag and drop PDF files or click to select
   - Wait for processing (text extraction and vector embedding)
   - View upload status and document statistics

2. **Start Chatting**
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get AI-powered answers with source references
   - View conversation history

3. **Document Management**
   - View all uploaded documents with metadata
   - Delete documents when no longer needed
   - Monitor processing status and file sizes

### Features

- **Smart Document Processing**: Automatic text extraction and chunking
- **Vector Search**: Semantic similarity search for relevant content
- **AI-Powered Q&A**: Context-aware answers using OpenAI or Claude
- **Source Citations**: See which documents and sections were referenced
- **Conversation History**: Persistent chat sessions
- **File Management**: Upload, view, and delete documents
- **Real-time Processing**: Live status updates during uploads
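
As a rough sketch of how vector search and AI-powered answering combine in a pipeline like this, assuming the OpenAI provider and the same illustrative collection name as the ingestion sketch above (the model choice and prompt are assumptions, not the project's exact implementation):

```python
# Illustrative retrieval-augmented answering: semantic search in ChromaDB,
# then an OpenAI chat completion grounded in the retrieved chunks.
import chromadb
from openai import OpenAI


def answer(question: str, n_results: int = 4) -> str:
    chroma = chromadb.PersistentClient(path="./chroma_db")
    collection = chroma.get_or_create_collection("documents")  # assumed name
    hits = collection.query(query_texts=[question], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "Answer using only the provided document excerpts."},
            {"role": "user",
             "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```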

### Supported File Types

- **PDF Documents**: All standard PDF files
- **Maximum Size**: 10MB per file
- **Processing**: Automatic text extraction and metadata parsing
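
The limits above imply validation along these lines in the upload handler; this is a minimal sketch of the kind of checks involved, not the actual endpoint code (only the route path mirrors the API section below).

```python
# Sketch of the validation an upload endpoint might perform; the route path
# matches the documented API, but the handler body is illustrative.
from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, matching MAX_FILE_SIZE in .env


@app.post("/api/v1/documents/upload")
async def upload_document(file: UploadFile = File(...)):
    if file.content_type != "application/pdf":
        raise HTTPException(status_code=400, detail="Only PDF files are supported")
    contents = await file.read()
    if len(contents) > MAX_FILE_SIZE:
        raise HTTPException(status_code=400, detail="File exceeds the 10MB limit")
    # ... hand off to text extraction and vector embedding ...
    return {"filename": file.filename, "size": len(contents)}
```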

## API Endpoints

### Document Management
- `POST /api/v1/documents/upload`: Upload PDF documents
- `GET /api/v1/documents/`: List all documents
- `GET /api/v1/documents/{id}`: Get specific document
- `DELETE /api/v1/documents/{id}`: Delete a document
- `GET /api/v1/documents/stats/summary`: Get document statistics
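
For example, a script could upload a document like this; the multipart field name (`file`) and the response shape are assumptions about the request schema.

```python
import requests

with open("example.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/v1/documents/upload",
        files={"file": ("example.pdf", f, "application/pdf")},
    )
print(response.status_code, response.json())
```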

### Chat & Q&A
- `POST /api/v1/chat/`: Send questions and get answers
- `GET /api/v1/chat/history/{session_id}`: Get chat history
- `POST /api/v1/chat/session/new`: Create new chat session
- `GET /api/v1/chat/sessions`: List all sessions
- `DELETE /api/v1/chat/session/{session_id}`: Delete session
- `GET /api/v1/chat/models/available`: Get available AI models
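
A question can then be sent to the chat endpoint; the JSON field names below (`question`, `session_id`) are assumptions about the request schema.

```python
import requests

payload = {"question": "What does the uploaded document say about refunds?",
           "session_id": None}
response = requests.post("http://localhost:8000/api/v1/chat/", json=payload)
print(response.json())
```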

### System
- `GET /health`: Health check
- `GET /docs`: Interactive API documentation (Swagger UI)
- `GET /redoc`: Alternative API documentation

## Configuration

### Environment Variables

**Backend (.env):**
```env
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key

# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
```
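
Since the backend uses Pydantic for validation, these variables could be loaded with a settings class along these lines; the sketch assumes `pydantic-settings` (Pydantic v2) and simply mirrors the `.env` keys, so it is not the project's actual config module.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    database_url: str = "sqlite:///./pdf_chatbot.db"
    chroma_persist_directory: str = "./chroma_db"
    upload_dir: str = "./uploads"
    max_file_size: int = 10_485_760  # 10 MB


settings = Settings()
```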

**Frontend (.env):**
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```

### AI Provider Setup

1. **OpenAI**: Get API key from [OpenAI Platform](https://platform.openai.com/)
2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)

## Development

### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
```

### Frontend Development
```bash
cd frontend
npm run dev
```

### Testing
```bash
# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test
```

## Troubleshooting

### Common Issues

1. **API Key Not Configured**
   - Ensure you've added your API key to `backend/.env`
   - Restart the backend server after changing environment variables

2. **Upload Fails**
   - Check file size (max 10MB)
   - Ensure file is a valid PDF
   - Check backend logs for detailed error messages

3. **Chat Not Working**
   - Verify AI service is configured and working
   - Check if documents are properly processed
   - Review browser console for frontend errors

4. **Docker Issues**
   - Ensure Docker and Docker Compose are installed
   - Check if ports 3000 and 8000 are available
   - Use `docker-compose logs` to view service logs

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
- Vector storage powered by [ChromaDB](https://www.trychroma.com/)
- AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
- UI components from [Tailwind CSS](https://tailwindcss.com/) and [Lucide React](https://lucide.dev/)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference