Amin23 committed on
Commit
e22dcc4
·
0 Parent(s):

Initial commit

.gitignore ADDED
@@ -0,0 +1,42 @@
1
+ # --- Python Backend ---
2
+ __pycache__/
3
+ *.pyc
4
+ *.pyo
5
+ *.pyd
6
+ venv/
7
+ ENV/
8
+ .env
9
+ .venv/
10
+ *.db
11
+ *.sqlite3
12
+ chroma_db/
13
+ uploads/
14
+ .vscode/
15
+ .DS_Store
16
+ Thumbs.db
17
+ *.log
18
+ *.ipynb
19
+ .pytest_cache/
20
+ dist/
21
+ build/
22
+ *.egg-info/
23
+ htmlcov/
24
+ .coverage
25
+ coverage.xml
26
+
27
+ # --- Frontend (Next.js/React) ---
28
+ node_modules/
29
+ .next/
30
+ out/
31
+ .env
32
+ .env.*
33
+ npm-debug.log*
34
+ yarn-debug.log*
35
+ yarn-error.log*
36
+ dist/
37
+ coverage/
38
+
39
+ # --- General ---
40
+ # Ignore Docker build cache
41
+ # Ignore coverage reports
42
+ # Ignore node_modules if any
LICENSE ADDED
@@ -0,0 +1,11 @@
1
+ UNLICENSED – VIEW ONLY
2
+
3
+ Copyright (c) 2025 Al Amin
4
+
5
+ Permission is NOT granted to any person obtaining a copy of this software and associated documentation files (the "Software") to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.
6
+
7
+ The Software is provided strictly for viewing and educational reference purposes only.
8
+
9
+ The above copyright notice must be retained, and this notice must appear in all copies or substantial portions of the Software.
10
+
11
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE VIEWING OF THE SOFTWARE.
README.md ADDED
@@ -0,0 +1,279 @@
1
+ # PDF-Based Q&A Chatbot System
2
+
3
+ A comprehensive end-to-end PDF-based Q&A chatbot system that processes uploaded PDF documents and enables users to retrieve accurate, context-aware answers via natural language queries.
4
+
5
+ ## Features
6
+
7
+ - **PDF Processing**: Extract text and metadata from uploaded PDF documents
8
+ - **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
9
+ - **AI-Powered Q&A**: Use OpenRouter or Anthropic Claude for intelligent question answering
10
+ - **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
11
+ - **Real-time Chat**: Interactive chat interface with conversation history
12
+ - **File Management**: Upload, view, and manage multiple PDF documents
13
+ - **Context Awareness**: Maintain conversation context and document references
14
+
15
+ ## Tech Stack
16
+
17
+ ### Backend
18
+ - **FastAPI**: High-performance web framework
19
+ - **PyPDF2**: PDF text extraction
20
+ - **ChromaDB**: Vector database for embeddings
21
+ - **OpenRouter/Anthropic Claude**: AI language models for Q&A
22
+ - **SQLAlchemy**: Database ORM
23
+ - **Pydantic**: Data validation
24
+
25
+ ### Frontend
26
+ - **Next.js 14**: React framework with App Router
27
+ - **TypeScript**: Type-safe development
28
+ - **Tailwind CSS**: Utility-first styling
29
+ - **Shadcn/ui**: Modern UI components
30
+ - **React Hook Form**: Form handling
31
+ - **Zustand**: State management
32
+
33
+ ## Project Structure
34
+
35
+ ```
36
+ ChatbotCursor/
37
+ ├── backend/
38
+ │   ├── app/
39
+ │   │   ├── api/
40
+ │   │   ├── core/
41
+ │   │   ├── models/
42
+ │   │   ├── services/
43
+ │   │   └── utils/
44
+ │   ├── requirements.txt
45
+ │   └── main.py
46
+ ├── frontend/
47
+ │   ├── app/
48
+ │   ├── components/
49
+ │   ├── lib/
50
+ │   └── package.json
51
+ ├── docker-compose.yml
52
+ └── README.md
53
+ ```
54
+
55
+ ## Quick Start
56
+
57
+ ### Option 1: Automated Setup (Recommended)
58
+
59
+ **For Linux/macOS:**
60
+ ```bash
61
+ chmod +x setup.sh
62
+ ./setup.sh
63
+ ```
64
+
65
+ **For Windows:**
66
+ ```powershell
67
+ .\setup.ps1
68
+ ```
69
+
70
+ ### Option 2: Manual Setup
71
+
72
+ 1. **Clone and Setup**
73
+ ```bash
74
+ cd ChatbotCursor
75
+ ```
76
+
77
+ 2. **Backend Setup**
78
+ ```bash
79
+ cd backend
80
+ python -m venv venv
81
+ source venv/bin/activate # On Windows: venv\Scripts\activate
82
+ pip install -r requirements.txt
83
+ cp .env.example .env
84
+ # Edit .env and add your API keys
85
+ ```
86
+
87
+ 3. **Frontend Setup**
88
+ ```bash
89
+ cd frontend
90
+ npm install
91
+ cp .env.example .env
92
+ ```
93
+
94
+ 4. **Environment Variables**
95
+ - Edit `backend/.env` and add your API keys:
96
+ - `OPENROUTER_API_KEY` or `ANTHROPIC_API_KEY` (the variable names read by `backend/app/core/config.py`)
97
+ - The frontend `.env` should work with defaults
98
+
99
+ 5. **Run the Application**
100
+ ```bash
101
+ # Backend (Terminal 1)
102
+ cd backend
103
+ source venv/bin/activate # On Windows: venv\Scripts\activate
104
+ uvicorn main:app --reload
105
+
106
+ # Frontend (Terminal 2)
107
+ cd frontend
108
+ npm run dev
109
+ ```
110
+
111
+ ### Option 3: Docker Setup
112
+
113
+ ```bash
114
+ # Build and run with Docker Compose
115
+ docker-compose up --build
116
+
117
+ # Or run services individually
118
+ docker-compose up backend
119
+ docker-compose up frontend
120
+ ```
121
+
122
+ ### Access the Application
123
+ - Frontend: http://localhost:3000
124
+ - Backend API: http://localhost:8000
125
+ - API Documentation: http://localhost:8000/docs
126
+
127
+ ## Usage
128
+
129
+ ### Getting Started
130
+
131
+ 1. **Upload Documents**
132
+ - Navigate to the "Documents" tab
133
+ - Drag and drop PDF files or click to select
134
+ - Wait for processing (text extraction and vector embedding)
135
+ - View upload status and document statistics
136
+
137
+ 2. **Start Chatting**
138
+ - Switch to the "Chat" tab
139
+ - Ask questions about your uploaded documents
140
+ - Get AI-powered answers with source references
141
+ - View conversation history
142
+
143
+ 3. **Document Management**
144
+ - View all uploaded documents with metadata
145
+ - Delete documents when no longer needed
146
+ - Monitor processing status and file sizes
147
+
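If you prefer to script this flow instead of using the UI, the same upload step can be driven against the REST API listed below. The following is a minimal sketch using `httpx` (already used by the backend); it assumes the backend is running locally on port 8000, and `report.pdf` is only a placeholder filename:

```python
import httpx

API = "http://localhost:8000/api/v1"

# Upload a PDF; the endpoint expects a multipart form field named "file"
with open("report.pdf", "rb") as f:
    resp = httpx.post(
        f"{API}/documents/upload",
        files={"file": ("report.pdf", f, "application/pdf")},
        timeout=120,  # text extraction and embedding can take a while
    )
resp.raise_for_status()
doc = resp.json()["document"]
print(f"Uploaded document {doc['id']}: {doc['original_filename']}")
```

Note that the backend caps uploads at 10 MB and currently stores at most three documents, so anything beyond those limits is rejected with a 400 error.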
148
+ ### Features
149
+
150
+ - **Smart Document Processing**: Automatic text extraction and chunking
151
+ - **Vector Search**: Semantic similarity search for relevant content
152
+ - **AI-Powered Q&A**: Context-aware answers using OpenRouter or Anthropic Claude
153
+ - **Source Citations**: See which documents and sections were referenced
154
+ - **Conversation History**: Persistent chat sessions
155
+ - **File Management**: Upload, view, and delete documents
156
+ - **Real-time Processing**: Live status updates during uploads
157
+
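Under the hood, "Smart Document Processing" and "Vector Search" mean each PDF is split into overlapping chunks of roughly 1,000 characters (200-character overlap, preferring sentence boundaries), stored in a ChromaDB collection named `pdf_documents` with the parent `document_id` in each chunk's metadata, and queried by semantic similarity at question time. A rough sketch of the retrieval step, mirroring `VectorStore.search_similar` (the query text is just a placeholder):

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("pdf_documents")

results = collection.query(
    query_texts=["What does the report say about revenue?"],
    n_results=5,
    include=["documents", "metadatas", "distances"],
)

for chunk, meta, dist in zip(
    results["documents"][0], results["metadatas"][0], results["distances"][0]
):
    # ChromaDB returns a distance; the backend reports 1 - distance as the similarity score
    print(meta["document_id"], round(1 - dist, 2), chunk[:80])
```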
158
+ ### Supported File Types
159
+
160
+ - **PDF Documents**: All standard PDF files
161
+ - **Maximum Size**: 10MB per file
162
+ - **Processing**: Automatic text extraction and metadata parsing
163
+
164
+ ## API Endpoints
165
+
166
+ ### Document Management
167
+ - `POST /api/v1/documents/upload`: Upload PDF documents
168
+ - `GET /api/v1/documents/`: List all documents
169
+ - `GET /api/v1/documents/{id}`: Get specific document
170
+ - `DELETE /api/v1/documents/{id}`: Delete a document
171
+ - `GET /api/v1/documents/stats/summary`: Get document statistics
172
+
173
+ ### Chat & Q&A
174
+ - `POST /api/v1/chat/`: Send questions and get answers
175
+ - `GET /api/v1/chat/history/{session_id}`: Get chat history
176
+ - `POST /api/v1/chat/session/new`: Create new chat session
177
+ - `GET /api/v1/chat/sessions`: List all sessions
178
+ - `DELETE /api/v1/chat/session/{session_id}`: Delete session
179
+ - `GET /api/v1/chat/models/available`: Get available AI models
180
+
181
+ ### System
182
+ - `GET /health`: Health check
183
+ - `GET /docs`: Interactive API documentation (Swagger UI)
184
+ - `GET /redoc`: Alternative API documentation
185
+
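A typical end-to-end sequence against these endpoints (create a session, ask a question, then read back the history) might look like the sketch below. Field names follow the `ChatRequest`/`ChatResponse` schemas in `backend/app/schemas/chat.py`, and the question text is a placeholder:

```python
import httpx

API = "http://localhost:8000/api/v1"

# 1. Create a chat session
session_id = httpx.post(f"{API}/chat/session/new").json()["session_id"]

# 2. Ask a question about the uploaded documents
reply = httpx.post(
    f"{API}/chat/",
    json={
        "question": "Summarize the key findings.",  # placeholder question
        "session_id": session_id,
        "model": "auto",      # let the backend pick OpenRouter or Anthropic
        "document_id": None,  # or restrict retrieval to a single document's ID
    },
    timeout=120,
).json()
print(reply["answer"])
print("Sources:", reply.get("sources", []))

# 3. Retrieve the stored conversation
history = httpx.get(f"{API}/chat/history/{session_id}").json()
print(f"{history['total']} messages in this session")
```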
186
+ ## Configuration
187
+
188
+ ### Environment Variables
189
+
190
+ **Backend (.env):**
191
+ ```env
192
+ # Required: Set at least one AI provider
193
+ OPENROUTER_API_KEY=your-openrouter-api-key
194
+ ANTHROPIC_API_KEY=your-anthropic-api-key
195
+
196
+ # Optional: Customize settings
197
+ DATABASE_URL=sqlite:///./pdf_chatbot.db
198
+ CHROMA_PERSIST_DIRECTORY=./chroma_db
199
+ UPLOAD_DIR=./uploads
200
+ MAX_FILE_SIZE=10485760
201
+ ```
202
+
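These values are read by the `Settings` class in `backend/app/core/config.py` (via `pydantic-settings`), so anything not set in `.env` falls back to the defaults defined there. A quick sanity check, run from the `backend/` directory with the virtualenv active, is to load the settings object directly:

```python
from app.core.config import settings

print(settings.DATABASE_URL)              # sqlite:///./pdf_chatbot.db by default
print(settings.MAX_FILE_SIZE)             # 10485760 bytes (10 MB)
print(bool(settings.OPENROUTER_API_KEY))  # True once the key is present in .env
```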
203
+ **Frontend (.env):**
204
+ ```env
205
+ NEXT_PUBLIC_API_URL=http://localhost:8000
206
+ ```
207
+
208
+ ### AI Provider Setup
209
+
210
+ 1. **OpenRouter**: Get an API key from [OpenRouter](https://openrouter.ai/) (keys start with `sk-or-`, which is what the backend expects)
211
+ 2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)
212
+
213
+ ## Development
214
+
215
+ ### Backend Development
216
+ ```bash
217
+ cd backend
218
+ source venv/bin/activate
219
+ uvicorn main:app --reload --port 8000
220
+ ```
221
+
222
+ ### Frontend Development
223
+ ```bash
224
+ cd frontend
225
+ npm run dev
226
+ ```
227
+
228
+ ### Testing
229
+ ```bash
230
+ # Backend tests
231
+ cd backend
232
+ pytest
233
+
234
+ # Frontend tests
235
+ cd frontend
236
+ npm test
237
+ ```
238
+
239
+ ## Troubleshooting
240
+
241
+ ### Common Issues
242
+
243
+ 1. **API Key Not Configured**
244
+ - Ensure you've added your API key to `backend/.env`
245
+ - Restart the backend server after changing environment variables
246
+
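A quick way to confirm the key was picked up is to query the health and model endpoints after the restart; `is_configured` should be `true` once either provider key is set. For example:

```python
import httpx

print(httpx.get("http://localhost:8000/health").json())
print(httpx.get("http://localhost:8000/api/v1/chat/models/available").json())
# Expected shape: {"available_models": ["openrouter", ...], "is_configured": true}
```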
247
+ 2. **Upload Fails**
248
+ - Check file size (max 10MB)
249
+ - Ensure file is a valid PDF
250
+ - Check backend logs for detailed error messages
251
+
252
+ 3. **Chat Not Working**
253
+ - Verify AI service is configured and working
254
+ - Check if documents are properly processed
255
+ - Review browser console for frontend errors
256
+
257
+ 4. **Docker Issues**
258
+ - Ensure Docker and Docker Compose are installed
259
+ - Check if ports 3000 and 8000 are available
260
+ - Use `docker-compose logs` to view service logs
261
+
262
+ ## Contributing
263
+
264
+ 1. Fork the repository
265
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
266
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
267
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
268
+ 5. Open a Pull Request
269
+
270
+ ## License
271
+
272
+ This project is proprietary and provided for viewing and educational reference only - see the [LICENSE](LICENSE) file for details.
273
+
274
+ ## Acknowledgments
275
+
276
+ - Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
277
+ - Vector storage powered by [ChromaDB](https://www.trychroma.com/)
278
+ - AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
279
+ - UI components from [Tailwind CSS](https://tailwindcss.com/) and [Lucide React](https://lucide.dev/)
backend/Dockerfile ADDED
@@ -0,0 +1,30 @@
1
+ FROM python:3.11-slim
2
+
3
+ WORKDIR /app
4
+
5
+ # Install system dependencies
6
+ RUN apt-get update && apt-get install -y \
7
+ gcc \
+ curl \
8
+ && rm -rf /var/lib/apt/lists/*
9
+
10
+ # Copy requirements first for better caching
11
+ COPY requirements.txt .
12
+
13
+ # Install Python dependencies
14
+ RUN pip install --no-cache-dir -r requirements.txt
15
+
16
+ # Copy application code
17
+ COPY . .
18
+
19
+ # Create necessary directories
20
+ RUN mkdir -p uploads chroma_db
21
+
22
+ # Expose port
23
+ EXPOSE 8000
24
+
25
+ # Health check
26
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
27
+ CMD curl -f http://localhost:8000/health || exit 1
28
+
29
+ # Run the application
30
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
backend/app/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # App package
backend/app/api/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # API package
backend/app/api/endpoints/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Endpoints package
backend/app/api/endpoints/chat.py ADDED
@@ -0,0 +1,204 @@
1
+ from fastapi import APIRouter, Depends, HTTPException
2
+ from sqlalchemy import func
+ from sqlalchemy.orm import Session
3
+ from typing import List
4
+ import json
5
+ import uuid
6
+ from datetime import datetime
7
+
8
+ from app.core.database import get_db
9
+ from app.models.document import ChatMessage
10
+ from app.schemas.chat import ChatRequest, ChatResponse, ChatHistoryResponse, ChatMessageCreate, ChatMessageResponse
11
+ from app.services.vector_store import VectorStore
12
+ from app.services.ai_service import AIService
13
+
14
+ router = APIRouter()
15
+ vector_store = VectorStore()
16
+ ai_service = AIService()
17
+
18
+
19
+ @router.post("/", response_model=ChatResponse)
20
+ def chat_with_documents(
21
+ request: ChatRequest,
22
+ db: Session = Depends(get_db)
23
+ ):
24
+ """Send a question and get an answer based on uploaded documents"""
25
+ try:
26
+ # Check if AI service is configured
27
+ if not ai_service.is_configured():
28
+ raise HTTPException(
29
+ status_code=503,
30
+ detail="AI service not configured. Please set up OpenAI or Anthropic API keys."
31
+ )
32
+
33
+ # Search for relevant documents
34
+ print(f"Searching for documents with query: {request.question}")
35
+ context_documents = vector_store.search_similar(request.question, n_results=5, document_id=str(request.document_id) if request.document_id is not None else None)
36
+ print(f"Found {len(context_documents)} relevant documents")
37
+ if context_documents:
38
+ print(f"Document IDs found: {[doc.get('metadata', {}).get('document_id') for doc in context_documents]}")
39
+ else:
40
+ print("No documents found in vector store")
41
+ # Check vector store stats
42
+ stats = vector_store.get_collection_stats()
43
+ print(f"Vector store stats: {stats}")
44
+
45
+ if not context_documents:
46
+ # No relevant documents found
47
+ answer = "I don't have enough information in the uploaded documents to answer your question. Please upload relevant PDF documents first."
48
+
49
+ # Save user message
50
+ user_message = ChatMessage(
51
+ session_id=request.session_id,
52
+ message_type="user",
53
+ content=request.question
54
+ )
55
+ db.add(user_message)
56
+ db.commit()
57
+
58
+ # Save assistant message
59
+ assistant_message = ChatMessage(
60
+ session_id=request.session_id,
61
+ message_type="assistant",
62
+ content=answer
63
+ )
64
+ db.add(assistant_message)
65
+ db.commit()
66
+
67
+ return ChatResponse(
68
+ success=True,
69
+ answer=answer,
70
+ session_id=request.session_id,
71
+ message_id=assistant_message.id
72
+ )
73
+
74
+ # Generate answer using AI
75
+ ai_response = ai_service.generate_answer(
76
+ request.question,
77
+ context_documents,
78
+ model=request.model
79
+ )
80
+
81
+ # Save user message
82
+ user_message = ChatMessage(
83
+ session_id=request.session_id,
84
+ message_type="user",
85
+ content=request.question
86
+ )
87
+ db.add(user_message)
88
+ db.commit()
89
+
90
+ # Save assistant message with document references
91
+ document_refs = json.dumps([doc.get('metadata', {}).get('document_id') for doc in context_documents])
92
+ assistant_message = ChatMessage(
93
+ session_id=request.session_id,
94
+ message_type="assistant",
95
+ content=ai_response["answer"],
96
+ document_references=document_refs
97
+ )
98
+ db.add(assistant_message)
99
+ db.commit()
100
+
101
+ return ChatResponse(
102
+ success=ai_response["success"],
103
+ answer=ai_response["answer"],
104
+ model=ai_response.get("model"),
105
+ sources=ai_response.get("sources", []),
106
+ session_id=request.session_id,
107
+ message_id=assistant_message.id
108
+ )
109
+
110
+ except HTTPException:
111
+ raise
112
+ except Exception as e:
113
+ raise HTTPException(status_code=500, detail=f"Error processing chat request: {str(e)}")
114
+
115
+
116
+ @router.get("/history/{session_id}", response_model=ChatHistoryResponse)
117
+ def get_chat_history(
118
+ session_id: str,
119
+ skip: int = 0,
120
+ limit: int = 50,
121
+ db: Session = Depends(get_db)
122
+ ):
123
+ """Get chat history for a specific session"""
124
+ try:
125
+ messages = db.query(ChatMessage).filter(
126
+ ChatMessage.session_id == session_id
127
+ ).order_by(ChatMessage.created_at.asc()).offset(skip).limit(limit).all()
128
+
129
+ total = db.query(ChatMessage).filter(
130
+ ChatMessage.session_id == session_id
131
+ ).count()
132
+
133
+ return ChatHistoryResponse(
134
+ messages=[ChatMessageResponse.from_orm(msg) for msg in messages],
135
+ total=total
136
+ )
137
+ except Exception as e:
138
+ raise HTTPException(status_code=500, detail=f"Error retrieving chat history: {str(e)}")
139
+
140
+
141
+ @router.post("/session/new")
142
+ def create_new_session():
143
+ """Create a new chat session"""
144
+ try:
145
+ session_id = str(uuid.uuid4())
146
+ return {"session_id": session_id}
147
+ except Exception as e:
148
+ raise HTTPException(status_code=500, detail=f"Error creating session: {str(e)}")
149
+
150
+
151
+ @router.get("/sessions")
152
+ def list_sessions(db: Session = Depends(get_db)):
153
+ """List all chat sessions"""
154
+ try:
155
+ # Get unique session IDs with message counts
156
+ sessions = db.query(
157
+ ChatMessage.session_id,
158
+ func.count(ChatMessage.id).label('message_count'),
159
+ func.max(ChatMessage.created_at).label('last_message_at')
160
+ ).group_by(ChatMessage.session_id).order_by(
161
+ func.max(ChatMessage.created_at).desc()
162
+ ).all()
163
+
164
+ return [
165
+ {
166
+ "session_id": session.session_id,
167
+ "message_count": session.message_count,
168
+ "last_message_at": session.last_message_at
169
+ }
170
+ for session in sessions
171
+ ]
172
+ except Exception as e:
173
+ raise HTTPException(status_code=500, detail=f"Error retrieving sessions: {str(e)}")
174
+
175
+
176
+ @router.delete("/session/{session_id}")
177
+ def delete_session(session_id: str, db: Session = Depends(get_db)):
178
+ """Delete a chat session and all its messages"""
179
+ try:
180
+ messages = db.query(ChatMessage).filter(
181
+ ChatMessage.session_id == session_id
182
+ ).all()
183
+
184
+ for message in messages:
185
+ db.delete(message)
186
+
187
+ db.commit()
188
+
189
+ return {"success": True, "message": f"Session {session_id} deleted successfully"}
190
+ except Exception as e:
191
+ raise HTTPException(status_code=500, detail=f"Error deleting session: {str(e)}")
192
+
193
+
194
+ @router.get("/models/available")
195
+ def get_available_models():
196
+ """Get list of available AI models"""
197
+ try:
198
+ models = ai_service.get_available_models()
199
+ return {
200
+ "available_models": models,
201
+ "is_configured": ai_service.is_configured()
202
+ }
203
+ except Exception as e:
204
+ raise HTTPException(status_code=500, detail=f"Error retrieving models: {str(e)}")
backend/app/api/endpoints/documents.py ADDED
@@ -0,0 +1,216 @@
1
+ from fastapi import APIRouter, Depends, HTTPException, UploadFile, File
2
+ from sqlalchemy import func
+ from sqlalchemy.orm import Session
3
+ from typing import List
4
+ import os
5
+ import uuid
6
+ import aiofiles
7
+ from datetime import datetime
8
+
9
+ from app.core.database import get_db
10
+ from app.core.config import settings
11
+ from app.models.document import Document
12
+ from app.schemas.document import DocumentResponse, DocumentListResponse, DocumentDeleteResponse, UploadResponse
13
+ from app.services.pdf_processor import PDFProcessor
14
+ from app.services.vector_store import VectorStore
15
+ from app.models.document import ChatMessage
16
+ import shutil
17
+
18
+ router = APIRouter()
19
+ pdf_processor = PDFProcessor()
20
+ vector_store = VectorStore()
21
+
22
+
23
+ @router.post("/upload", response_model=UploadResponse)
24
+ async def upload_document(
25
+ file: UploadFile = File(...),
26
+ db: Session = Depends(get_db)
27
+ ):
28
+ """Upload and process a PDF document"""
29
+ try:
30
+ # Restrict to 3 documents max
31
+ doc_count = db.query(Document).count()
32
+ if doc_count >= 3:
33
+ raise HTTPException(status_code=400, detail="You can only upload up to 3 documents.")
34
+ # Validate file type
35
+ if not file.filename.lower().endswith('.pdf'):
36
+ raise HTTPException(status_code=400, detail="Only PDF files are allowed")
37
+
38
+ # Generate unique filename
39
+ file_extension = os.path.splitext(file.filename)[1]
40
+ unique_filename = f"{uuid.uuid4()}{file_extension}"
41
+ file_path = os.path.join(settings.UPLOAD_DIR, unique_filename)
42
+
43
+ # Save file
44
+ async with aiofiles.open(file_path, 'wb') as f:
45
+ content = await file.read()
46
+ await f.write(content)
47
+
48
+ # Process PDF
49
+ success, text_content, metadata = pdf_processor.process_pdf(file_path)
50
+
51
+ if not success:
52
+ # Clean up file if processing failed
53
+ if os.path.exists(file_path):
54
+ os.remove(file_path)
55
+ raise HTTPException(status_code=400, detail=text_content)
56
+
57
+ # Create document record
58
+ db_document = Document(
59
+ filename=unique_filename,
60
+ original_filename=file.filename,
61
+ file_path=file_path,
62
+ file_size=len(content),
63
+ content=text_content,
64
+ processed=True
65
+ )
66
+
67
+ db.add(db_document)
68
+ db.commit()
69
+ db.refresh(db_document)
70
+
71
+ # Add to vector store
72
+ print(f"Adding document {db_document.id} to vector store...")
73
+ vector_success = vector_store.add_document(
74
+ str(db_document.id),
75
+ text_content,
76
+ metadata={
77
+ "filename": file.filename,
78
+ "file_size": len(content),
79
+ "num_pages": metadata.get('num_pages', 0)
80
+ }
81
+ )
82
+
83
+ if not vector_success:
84
+ # Log warning but don't fail the upload
85
+ print(f"Warning: Failed to add document {db_document.id} to vector store")
86
+ else:
87
+ print(f"Successfully added document {db_document.id} to vector store")
88
+ # Check collection stats
89
+ stats = vector_store.get_collection_stats()
90
+ print(f"Vector store stats: {stats}")
91
+
92
+ return UploadResponse(
93
+ success=True,
94
+ document=DocumentResponse.from_orm(db_document),
95
+ message="Document uploaded and processed successfully"
96
+ )
97
+
98
+ except HTTPException:
99
+ raise
100
+ except Exception as e:
101
+ # Clean up file if something went wrong
102
+ if 'file_path' in locals() and os.path.exists(file_path):
103
+ os.remove(file_path)
104
+ raise HTTPException(status_code=500, detail=f"Error uploading document: {str(e)}")
105
+
106
+
107
+ @router.get("/", response_model=DocumentListResponse)
108
+ def list_documents(
109
+ skip: int = 0,
110
+ limit: int = 100,
111
+ db: Session = Depends(get_db)
112
+ ):
113
+ """List all uploaded documents"""
114
+ try:
115
+ documents = db.query(Document).offset(skip).limit(limit).all()
116
+ total = db.query(Document).count()
117
+
118
+ return DocumentListResponse(
119
+ documents=[DocumentResponse.from_orm(doc) for doc in documents],
120
+ total=total
121
+ )
122
+ except Exception as e:
123
+ raise HTTPException(status_code=500, detail=f"Error retrieving documents: {str(e)}")
124
+
125
+
126
+ @router.get("/{document_id}", response_model=DocumentResponse)
127
+ def get_document(document_id: int, db: Session = Depends(get_db)):
128
+ """Get a specific document by ID"""
129
+ try:
130
+ document = db.query(Document).filter(Document.id == document_id).first()
131
+ if not document:
132
+ raise HTTPException(status_code=404, detail="Document not found")
133
+
134
+ return DocumentResponse.from_orm(document)
135
+ except HTTPException:
136
+ raise
137
+ except Exception as e:
138
+ raise HTTPException(status_code=500, detail=f"Error retrieving document: {str(e)}")
139
+
140
+
141
+ @router.delete("/{document_id}", response_model=DocumentDeleteResponse)
142
+ def delete_document(document_id: int, db: Session = Depends(get_db)):
143
+ """Delete a document and its vector embeddings"""
144
+ try:
145
+ document = db.query(Document).filter(Document.id == document_id).first()
146
+ if not document:
147
+ raise HTTPException(status_code=404, detail="Document not found")
148
+
149
+ # Delete from vector store
150
+ vector_store.delete_document(str(document_id))
151
+
152
+ # Delete file from filesystem
153
+ if os.path.exists(document.file_path):
154
+ os.remove(document.file_path)
155
+
156
+ # Delete from database
157
+ db.delete(document)
158
+ db.commit()
159
+
160
+ return DocumentDeleteResponse(
161
+ success=True,
162
+ message=f"Document {document.original_filename} deleted successfully"
163
+ )
164
+ except HTTPException:
165
+ raise
166
+ except Exception as e:
167
+ raise HTTPException(status_code=500, detail=f"Error deleting document: {str(e)}")
168
+
169
+
170
+ @router.post("/clear_all")
171
+ def clear_all_data(db: Session = Depends(get_db)):
172
+ """Admin endpoint to clear all documents, chat messages, uploaded files, and vector store."""
173
+ try:
174
+ # Delete all documents and chat messages from DB
175
+ db.query(Document).delete()
176
+ db.query(ChatMessage).delete()
177
+ db.commit()
178
+ # Delete all files in uploads directory
179
+ upload_dir = settings.UPLOAD_DIR
180
+ for filename in os.listdir(upload_dir):
181
+ file_path = os.path.join(upload_dir, filename)
182
+ try:
183
+ if os.path.isfile(file_path) or os.path.islink(file_path):
184
+ os.unlink(file_path)
185
+ elif os.path.isdir(file_path):
186
+ shutil.rmtree(file_path)
187
+ except Exception as e:
188
+ print(f"Failed to delete {file_path}: {e}")
189
+ # Clear ChromaDB vector store using the singleton
190
+ vector_store.clear_all()
191
+ return {"success": True, "message": "All documents, chat messages, uploads, and vectors cleared."}
192
+ except Exception as e:
193
+ return {"success": False, "message": f"Error clearing data: {str(e)}"}
194
+
195
+
196
+ @router.get("/stats/summary")
197
+ def get_document_stats(db: Session = Depends(get_db)):
198
+ """Get document statistics"""
199
+ try:
200
+ total_documents = db.query(Document).count()
201
+ processed_documents = db.query(Document).filter(Document.processed == True).count()
202
+ total_size = db.query(Document).with_entities(
203
+ db.func.sum(Document.file_size)
204
+ ).scalar() or 0
205
+
206
+ vector_stats = vector_store.get_collection_stats()
207
+
208
+ return {
209
+ "total_documents": total_documents,
210
+ "processed_documents": processed_documents,
211
+ "total_size_bytes": total_size,
212
+ "total_size_mb": round(total_size / (1024 * 1024), 2),
213
+ "vector_store_chunks": vector_stats.get("total_documents", 0)
214
+ }
215
+ except Exception as e:
216
+ raise HTTPException(status_code=500, detail=f"Error retrieving stats: {str(e)}")
backend/app/core/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Core package
backend/app/core/config.py ADDED
@@ -0,0 +1,47 @@
1
+ from pydantic_settings import BaseSettings
2
+ from typing import Optional
3
+ import os
4
+
5
+
6
+ class Settings(BaseSettings):
7
+ # API Configuration
8
+ API_V1_STR: str = "/api/v1"
9
+ PROJECT_NAME: str = "PDF Q&A Chatbot"
10
+
11
+ # Security
12
+ SECRET_KEY: str = "your-secret-key-here"
13
+ ACCESS_TOKEN_EXPIRE_MINUTES: int = 60 * 24 * 8 # 8 days
14
+
15
+ # Database
16
+ DATABASE_URL: str = "sqlite:///./pdf_chatbot.db"
17
+
18
+ # Vector Database
19
+ CHROMA_PERSIST_DIRECTORY: str = "./chroma_db"
20
+
21
+ # AI Providers
22
+ OPENROUTER_API_KEY: Optional[str] = None
23
+ ANTHROPIC_API_KEY: Optional[str] = None
24
+
25
+ # File Storage
26
+ UPLOAD_DIR: str = "./uploads"
27
+ MAX_FILE_SIZE: int = 10 * 1024 * 1024 # 10MB
28
+ ALLOWED_EXTENSIONS: list = [".pdf"]
29
+
30
+ # CORS
31
+ BACKEND_CORS_ORIGINS: list = [
32
+ "http://localhost:3000",
33
+ "http://localhost:3001",
34
+ "http://127.0.0.1:3000",
35
+ "http://127.0.0.1:3001",
36
+ ]
37
+
38
+ class Config:
39
+ env_file = ".env"
40
+ case_sensitive = True
41
+
42
+
43
+ settings = Settings()
44
+
45
+ # Ensure upload directory exists
46
+ os.makedirs(settings.UPLOAD_DIR, exist_ok=True)
47
+ os.makedirs(settings.CHROMA_PERSIST_DIRECTORY, exist_ok=True)
backend/app/core/database.py ADDED
@@ -0,0 +1,29 @@
1
+ from sqlalchemy import create_engine
2
+ from sqlalchemy.orm import sessionmaker, Session
3
+ from sqlalchemy.ext.declarative import declarative_base
4
+ from app.core.config import settings
5
+
6
+ engine = create_engine(
7
+ settings.DATABASE_URL,
8
+ connect_args={"check_same_thread": False} if "sqlite" in settings.DATABASE_URL else {}
9
+ )
10
+
11
+ SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
12
+
13
+ Base = declarative_base()
14
+
15
+ # Import models to ensure they are registered
16
+ from app.models.document import Document, ChatMessage
17
+
18
+
19
+ def get_db():
20
+ db = SessionLocal()
21
+ try:
22
+ yield db
23
+ finally:
24
+ db.close()
25
+
26
+
27
+ def create_tables():
28
+ """Create all tables in the database"""
29
+ Base.metadata.create_all(bind=engine)
backend/app/models/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Models package
backend/app/models/document.py ADDED
@@ -0,0 +1,35 @@
1
+ from sqlalchemy import Column, Integer, String, DateTime, Text, Boolean
2
+ from sqlalchemy.sql import func
3
+ from datetime import datetime
4
+ from app.core.database import Base
5
+
6
+
7
+ class Document(Base):
8
+ __tablename__ = "documents"
9
+
10
+ id = Column(Integer, primary_key=True, index=True)
11
+ filename = Column(String(255), nullable=False)
12
+ original_filename = Column(String(255), nullable=False)
13
+ file_path = Column(String(500), nullable=False)
14
+ file_size = Column(Integer, nullable=False)
15
+ content = Column(Text, nullable=True)
16
+ processed = Column(Boolean, default=False)
17
+ created_at = Column(DateTime(timezone=True), server_default=func.now())
18
+ updated_at = Column(DateTime(timezone=True), onupdate=func.now())
19
+
20
+ def __repr__(self):
21
+ return f"<Document(id={self.id}, filename='{self.filename}')>"
22
+
23
+
24
+ class ChatMessage(Base):
25
+ __tablename__ = "chat_messages"
26
+
27
+ id = Column(Integer, primary_key=True, index=True)
28
+ session_id = Column(String(255), nullable=False, index=True)
29
+ message_type = Column(String(20), nullable=False) # 'user' or 'assistant'
30
+ content = Column(Text, nullable=False)
31
+ document_references = Column(Text, nullable=True) # JSON string of referenced documents
32
+ created_at = Column(DateTime(timezone=True), server_default=func.now())
33
+
34
+ def __repr__(self):
35
+ return f"<ChatMessage(id={self.id}, session_id='{self.session_id}', type='{self.message_type}')>"
backend/app/schemas/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Schemas package
backend/app/schemas/chat.py ADDED
@@ -0,0 +1,50 @@
1
+ from pydantic import BaseModel
2
+ from typing import Optional, List, Dict
3
+ from datetime import datetime
4
+
5
+
6
+ class ChatMessageBase(BaseModel):
7
+ content: str
8
+ message_type: str # 'user' or 'assistant'
9
+
10
+
11
+ class ChatMessageCreate(ChatMessageBase):
12
+ session_id: str
13
+ document_references: Optional[str] = None
14
+
15
+
16
+ class ChatMessageResponse(ChatMessageBase):
17
+ id: int
18
+ session_id: str
19
+ document_references: Optional[str] = None
20
+ created_at: datetime
21
+
22
+ class Config:
23
+ from_attributes = True
24
+
25
+
26
+ class ChatRequest(BaseModel):
27
+ question: str
28
+ session_id: str
29
+ model: Optional[str] = "auto"
30
+ document_id: Optional[int] = None
31
+
32
+
33
+ class ChatResponse(BaseModel):
34
+ success: bool
35
+ answer: str
36
+ model: Optional[str] = None
37
+ sources: List[str] = []
38
+ session_id: str
39
+ message_id: Optional[int] = None
40
+
41
+
42
+ class ChatHistoryResponse(BaseModel):
43
+ messages: List[ChatMessageResponse]
44
+ total: int
45
+
46
+
47
+ class ChatSessionResponse(BaseModel):
48
+ session_id: str
49
+ message_count: int
50
+ last_message_at: Optional[datetime] = None
backend/app/schemas/document.py ADDED
@@ -0,0 +1,40 @@
1
+ from pydantic import BaseModel
2
+ from typing import Optional, List
3
+ from datetime import datetime
4
+
5
+
6
+ class DocumentBase(BaseModel):
7
+ filename: str
8
+ original_filename: str
9
+ file_size: int
10
+
11
+
12
+ class DocumentCreate(DocumentBase):
13
+ file_path: str
14
+
15
+
16
+ class DocumentResponse(DocumentBase):
17
+ id: int
18
+ content: Optional[str] = None
19
+ processed: bool
20
+ created_at: datetime
21
+ updated_at: Optional[datetime] = None
22
+
23
+ class Config:
24
+ from_attributes = True
25
+
26
+
27
+ class DocumentListResponse(BaseModel):
28
+ documents: List[DocumentResponse]
29
+ total: int
30
+
31
+
32
+ class DocumentDeleteResponse(BaseModel):
33
+ success: bool
34
+ message: str
35
+
36
+
37
+ class UploadResponse(BaseModel):
38
+ success: bool
39
+ document: Optional[DocumentResponse] = None
40
+ message: str
backend/app/services/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Services package
backend/app/services/ai_service.py ADDED
@@ -0,0 +1,267 @@
1
+ import openai
2
+ import anthropic
3
+ from typing import List, Dict, Optional, Union
4
+ import logging
5
+ from app.core.config import settings
6
+ import os
7
+ import httpx
8
+
9
+ logger = logging.getLogger(__name__)
10
+
11
+
12
+ class AIService:
13
+ def __init__(self):
14
+ self.openrouter_api_key = settings.OPENROUTER_API_KEY
15
+ self.anthropic_client = None
16
+
17
+ # Initialize OpenRouter (OpenAI-compatible) client
18
+ if self.openrouter_api_key:
19
+ # Validate API key format
20
+ if not self.openrouter_api_key.startswith('sk-or-'):
21
+ logger.warning("OpenRouter API key doesn't start with 'sk-or-'. This might cause issues.")
22
+
23
+ openai.api_key = self.openrouter_api_key
24
+ openai.base_url = "https://openrouter.ai/api/v1"
25
+ os.environ["OPENAI_API_KEY"] = self.openrouter_api_key
26
+ os.environ["OPENAI_BASE_URL"] = "https://openrouter.ai/api/v1"
27
+ logger.info("OpenRouter API key configured")
28
+ else:
29
+ logger.warning("No OpenRouter API key found")
30
+
31
+ # Initialize Anthropic client
32
+ if settings.ANTHROPIC_API_KEY:
33
+ self.anthropic_client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)
34
+ logger.info("Anthropic API key configured")
35
+ else:
36
+ logger.warning("No Anthropic API key found")
37
+
38
+ def generate_answer(self, question: str, context_documents: List[Dict], model: str = "auto") -> Dict:
39
+ """Generate answer based on question and context documents"""
40
+ try:
41
+ # Prepare context from documents
42
+ context = self._prepare_context(context_documents)
43
+ # Collect unique document IDs from context_documents
44
+ used_doc_ids = list({str(doc.get('metadata', {}).get('document_id')) for doc in context_documents if doc.get('metadata', {}).get('document_id') is not None})
45
+ # Choose model based on availability
46
+ if model == "auto":
47
+ if self.openrouter_api_key:
48
+ model = "openrouter"
49
+ elif self.anthropic_client:
50
+ model = "anthropic"
51
+ else:
52
+ return {
53
+ "success": False,
54
+ "answer": "No AI service configured. Please set up OpenRouter or Anthropic API keys.",
55
+ "sources": []
56
+ }
57
+ if model == "openrouter" and self.openrouter_api_key:
58
+ ai_result = self._generate_openrouter_answer(question, context)
59
+ elif model == "anthropic" and self.anthropic_client:
60
+ ai_result = self._generate_anthropic_answer(question, context)
61
+ else:
62
+ return {
63
+ "success": False,
64
+ "answer": f"Model {model} not available or not configured.",
65
+ "sources": []
66
+ }
67
+ # Always use the actual doc IDs from context_documents for sources
68
+ ai_result["sources"] = used_doc_ids
69
+ return ai_result
70
+ except Exception as e:
71
+ logger.error(f"Error generating answer: {e}")
72
+ return {
73
+ "success": False,
74
+ "answer": f"Error generating answer: {str(e)}",
75
+ "sources": []
76
+ }
77
+
78
+ def _prepare_context(self, context_documents: List[Dict]) -> str:
79
+ """Prepare context string from document chunks"""
80
+ if not context_documents:
81
+ return ""
82
+ context_parts = []
83
+ for doc in context_documents:
84
+ content = doc.get('content', '')
85
+ metadata = doc.get('metadata', {})
86
+ similarity = doc.get('similarity_score', 0)
87
+ doc_id = metadata.get('document_id', 'unknown')
88
+ chunk_index = metadata.get('chunk_index')
89
+ # Use real document ID as the main label
90
+ doc_info = f"Document {doc_id} (Relevance: {similarity:.2f})"
91
+ if chunk_index is not None:
92
+ doc_info += f" - Chunk: {chunk_index}"
93
+ context_parts.append(f"{doc_info}:\n{content}\n")
94
+ return "\n".join(context_parts)
95
+
96
+ def _generate_openrouter_answer(self, question: str, context: str) -> Dict:
97
+ """Generate answer using OpenRouter API via HTTPX"""
98
+ try:
99
+ system_prompt = """You are a helpful AI assistant that answers questions based on provided document context.
100
+ Follow these guidelines:
101
+ 1. Only answer based on the information provided in the context
102
+ 2. If the context doesn't contain enough information to answer the question, say so
103
+ 3. Be concise but comprehensive
104
+ 4. Cite specific parts of the documents when possible
105
+ 5. If you're unsure about something, acknowledge the uncertainty
106
+ 6. Format your response clearly and professionally using proper markdown formatting:\n - Use **bold** for important points and headings\n - Use bullet points (β€’) for lists\n - Use numbered lists for step-by-step instructions\n - Use proper paragraph breaks for readability\n - Structure your response with clear sections when appropriate\n - Use blockquotes for important quotes or key information\n7. You must answer the user's question as directly as possible, using only the information in the context. Do not simply summarize the context. If the context does not contain an answer, say so."""
107
+
108
+ user_prompt = f"""Context from documents:\n{context}\n\nQuestion: {question}\n\nPlease provide a comprehensive answer based on the context above."""
109
+
110
+ headers = {
111
+ "Authorization": f"Bearer {self.openrouter_api_key}",
112
+ "HTTP-Referer": "http://localhost:3000",
113
+ "X-Title": "PDF Q&A Chatbot",
114
+ "Content-Type": "application/json"
115
+ }
116
+ payload = {
117
+ "model": "meta-llama/llama-3-70b-instruct",
118
+ "messages": [
119
+ {"role": "system", "content": system_prompt},
120
+ {"role": "user", "content": user_prompt}
121
+ ],
122
+ "max_tokens": 1000,
123
+ "temperature": 0.3
124
+ }
125
+ url = "https://openrouter.ai/api/v1/chat/completions"
126
+
127
+ logger.info(f"Making OpenRouter API request to: {url}")
128
+ logger.info(f"Request payload: {payload}")
129
+
130
+ with httpx.Client(timeout=60) as client:
131
+ resp = client.post(url, headers=headers, json=payload)
132
+
133
+ logger.info(f"OpenRouter API response status: {resp.status_code}")
134
+ logger.info(f"OpenRouter API response headers: {dict(resp.headers)}")
135
+
136
+ # Log the raw response for debugging
137
+ response_text = resp.text
138
+ logger.info(f"OpenRouter API raw response: {response_text[:500]}...")
139
+
140
+ if not response_text.strip():
141
+ logger.error("OpenRouter API returned empty response")
142
+ return {
143
+ "success": False,
144
+ "answer": "OpenRouter API returned empty response. Please check your API key and try again.",
145
+ "sources": []
146
+ }
147
+
148
+ resp.raise_for_status()
149
+
150
+ try:
151
+ data = resp.json()
152
+ logger.info(f"OpenRouter API parsed response: {data}")
153
+ except Exception as e:
154
+ logger.error(f"OpenRouter API non-JSON response: {response_text}")
155
+ logger.error(f"JSON parsing error: {e}")
156
+ return {
157
+ "success": False,
158
+ "answer": f"OpenRouter API returned invalid JSON response: {str(e)}",
159
+ "sources": []
160
+ }
161
+
162
+ if "choices" not in data or not data["choices"]:
163
+ logger.error(f"OpenRouter API response missing choices: {data}")
164
+ return {
165
+ "success": False,
166
+ "answer": "OpenRouter API response missing choices field",
167
+ "sources": []
168
+ }
169
+
170
+ answer = data["choices"][0]["message"]["content"].strip()
171
+
172
+ return {
173
+ "success": True,
174
+ "answer": answer,
175
+ "model": "openrouter/meta-llama/llama-3-70b-instruct",
176
+ "sources": [] # Always set to empty, will be overwritten by generate_answer
177
+ }
178
+ except httpx.HTTPStatusError as e:
179
+ logger.error(f"OpenRouter API HTTP error: {e.response.status_code} - {e.response.text}")
180
+ return {
181
+ "success": False,
182
+ "answer": f"OpenRouter API HTTP error: {e.response.status_code} - {e.response.text}",
183
+ "sources": []
184
+ }
185
+ except Exception as e:
186
+ logger.error(f"OpenRouter API error: {e}")
187
+ return {
188
+ "success": False,
189
+ "answer": f"Error calling OpenRouter API: {str(e)}",
190
+ "sources": []
191
+ }
192
+
193
+ def _generate_anthropic_answer(self, question: str, context: str) -> Dict:
194
+ """Generate answer using Anthropic Claude API"""
195
+ try:
196
+ if not self.anthropic_client:
197
+ return {
198
+ "success": False,
199
+ "answer": "Anthropic client not configured",
200
+ "sources": []
201
+ }
202
+
203
+ system_prompt = """You are a helpful AI assistant that answers questions based on provided document context.
204
+ Follow these guidelines:
205
+ 1. Only answer based on the information provided in the context
206
+ 2. If the context doesn't contain enough information to answer the question, say so
207
+ 3. Be concise but comprehensive
208
+ 4. Cite specific parts of the documents when possible
209
+ 5. If you're unsure about something, acknowledge the uncertainty
210
+ 6. Format your response clearly and professionally using proper markdown formatting:\n - Use **bold** for important points and headings\n - Use bullet points (β€’) for lists\n - Use numbered lists for step-by-step instructions\n - Use proper paragraph breaks for readability\n - Structure your response with clear sections when appropriate\n - Use blockquotes for important quotes or key information\n7. You must answer the user's question as directly as possible, using only the information in the context. Do not simply summarize the context. If the context does not contain an answer, say so."""
211
+
212
+ user_prompt = f"""Context from documents:\n{context}\n\nQuestion: {question}\n\nPlease provide a comprehensive answer based on the context above."""
213
+
214
+ response = self.anthropic_client.messages.create(
215
+ model="claude-3-sonnet-20240229",
216
+ max_tokens=1000,
217
+ temperature=0.3,
218
+ system=system_prompt,
219
+ messages=[
220
+ {"role": "user", "content": user_prompt}
221
+ ]
222
+ )
223
+
224
+ answer = response.content[0].text.strip()
225
+
226
+ return {
227
+ "success": True,
228
+ "answer": answer,
229
+ "model": "claude-3-sonnet-20240229",
230
+ "sources": [] # Always set to empty, will be overwritten by generate_answer
231
+ }
232
+
233
+ except Exception as e:
234
+ logger.error(f"Anthropic API error: {e}")
235
+ return {
236
+ "success": False,
237
+ "answer": f"Error calling Anthropic API: {str(e)}",
238
+ "sources": []
239
+ }
240
+
241
+ def _extract_sources_from_context(self, context: str) -> List[str]:
242
+ """Extract source information from context"""
243
+ sources = set() # Use set to avoid duplicates
244
+ lines = context.split('\n')
245
+
246
+ for line in lines:
247
+ if line.startswith('Document') and 'ID:' in line:
248
+ # Extract document ID
249
+ parts = line.split('ID:')
250
+ if len(parts) > 1:
251
+ doc_id = parts[1].strip().split()[0]
252
+ sources.add(f"Document ID: {doc_id}")
253
+
254
+ return list(sources) # Convert back to list
255
+
256
+ def get_available_models(self) -> List[str]:
257
+ """Get list of available AI models"""
258
+ models = []
259
+ if self.openrouter_api_key:
260
+ models.append("openrouter")
261
+ if self.anthropic_client:
262
+ models.append("anthropic")
263
+ return models
264
+
265
+ def is_configured(self) -> bool:
266
+ """Check if any AI service is configured"""
267
+ return bool(self.openrouter_api_key or self.anthropic_client)
backend/app/services/pdf_processor.py ADDED
@@ -0,0 +1,105 @@
1
+ import PyPDF2
2
+ import os
3
+ from typing import Optional, Tuple
4
+ from app.core.config import settings
5
+ import logging
6
+
7
+ logger = logging.getLogger(__name__)
8
+
9
+
10
+ class PDFProcessor:
11
+ def __init__(self):
12
+ self.allowed_extensions = settings.ALLOWED_EXTENSIONS
13
+ self.max_file_size = settings.MAX_FILE_SIZE
14
+
15
+ def validate_file(self, file_path: str) -> Tuple[bool, str]:
16
+ """Validate uploaded file"""
17
+ if not os.path.exists(file_path):
18
+ return False, "File does not exist"
19
+
20
+ # Check file size
21
+ file_size = os.path.getsize(file_path)
22
+ if file_size > self.max_file_size:
23
+ return False, f"File size exceeds maximum allowed size of {self.max_file_size} bytes"
24
+
25
+ # Check file extension
26
+ file_ext = os.path.splitext(file_path)[1].lower()
27
+ if file_ext not in self.allowed_extensions:
28
+ return False, f"File type not allowed. Allowed types: {', '.join(self.allowed_extensions)}"
29
+
30
+ return True, "File is valid"
31
+
32
+ def extract_text(self, file_path: str) -> Optional[str]:
33
+ """Extract text content from PDF file"""
34
+ try:
35
+ with open(file_path, 'rb') as file:
36
+ pdf_reader = PyPDF2.PdfReader(file)
37
+ text_content = []
38
+
39
+ for page_num in range(len(pdf_reader.pages)):
40
+ try:
41
+ page = pdf_reader.pages[page_num]
42
+ text = page.extract_text()
43
+ if text.strip():
44
+ text_content.append(f"Page {page_num + 1}:\n{text.strip()}")
45
+ except Exception as e:
46
+ logger.warning(f"Error extracting text from page {page_num + 1}: {e}")
47
+ continue
48
+
49
+ return "\n\n".join(text_content)
50
+
51
+ except Exception as e:
52
+ logger.error(f"Error processing PDF file {file_path}: {e}")
53
+ return None
54
+
55
+ def get_metadata(self, file_path: str) -> dict:
56
+ """Extract metadata from PDF file"""
57
+ try:
58
+ with open(file_path, 'rb') as file:
59
+ pdf_reader = PyPDF2.PdfReader(file)
60
+ metadata = {
61
+ 'num_pages': len(pdf_reader.pages),
62
+ 'file_size': os.path.getsize(file_path),
63
+ 'title': None,
64
+ 'author': None,
65
+ 'subject': None,
66
+ 'creator': None
67
+ }
68
+
69
+ if pdf_reader.metadata:
70
+ metadata.update({
71
+ 'title': pdf_reader.metadata.get('/Title'),
72
+ 'author': pdf_reader.metadata.get('/Author'),
73
+ 'subject': pdf_reader.metadata.get('/Subject'),
74
+ 'creator': pdf_reader.metadata.get('/Creator')
75
+ })
76
+
77
+ return metadata
78
+
79
+ except Exception as e:
80
+ logger.error(f"Error extracting metadata from PDF file {file_path}: {e}")
81
+ return {
82
+ 'num_pages': 0,
83
+ 'file_size': os.path.getsize(file_path) if os.path.exists(file_path) else 0,
84
+ 'title': None,
85
+ 'author': None,
86
+ 'subject': None,
87
+ 'creator': None
88
+ }
89
+
90
+ def process_pdf(self, file_path: str) -> Tuple[bool, str, dict]:
91
+ """Process PDF file and return text content and metadata"""
92
+ # Validate file
93
+ is_valid, error_message = self.validate_file(file_path)
94
+ if not is_valid:
95
+ return False, error_message, {}
96
+
97
+ # Extract text
98
+ text_content = self.extract_text(file_path)
99
+ if text_content is None:
100
+ return False, "Failed to extract text from PDF", {}
101
+
102
+ # Get metadata
103
+ metadata = self.get_metadata(file_path)
104
+
105
+ return True, text_content, metadata
backend/app/services/vector_store.py ADDED
@@ -0,0 +1,194 @@
1
+ import chromadb
2
+ from chromadb.config import Settings as ChromaSettings
3
+ from typing import List, Dict, Optional, Tuple
4
+ import json
5
+ import logging
6
+ from app.core.config import settings
7
+
8
+ logger = logging.getLogger(__name__)
9
+
10
+
11
+ class VectorStore:
12
+ _instance = None
13
+
14
+ def __new__(cls):
15
+ if cls._instance is None:
16
+ cls._instance = super(VectorStore, cls).__new__(cls)
17
+ cls._instance._initialized = False
18
+ return cls._instance
19
+
20
+ def __init__(self):
21
+ if not self._initialized:
22
+ self.client = chromadb.PersistentClient(
23
+ path=settings.CHROMA_PERSIST_DIRECTORY,
24
+ settings=ChromaSettings(
25
+ anonymized_telemetry=False
26
+ )
27
+ )
28
+ self.collection_name = "pdf_documents"
29
+ self.collection = self._get_or_create_collection()
30
+ self._initialized = True
31
+
32
+ def _get_or_create_collection(self):
33
+ """Get existing collection or create new one"""
34
+ try:
35
+ collection = self.client.get_collection(name=self.collection_name)
36
+ logger.info(f"Using existing collection: {self.collection_name}")
37
+ except Exception:
38
+ collection = self.client.create_collection(
39
+ name=self.collection_name,
40
+ metadata={"description": "PDF document embeddings for Q&A chatbot"}
41
+ )
42
+ logger.info(f"Created new collection: {self.collection_name}")
43
+
44
+ return collection
45
+
46
+ def add_document(self, document_id: str, content: str, metadata: Dict = None) -> bool:
47
+ """Add document content to vector store"""
48
+ try:
49
+ logger.info(f"Starting to add document {document_id} to vector store")
50
+ logger.info(f"Content length: {len(content)} characters")
51
+
52
+ # Split content into chunks for better retrieval
53
+ chunks = self._split_text(content, chunk_size=1000, overlap=200)
54
+ logger.info(f"Split content into {len(chunks)} chunks")
55
+
56
+ # Prepare data for ChromaDB
57
+ ids = [f"{document_id}_chunk_{i}" for i in range(len(chunks))]
58
+ documents = chunks
59
+ metadatas = [{
60
+ "document_id": document_id,
61
+ "chunk_index": i,
62
+ **(metadata or {})
63
+ } for i in range(len(chunks))]
64
+
65
+ logger.info(f"Prepared {len(ids)} chunks with IDs: {ids[:3]}...") # Log first 3 IDs
66
+
67
+ # Add to collection
68
+ logger.info(f"Adding chunks to ChromaDB collection: {self.collection_name}")
69
+ self.collection.add(
70
+ ids=ids,
71
+ documents=documents,
72
+ metadatas=metadatas
73
+ )
74
+
75
+ logger.info(f"Successfully added document {document_id} with {len(chunks)} chunks to vector store")
76
+ return True
77
+
78
+ except Exception as e:
79
+ logger.error(f"Error adding document {document_id} to vector store: {e}")
80
+ logger.error(f"Exception type: {type(e).__name__}")
81
+ import traceback
82
+ logger.error(f"Full traceback: {traceback.format_exc()}")
83
+ return False
84
+
85
+ def search_similar(self, query: str, n_results: int = 5, document_id: str = None) -> List[Dict]:
86
+ """Search for similar documents based on query, optionally filtering by document_id"""
87
+ try:
88
+ results = self.collection.query(
89
+ query_texts=[query],
90
+ n_results=n_results,
91
+ include=["documents", "metadatas", "distances"]
92
+ )
93
+
94
+ # Format results
95
+ formatted_results = []
96
+ if results['documents'] and results['documents'][0]:
97
+ for i, (doc, metadata, distance) in enumerate(zip(
98
+ results['documents'][0],
99
+ results['metadatas'][0],
100
+ results['distances'][0]
101
+ )):
102
+ if document_id is not None and str(metadata.get('document_id')) != str(document_id):
103
+ continue
104
+ formatted_results.append({
105
+ 'content': doc,
106
+ 'metadata': metadata,
107
+ 'similarity_score': 1 - distance, # Convert distance to similarity
108
+ 'rank': i + 1
109
+ })
110
+ return formatted_results
111
+ except Exception as e:
112
+ logger.error(f"Error searching vector store: {e}")
113
+ return []
114
+
115
+ def delete_document(self, document_id: str) -> bool:
116
+ """Delete all chunks for a specific document"""
117
+ try:
118
+ # Get all chunks for this document
119
+ results = self.collection.get(
120
+ where={"document_id": document_id}
121
+ )
122
+
123
+ if results['ids']:
124
+ self.collection.delete(ids=results['ids'])
125
+ logger.info(f"Deleted {len(results['ids'])} chunks for document {document_id}")
126
+
127
+ return True
128
+
129
+ except Exception as e:
130
+ logger.error(f"Error deleting document {document_id} from vector store: {e}")
131
+ return False
132
+
133
+ def get_collection_stats(self) -> Dict:
134
+ """Get statistics about the vector store collection"""
135
+ try:
136
+ logger.info(f"Getting stats for collection: {self.collection_name}")
137
+ count = self.collection.count()
138
+ logger.info(f"Collection count: {count}")
139
+ return {
140
+ "total_documents": count,
141
+ "collection_name": self.collection_name
142
+ }
143
+ except Exception as e:
144
+ logger.error(f"Error getting collection stats: {e}")
145
+ logger.error(f"Exception type: {type(e).__name__}")
146
+ import traceback
147
+ logger.error(f"Full traceback: {traceback.format_exc()}")
148
+ return {"total_documents": 0, "collection_name": self.collection_name}
149
+
150
+ def _split_text(self, text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
151
+ """Split text into overlapping chunks"""
152
+ if len(text) <= chunk_size:
153
+ return [text]
154
+
155
+ chunks = []
156
+ start = 0
157
+
158
+ while start < len(text):
159
+ end = start + chunk_size
160
+
161
+ # If this isn't the last chunk, try to break at a sentence boundary
162
+ if end < len(text):
163
+ # Look for sentence endings
164
+ for i in range(end, max(start + chunk_size - 100, start), -1):
165
+ if text[i] in '.!?':
166
+ end = i + 1
167
+ break
168
+
169
+ chunk = text[start:end].strip()
170
+ if chunk:
171
+ chunks.append(chunk)
172
+
173
+ # Move start position with overlap
174
+ start = end - overlap
175
+ if start >= len(text):
176
+ break
177
+
178
+ return chunks
179
+
180
+ def clear_all(self) -> bool:
181
+ """Clear all documents from the vector store"""
182
+ try:
183
+ self.client.delete_collection(name=self.collection_name)
184
+ self.collection = self._get_or_create_collection()
185
+ logger.info("Cleared all documents from vector store")
186
+ return True
187
+ except Exception as e:
188
+ logger.error(f"Error clearing vector store: {e}")
189
+ return False
190
+
191
+ @classmethod
192
+ def reset_instance(cls):
193
+ """Reset the singleton instance - useful after clearing collections"""
194
+ cls._instance = None
backend/main.py ADDED
@@ -0,0 +1,125 @@
1
+ from fastapi import FastAPI
2
+ from fastapi.middleware.cors import CORSMiddleware
3
+ from fastapi.staticfiles import StaticFiles
4
+ import logging
5
+ import os
6
+
7
+ from app.core.config import settings
8
+ from app.core.database import create_tables, SessionLocal
9
+ from app.models.document import Document, ChatMessage
10
+ import shutil
11
+ from app.api.endpoints import documents, chat
12
+ from app.services.vector_store import VectorStore
13
+
14
+ # Configure logging
15
+ logging.basicConfig(
16
+ level=logging.INFO,
17
+ format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
18
+ )
19
+
20
+ # Create FastAPI app
21
+ app = FastAPI(
22
+ title=settings.PROJECT_NAME,
23
+ description="A comprehensive PDF-based Q&A chatbot system",
24
+ version="1.0.0",
25
+ docs_url="/docs",
26
+ redoc_url="/redoc"
27
+ )
28
+
29
+ # Add CORS middleware
30
+ app.add_middleware(
31
+ CORSMiddleware,
32
+ allow_origins=settings.BACKEND_CORS_ORIGINS,
33
+ allow_credentials=True,
34
+ allow_methods=["*"],
35
+ allow_headers=["*"],
36
+ )
37
+
38
+ # Include API routes
39
+ app.include_router(
40
+ documents.router,
41
+ prefix=f"{settings.API_V1_STR}/documents",
42
+ tags=["documents"]
43
+ )
44
+
45
+ app.include_router(
46
+ chat.router,
47
+ prefix=f"{settings.API_V1_STR}/chat",
48
+ tags=["chat"]
49
+ )
50
+
51
+ # Health check endpoint
52
+ @app.get("/health")
53
+ def health_check():
54
+ """Health check endpoint"""
55
+ return {
56
+ "status": "healthy",
57
+ "service": settings.PROJECT_NAME,
58
+ "version": "1.0.0"
59
+ }
60
+
61
+ # Root endpoint
62
+ @app.get("/")
63
+ def root():
64
+ """Root endpoint with API information"""
65
+ return {
66
+ "message": "PDF Q&A Chatbot API",
67
+ "version": "1.0.0",
68
+ "docs": "/docs",
69
+ "health": "/health"
70
+ }
71
+
72
+ # Startup event
73
+ @app.on_event("startup")
74
+ async def startup_event():
75
+ """Initialize application on startup"""
76
+ # Create database tables
77
+ create_tables()
78
+
79
+ # Ensure directories exist
80
+ os.makedirs(settings.UPLOAD_DIR, exist_ok=True)
81
+ os.makedirs(settings.CHROMA_PERSIST_DIRECTORY, exist_ok=True)
82
+
83
+ # --- ERASE ALL DOCUMENTS, CHAT MESSAGES, AND VECTORS ON STARTUP ---
84
+ # 1. Delete all rows from documents and chat_messages tables
85
+ db = SessionLocal()
86
+ try:
87
+ db.query(Document).delete()
88
+ db.query(ChatMessage).delete()
89
+ db.commit()
90
+ finally:
91
+ db.close()
92
+ # 2. Remove all files in chroma_db directory (but keep the directory)
93
+ chroma_dir = settings.CHROMA_PERSIST_DIRECTORY
94
+ for filename in os.listdir(chroma_dir):
95
+ file_path = os.path.join(chroma_dir, filename)
96
+ try:
97
+ if os.path.isfile(file_path) or os.path.islink(file_path):
98
+ os.unlink(file_path)
99
+ elif os.path.isdir(file_path):
100
+ shutil.rmtree(file_path)
101
+ except Exception as e:
102
+ logging.warning(f"Failed to delete {file_path}: {e}")
103
+ # 3. Explicitly clear ChromaDB vector store
104
+ vector_store = VectorStore()
105
+ vector_store.clear_all()
106
+ logging.info("All documents, chat messages, and vector store erased on startup.")
107
+ # --- END ERASE ---
108
+
109
+ logging.info("Application started successfully")
110
+
111
+ # Shutdown event
112
+ @app.on_event("shutdown")
113
+ async def shutdown_event():
114
+ """Cleanup on application shutdown"""
115
+ logging.info("Application shutting down")
116
+
117
+ if __name__ == "__main__":
118
+ import uvicorn
119
+ uvicorn.run(
120
+ "main:app",
121
+ host="0.0.0.0",
122
+ port=8000,
123
+ reload=True,
124
+ log_level="info"
125
+ )
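A quick smoke test for the endpoints registered above — a minimal sketch, not part of the commit, assuming the server is already running locally (for example via `uvicorn main:app --reload`) and using `httpx`, which the backend already depends on:

```python
# Hit the /health and / endpoints of the running API (assumed at localhost:8000).
import httpx

BASE_URL = "http://localhost:8000"

with httpx.Client(timeout=10) as client:
    health = client.get(f"{BASE_URL}/health").json()
    root = client.get(f"{BASE_URL}/").json()

print(health["status"], health["version"])  # expected: healthy 1.0.0
print(root["docs"], root["health"])         # expected: /docs /health
```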
backend/requirements.txt ADDED
@@ -0,0 +1,19 @@
1
+ fastapi==0.104.1
2
+ uvicorn[standard]==0.24.0
3
+ python-multipart==0.0.6
4
+ pydantic==2.5.0
5
+ pydantic-settings==2.1.0
6
+ sqlalchemy==2.0.23
7
+ alembic==1.13.0
8
+ chromadb==0.4.18
9
+ openai==1.3.7
10
+ anthropic==0.7.8
11
+ pypdf2==3.0.1
12
+ python-dotenv==1.0.0
13
+ aiofiles==23.2.1
14
+ python-jose[cryptography]==3.3.0
15
+ passlib[bcrypt]==1.7.4
16
+ python-multipart==0.0.6
17
+ httpx==0.25.2
18
+ pytest==7.4.3
19
+ pytest-asyncio==0.21.1
backend/test_openrouter.py ADDED
@@ -0,0 +1,93 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script to verify OpenRouter API connectivity
4
+ """
5
+
6
+ import os
7
+ import httpx
8
+ import json
9
+ from dotenv import load_dotenv
10
+
11
+ # Load environment variables
12
+ load_dotenv()
13
+
14
+ def test_openrouter_api():
15
+ """Test OpenRouter API connection"""
16
+ api_key = os.getenv("OPENROUTER_API_KEY")
17
+
18
+ if not api_key:
19
+ print("❌ No OpenRouter API key found in environment variables")
20
+ return False
21
+
22
+ print(f"🔑 API Key found: {api_key[:10]}...{api_key[-4:]}")
23
+
24
+ # Test API endpoint
25
+ url = "https://openrouter.ai/api/v1/chat/completions"
26
+
27
+ headers = {
28
+ "Authorization": f"Bearer {api_key}",
29
+ "HTTP-Referer": "http://localhost:3000",
30
+ "X-Title": "PDF Q&A Chatbot Test",
31
+ "Content-Type": "application/json"
32
+ }
33
+
34
+ payload = {
35
+ "model": "meta-llama/llama-3-70b-instruct",
36
+ "messages": [
37
+ {"role": "user", "content": "Hello! Please respond with 'API test successful' if you can see this message."}
38
+ ],
39
+ "max_tokens": 50,
40
+ "temperature": 0.1
41
+ }
42
+
43
+ print(f"🌐 Making request to: {url}")
44
+ print(f"📤 Request payload: {json.dumps(payload, indent=2)}")
45
+
46
+ try:
47
+ with httpx.Client(timeout=30) as client:
48
+ response = client.post(url, headers=headers, json=payload)
49
+
50
+ print(f"📥 Response status: {response.status_code}")
51
+ print(f"📥 Response headers: {dict(response.headers)}")
52
+
53
+ # Log raw response
54
+ response_text = response.text
55
+ print(f"📥 Raw response: {response_text}")
56
+
57
+ if not response_text.strip():
58
+ print("❌ Empty response received")
59
+ return False
60
+
61
+ if response.status_code != 200:
62
+ print(f"❌ HTTP error: {response.status_code}")
63
+ return False
64
+
65
+ # Try to parse JSON
66
+ try:
67
+ data = response.json()
68
+ print(f"✅ JSON parsed successfully: {json.dumps(data, indent=2)}")
69
+
70
+ if "choices" in data and data["choices"]:
71
+ answer = data["choices"][0]["message"]["content"]
72
+ print(f"🤖 AI Response: {answer}")
73
+ return True
74
+ else:
75
+ print("❌ No choices in response")
76
+ return False
77
+
78
+ except json.JSONDecodeError as e:
79
+ print(f"❌ JSON parsing failed: {e}")
80
+ return False
81
+
82
+ except Exception as e:
83
+ print(f"❌ Request failed: {e}")
84
+ return False
85
+
86
+ if __name__ == "__main__":
87
+ print("🧪 Testing OpenRouter API connectivity...")
88
+ success = test_openrouter_api()
89
+
90
+ if success:
91
+ print("✅ OpenRouter API test successful!")
92
+ else:
93
+ print("❌ OpenRouter API test failed!")
docker-compose.yml ADDED
@@ -0,0 +1,42 @@
1
+ version: '3.8'
2
+
3
+ services:
4
+ backend:
5
+ build:
6
+ context: ./backend
7
+ dockerfile: Dockerfile
8
+ ports:
9
+ - "8000:8000"
10
+ environment:
11
+ - DATABASE_URL=sqlite:///./pdf_chatbot.db
12
+ - CHROMA_PERSIST_DIRECTORY=./chroma_db
13
+ - UPLOAD_DIR=./uploads
14
+ - MAX_FILE_SIZE=10485760
15
+ - ALLOWED_EXTENSIONS=[".pdf"]
16
+ - BACKEND_CORS_ORIGINS=["http://localhost:3000","http://localhost:3001","http://127.0.0.1:3000","http://127.0.0.1:3001"]
17
+ env_file:
18
+ - ./backend/.env
19
+ volumes:
20
+ - ./backend/uploads:/app/uploads
21
+ - ./backend/chroma_db:/app/chroma_db
22
+ - ./backend/pdf_chatbot.db:/app/pdf_chatbot.db
23
+ restart: unless-stopped
24
+
25
+ frontend:
26
+ build:
27
+ context: ./frontend
28
+ dockerfile: Dockerfile
29
+ ports:
30
+ - "3000:3000"
31
+ environment:
32
+ - NEXT_PUBLIC_API_URL=http://localhost:8000
33
+ env_file:
34
+ - ./frontend/.env
35
+ depends_on:
36
+ - backend
37
+ restart: unless-stopped
38
+
39
+ volumes:
40
+ uploads:
41
+ chroma_db:
42
+ database:
frontend/Dockerfile ADDED
@@ -0,0 +1,25 @@
1
+ FROM node:18-alpine
2
+
3
+ WORKDIR /app
4
+
5
+ # Copy package files
6
+ COPY package*.json ./
7
+
8
+ # Install dependencies
9
+ RUN npm ci --only=production
10
+
11
+ # Copy application code
12
+ COPY . .
13
+
14
+ # Build the application
15
+ RUN npm run build
16
+
17
+ # Expose port
18
+ EXPOSE 3000
19
+
20
+ # Health check
21
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
22
+ CMD curl -f http://localhost:3000 || exit 1
23
+
24
+ # Run the application
25
+ CMD ["npm", "start"]
frontend/app/globals.css ADDED
@@ -0,0 +1,107 @@
1
+ @tailwind base;
2
+ @tailwind components;
3
+ @tailwind utilities;
4
+
5
+ @layer base {
6
+ :root {
7
+ --background: 0 0% 100%;
8
+ --foreground: 222.2 84% 4.9%;
9
+ --card: 0 0% 100%;
10
+ --card-foreground: 222.2 84% 4.9%;
11
+ --popover: 0 0% 100%;
12
+ --popover-foreground: 222.2 84% 4.9%;
13
+ --primary: 221.2 83.2% 53.3%;
14
+ --primary-foreground: 210 40% 98%;
15
+ --secondary: 210 40% 96%;
16
+ --secondary-foreground: 222.2 84% 4.9%;
17
+ --muted: 210 40% 96%;
18
+ --muted-foreground: 215.4 16.3% 46.9%;
19
+ --accent: 210 40% 96%;
20
+ --accent-foreground: 222.2 84% 4.9%;
21
+ --destructive: 0 84.2% 60.2%;
22
+ --destructive-foreground: 210 40% 98%;
23
+ --border: 214.3 31.8% 91.4%;
24
+ --input: 214.3 31.8% 91.4%;
25
+ --ring: 221.2 83.2% 53.3%;
26
+ --radius: 0.5rem;
27
+ }
28
+
29
+ .dark {
30
+ --background: 222.2 84% 4.9%;
31
+ --foreground: 210 40% 98%;
32
+ --card: 222.2 84% 4.9%;
33
+ --card-foreground: 210 40% 98%;
34
+ --popover: 222.2 84% 4.9%;
35
+ --popover-foreground: 210 40% 98%;
36
+ --primary: 217.2 91.2% 59.8%;
37
+ --primary-foreground: 222.2 84% 4.9%;
38
+ --secondary: 217.2 32.6% 17.5%;
39
+ --secondary-foreground: 210 40% 98%;
40
+ --muted: 217.2 32.6% 17.5%;
41
+ --muted-foreground: 215 20.2% 65.1%;
42
+ --accent: 217.2 32.6% 17.5%;
43
+ --accent-foreground: 210 40% 98%;
44
+ --destructive: 0 62.8% 30.6%;
45
+ --destructive-foreground: 210 40% 98%;
46
+ --border: 217.2 32.6% 17.5%;
47
+ --input: 217.2 32.6% 17.5%;
48
+ --ring: 224.3 76.3% 94.1%;
49
+ }
50
+ }
51
+
52
+ @layer base {
53
+ * {
54
+ @apply border-border;
55
+ }
56
+ body {
57
+ @apply bg-background text-foreground;
58
+ }
59
+ }
60
+
61
+ /* Custom scrollbar */
62
+ ::-webkit-scrollbar {
63
+ width: 6px;
64
+ }
65
+
66
+ ::-webkit-scrollbar-track {
67
+ background: hsl(var(--muted));
68
+ }
69
+
70
+ ::-webkit-scrollbar-thumb {
71
+ background: hsl(var(--muted-foreground));
72
+ border-radius: 3px;
73
+ }
74
+
75
+ ::-webkit-scrollbar-thumb:hover {
76
+ background: hsl(var(--foreground));
77
+ }
78
+
79
+ /* Chat message animations */
80
+ @keyframes slideIn {
81
+ from {
82
+ opacity: 0;
83
+ transform: translateY(10px);
84
+ }
85
+ to {
86
+ opacity: 1;
87
+ transform: translateY(0);
88
+ }
89
+ }
90
+
91
+ .chat-message {
92
+ animation: slideIn 0.3s ease-out;
93
+ }
94
+
95
+ /* Loading animation */
96
+ @keyframes pulse {
97
+ 0%, 100% {
98
+ opacity: 1;
99
+ }
100
+ 50% {
101
+ opacity: 0.5;
102
+ }
103
+ }
104
+
105
+ .animate-pulse {
106
+ animation: pulse 2s cubic-bezier(0.4, 0, 0.6, 1) infinite;
107
+ }
frontend/app/layout.tsx ADDED
@@ -0,0 +1,26 @@
1
+ import type { Metadata } from 'next'
2
+ import { Inter } from 'next/font/google'
3
+ import './globals.css'
4
+
5
+ const inter = Inter({ subsets: ['latin'] })
6
+
7
+ export const metadata: Metadata = {
8
+ title: 'PDF Q&A Chatbot',
9
+ description: 'A comprehensive PDF-based Q&A chatbot system',
10
+ }
11
+
12
+ export default function RootLayout({
13
+ children,
14
+ }: {
15
+ children: React.ReactNode
16
+ }) {
17
+ return (
18
+ <html lang="en">
19
+ <body className={inter.className}>
20
+ <div className="min-h-screen bg-background">
21
+ {children}
22
+ </div>
23
+ </body>
24
+ </html>
25
+ )
26
+ }
frontend/app/page.tsx ADDED
@@ -0,0 +1,128 @@
1
+ 'use client'
2
+
3
+ import { useState, useEffect, useCallback } from 'react'
4
+ import { Upload, MessageCircle, FileText, Settings, Send, Trash2 } from 'lucide-react'
5
+ import ChatInterface from '@/components/ChatInterface'
6
+ import DocumentUpload from '@/components/DocumentUpload'
7
+ import DocumentList from '@/components/DocumentList'
8
+ import { useChatStore } from '@/lib/store'
9
+ import { apiService } from '@/lib/api'
10
+
11
+ export default function Home() {
12
+ const [activeTab, setActiveTab] = useState<'chat' | 'documents'>('documents')
13
+ const { sessionId, createNewSession } = useChatStore()
14
+ const [documentCount, setDocumentCount] = useState(0)
15
+
16
+ const fetchDocumentCount = useCallback(async () => {
17
+ try {
18
+ const response = await apiService.getDocuments()
19
+ setDocumentCount(response.documents.length)
20
+ } catch (e) {
21
+ setDocumentCount(0)
22
+ }
23
+ }, [])
24
+
25
+ useEffect(() => {
26
+ if (!sessionId) {
27
+ createNewSession()
28
+ }
29
+ }, [sessionId, createNewSession])
30
+
31
+ useEffect(() => {
32
+ fetchDocumentCount()
33
+ }, [activeTab, fetchDocumentCount])
34
+
35
+ useEffect(() => {
36
+ // Clear all data on every page load
37
+ fetch('/api/v1/documents/clear_all', { method: 'POST' })
38
+ .then(() => fetchDocumentCount())
39
+ .catch(() => fetchDocumentCount())
40
+ }, [])
41
+
42
+ return (
43
+ <div className="flex h-screen bg-gray-50">
44
+ {/* Sidebar */}
45
+ <div className="w-64 bg-white border-r border-gray-200 flex flex-col">
46
+ <div className="p-6 border-b border-gray-200">
47
+ <h1 className="text-xl font-bold text-gray-900">PDF Q&A Chatbot</h1>
48
+ <p className="text-sm text-gray-600 mt-1">Upload and chat with your documents</p>
49
+ </div>
50
+
51
+ <nav className="flex-1 p-4">
52
+ <div className="space-y-2">
53
+ <button
54
+ onClick={() => setActiveTab('documents')}
55
+ className={`w-full flex items-center px-3 py-2 text-sm font-medium rounded-md transition-colors ${
56
+ activeTab === 'documents'
57
+ ? 'bg-blue-100 text-blue-700'
58
+ : 'text-gray-600 hover:bg-gray-100'
59
+ }`}
60
+ >
61
+ <FileText className="w-4 h-4 mr-3" />
62
+ Documents
63
+ </button>
64
+ <button
65
+ onClick={() => setActiveTab('chat')}
66
+ className={`w-full flex items-center px-3 py-2 text-sm font-medium rounded-md transition-colors ${
67
+ activeTab === 'chat'
68
+ ? 'bg-blue-100 text-blue-700'
69
+ : 'text-gray-600 hover:bg-gray-100'
70
+ }`}
71
+ >
72
+ <MessageCircle className="w-4 h-4 mr-3" />
73
+ Chat
74
+ </button>
75
+ </div>
76
+ </nav>
77
+ {/* Removed Upload PDF button from sidebar */}
78
+ </div>
79
+
80
+ {/* Main Content */}
81
+ <div className="flex-1 flex flex-col">
82
+ {activeTab === 'chat' && (
83
+ <div className="flex-1 flex flex-col">
84
+ <div className="p-6 border-b border-gray-200">
85
+ <h2 className="text-lg font-semibold text-gray-900">Chat with Documents</h2>
86
+ <p className="text-sm text-gray-600 mt-1">
87
+ Ask questions about your uploaded PDF documents
88
+ </p>
89
+ </div>
90
+ <div className="flex-1 overflow-hidden">
91
+ <ChatInterface />
92
+ </div>
93
+ </div>
94
+ )}
95
+
96
+ {activeTab === 'documents' && (
97
+ <div className="flex-1 flex flex-col">
98
+ <div className="p-6 border-b border-gray-200">
99
+ <h2 className="text-lg font-semibold text-gray-900">Document Management</h2>
100
+ <p className="text-sm text-gray-600 mt-1">
101
+ Upload, view, and manage your PDF documents
102
+ </p>
103
+ </div>
104
+ <div className="flex-1 overflow-auto">
105
+ <div className="p-6">
106
+ <DocumentUpload disabled={documentCount >= 3} onDocumentChange={fetchDocumentCount} />
107
+ <div className="mt-8">
108
+ <DocumentList onDocumentChange={fetchDocumentCount} />
109
+ </div>
110
+ </div>
111
+ </div>
112
+ </div>
113
+ )}
114
+ {/* Add fixed Upload PDF button at bottom left */}
115
+ <div className="fixed bottom-6 left-6 z-50">
116
+ <button
117
+ onClick={() => setActiveTab('documents')}
118
+ className={`w-48 flex items-center justify-center px-3 py-2 text-sm font-medium text-white bg-blue-600 rounded-md transition-colors shadow-lg ${documentCount >= 3 ? 'opacity-60 cursor-not-allowed' : ''}`}
119
+ disabled={documentCount >= 3}
120
+ >
121
+ <Upload className="w-4 h-4 mr-2" />
122
+ Upload PDF
123
+ </button>
124
+ </div>
125
+ </div>
126
+ </div>
127
+ )
128
+ }
frontend/components/ChatInterface.tsx ADDED
@@ -0,0 +1,240 @@
1
+ 'use client'
2
+
3
+ import { useState, useRef, useEffect } from 'react'
4
+ import { Send, Bot, User, Loader2 } from 'lucide-react'
5
+ import { useChatStore, ChatMessage } from '@/lib/store'
6
+ import { apiService } from '@/lib/api'
7
+ import ReactMarkdown from 'react-markdown'
8
+ import remarkGfm from 'remark-gfm'
9
+
10
+ export default function ChatInterface() {
11
+ const [input, setInput] = useState('')
12
+ const [isLoading, setIsLoading] = useState(false)
13
+ const messagesEndRef = useRef<HTMLDivElement>(null)
14
+ const { sessionId, messages, addMessage, setLoading } = useChatStore()
15
+
16
+ const scrollToBottom = () => {
17
+ messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' })
18
+ }
19
+
20
+ useEffect(() => {
21
+ scrollToBottom()
22
+ }, [messages])
23
+
24
+ const handleSubmit = async (e: React.FormEvent) => {
25
+ e.preventDefault()
26
+ if (!input.trim() || !sessionId || isLoading) return
27
+
28
+ const userMessage = input.trim()
29
+ setInput('')
30
+ setIsLoading(true)
31
+ setLoading(true)
32
+
33
+ // Add user message
34
+ addMessage({
35
+ content: userMessage,
36
+ type: 'user',
37
+ })
38
+
39
+ try {
40
+ const response = await apiService.sendMessage({
41
+ question: userMessage,
42
+ session_id: sessionId,
43
+ })
44
+
45
+ // Add assistant message
46
+ addMessage({
47
+ content: response.answer,
48
+ type: 'assistant',
49
+ sources: response.sources,
50
+ })
51
+ } catch (error) {
52
+ console.error('Error sending message:', error)
53
+ addMessage({
54
+ content: 'Sorry, I encountered an error while processing your request. Please try again.',
55
+ type: 'assistant',
56
+ })
57
+ } finally {
58
+ setIsLoading(false)
59
+ setLoading(false)
60
+ }
61
+ }
62
+
63
+ const formatTimestamp = (timestamp: Date) => {
64
+ return timestamp.toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })
65
+ }
66
+
67
+ return (
68
+ <div className="flex flex-col h-full">
69
+ {/* Messages */}
70
+ <div className="flex-1 overflow-y-auto p-4 space-y-4">
71
+ {messages.length === 0 ? (
72
+ <div className="flex items-center justify-center h-full text-gray-500">
73
+ <div className="text-center">
74
+ <Bot className="w-12 h-12 mx-auto mb-4 text-gray-300" />
75
+ <p className="text-lg font-medium">Start a conversation</p>
76
+ <p className="text-sm">Ask questions about your uploaded documents</p>
77
+ </div>
78
+ </div>
79
+ ) : (
80
+ messages.map((message) => (
81
+ <div
82
+ key={message.id}
83
+ className={`flex ${message.type === 'user' ? 'justify-end' : 'justify-start'}`}
84
+ >
85
+ <div
86
+ className={`max-w-[75%] rounded-xl px-4 py-3 shadow-sm ${
87
+ message.type === 'user'
88
+ ? 'bg-blue-600 text-white'
89
+ : 'bg-white text-gray-900 border border-gray-200'
90
+ }`}
91
+ >
92
+ <div className="flex items-start space-x-2">
93
+ {message.type === 'assistant' && (
94
+ <Bot className="w-4 h-4 mt-1 flex-shrink-0" />
95
+ )}
96
+ <div className="flex-1">
97
+ <ReactMarkdown
98
+ remarkPlugins={[remarkGfm]}
99
+ className="prose prose-sm max-w-none leading-relaxed"
100
+ components={{
101
+ // Enhanced paragraph styling
102
+ p: ({ children }) => (
103
+ <p className="mb-3 text-gray-800 leading-6">{children}</p>
104
+ ),
105
+ // Enhanced heading styling
106
+ h1: ({ children }) => (
107
+ <h1 className="text-xl font-bold text-gray-900 mb-3 mt-4 border-b border-gray-200 pb-2">{children}</h1>
108
+ ),
109
+ h2: ({ children }) => (
110
+ <h2 className="text-lg font-semibold text-gray-900 mb-2 mt-4">{children}</h2>
111
+ ),
112
+ h3: ({ children }) => (
113
+ <h3 className="text-base font-medium text-gray-900 mb-2 mt-3">{children}</h3>
114
+ ),
115
+ // Enhanced list styling
116
+ ul: ({ children }) => (
117
+ <ul className="mb-3 ml-4 space-y-1">{children}</ul>
118
+ ),
119
+ ol: ({ children }) => (
120
+ <ol className="mb-3 ml-4 space-y-1 list-decimal">{children}</ol>
121
+ ),
122
+ li: ({ children }) => (
123
+ <li className="text-gray-800 leading-6">{children}</li>
124
+ ),
125
+ // Enhanced code styling
126
+ code: ({ node, inline, className, children, ...props }) => {
127
+ return (
128
+ <code
129
+ className={`${className} ${
130
+ inline
131
+ ? 'bg-gray-200 px-1.5 py-0.5 rounded text-sm font-mono text-gray-800'
132
+ : 'block bg-gray-100 p-3 rounded-md text-sm font-mono text-gray-800 border border-gray-200'
133
+ }`}
134
+ {...props}
135
+ >
136
+ {children}
137
+ </code>
138
+ )
139
+ },
140
+ // Enhanced blockquote styling
141
+ blockquote: ({ children }) => (
142
+ <blockquote className="border-l-4 border-blue-500 pl-4 py-2 my-3 bg-blue-50 rounded-r-md">
143
+ <div className="text-gray-700 italic">{children}</div>
144
+ </blockquote>
145
+ ),
146
+ // Enhanced table styling
147
+ table: ({ children }) => (
148
+ <div className="overflow-x-auto my-4">
149
+ <table className="min-w-full border border-gray-300 rounded-lg">
150
+ {children}
151
+ </table>
152
+ </div>
153
+ ),
154
+ th: ({ children }) => (
155
+ <th className="border border-gray-300 px-3 py-2 bg-gray-100 font-semibold text-gray-900 text-left">
156
+ {children}
157
+ </th>
158
+ ),
159
+ td: ({ children }) => (
160
+ <td className="border border-gray-300 px-3 py-2 text-gray-800">
161
+ {children}
162
+ </td>
163
+ ),
164
+ // Enhanced strong/bold styling
165
+ strong: ({ children }) => (
166
+ <strong className="font-semibold text-gray-900">{children}</strong>
167
+ ),
168
+ // Enhanced emphasis styling
169
+ em: ({ children }) => (
170
+ <em className="italic text-gray-700">{children}</em>
171
+ ),
172
+ }}
173
+ >
174
+ {message.content}
175
+ </ReactMarkdown>
176
+ {message.sources && message.sources.length > 0 && (
177
+ <div className="mt-3 pt-3 border-t border-gray-200">
178
+ <p className="text-xs font-medium text-gray-600 mb-2 flex items-center">
179
+ <span className="w-2 h-2 bg-blue-500 rounded-full mr-2"></span>
180
+ Sources:
181
+ </p>
182
+ <div className="space-y-1">
183
+ {message.sources.map((source, index) => (
184
+ <p key={index} className="text-xs text-gray-600 bg-gray-50 px-2 py-1 rounded">
185
+ {source}
186
+ </p>
187
+ ))}
188
+ </div>
189
+ </div>
190
+ )}
191
+ </div>
192
+ {message.type === 'user' && (
193
+ <User className="w-4 h-4 mt-1 flex-shrink-0" />
194
+ )}
195
+ </div>
196
+ <div className="text-xs opacity-70 mt-1">
197
+ {formatTimestamp(message.timestamp)}
198
+ </div>
199
+ </div>
200
+ </div>
201
+ ))
202
+ )}
203
+
204
+ {isLoading && (
205
+ <div className="flex justify-start">
206
+ <div className="bg-gray-100 rounded-lg px-4 py-2">
207
+ <div className="flex items-center space-x-2">
208
+ <Loader2 className="w-4 h-4 animate-spin" />
209
+ <span className="text-sm text-gray-600">Thinking...</span>
210
+ </div>
211
+ </div>
212
+ </div>
213
+ )}
214
+
215
+ <div ref={messagesEndRef} />
216
+ </div>
217
+
218
+ {/* Input */}
219
+ <div className="border-t border-gray-200 p-4">
220
+ <form onSubmit={handleSubmit} className="flex space-x-2">
221
+ <input
222
+ type="text"
223
+ value={input}
224
+ onChange={(e) => setInput(e.target.value)}
225
+ placeholder="Ask a question about your documents..."
226
+ className="flex-1 px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent"
227
+ disabled={isLoading}
228
+ />
229
+ <button
230
+ type="submit"
231
+ disabled={!input.trim() || isLoading}
232
+ className="px-4 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 disabled:opacity-50 disabled:cursor-not-allowed"
233
+ >
234
+ <Send className="w-4 h-4" />
235
+ </button>
236
+ </form>
237
+ </div>
238
+ </div>
239
+ )
240
+ }
frontend/components/DocumentList.tsx ADDED
@@ -0,0 +1,223 @@
1
+ 'use client'
2
+
3
+ import { useState, useEffect } from 'react'
4
+ import { FileText, Trash2, Calendar, HardDrive, Eye } from 'lucide-react'
5
+ import { apiService, Document } from '@/lib/api'
6
+
7
+ export default function DocumentList({ onDocumentChange }: { onDocumentChange?: () => void }) {
8
+ const [documents, setDocuments] = useState<Document[]>([])
9
+ const [loading, setLoading] = useState(true)
10
+ const [stats, setStats] = useState<any>(null)
11
+
12
+ useEffect(() => {
13
+ loadDocuments()
14
+ loadStats()
15
+ }, [])
16
+
17
+ const loadDocuments = async () => {
18
+ try {
19
+ setLoading(true)
20
+ const response = await apiService.getDocuments()
21
+ setDocuments(response.documents)
22
+ } catch (error) {
23
+ console.error('Error loading documents:', error)
24
+ } finally {
25
+ setLoading(false)
26
+ }
27
+ }
28
+
29
+ const loadStats = async () => {
30
+ try {
31
+ const response = await apiService.getDocumentStats()
32
+ setStats(response)
33
+ } catch (error) {
34
+ console.error('Error loading stats:', error)
35
+ }
36
+ }
37
+
38
+ const handleDelete = async (id: number) => {
39
+ try {
40
+ await apiService.deleteDocument(id)
41
+ setDocuments(prev => prev.filter(doc => doc.id !== id))
42
+ loadStats() // Refresh stats
43
+ loadDocuments() // Refresh document list after delete
44
+ if (onDocumentChange) onDocumentChange();
45
+ } catch (error) {
46
+ console.error('Error deleting document:', error)
47
+ alert('Failed to delete document')
48
+ }
49
+ }
50
+
51
+ const formatFileSize = (bytes: number) => {
52
+ if (bytes === 0) return '0 Bytes'
53
+ const k = 1024
54
+ const sizes = ['Bytes', 'KB', 'MB', 'GB']
55
+ const i = Math.floor(Math.log(bytes) / Math.log(k))
56
+ return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i]
57
+ }
58
+
59
+ const formatDate = (dateString: string) => {
60
+ return new Date(dateString).toLocaleDateString('en-US', {
61
+ year: 'numeric',
62
+ month: 'short',
63
+ day: 'numeric',
64
+ hour: '2-digit',
65
+ minute: '2-digit',
66
+ })
67
+ }
68
+
69
+ if (loading) {
70
+ return (
71
+ <div className="flex items-center justify-center py-8">
72
+ <div className="animate-spin rounded-full h-8 w-8 border-b-2 border-blue-600"></div>
73
+ </div>
74
+ )
75
+ }
76
+
77
+ return (
78
+ <div className="space-y-6">
79
+ {/* Stats */}
80
+ {stats && (
81
+ <div className="grid grid-cols-1 md:grid-cols-4 gap-4">
82
+ <div className="bg-white p-4 rounded-lg border border-gray-200">
83
+ <div className="flex items-center">
84
+ <FileText className="w-8 h-8 text-blue-600" />
85
+ <div className="ml-3">
86
+ <p className="text-sm font-medium text-gray-600">Total Documents</p>
87
+ <p className="text-2xl font-bold text-gray-900">{stats.total_documents}</p>
88
+ </div>
89
+ </div>
90
+ </div>
91
+
92
+ <div className="bg-white p-4 rounded-lg border border-gray-200">
93
+ <div className="flex items-center">
94
+ <CheckCircle className="w-8 h-8 text-green-600" />
95
+ <div className="ml-3">
96
+ <p className="text-sm font-medium text-gray-600">Processed</p>
97
+ <p className="text-2xl font-bold text-gray-900">{stats.processed_documents}</p>
98
+ </div>
99
+ </div>
100
+ </div>
101
+
102
+ <div className="bg-white p-4 rounded-lg border border-gray-200">
103
+ <div className="flex items-center">
104
+ <HardDrive className="w-8 h-8 text-purple-600" />
105
+ <div className="ml-3">
106
+ <p className="text-sm font-medium text-gray-600">Total Size</p>
107
+ <p className="text-2xl font-bold text-gray-900">{stats.total_size_mb} MB</p>
108
+ </div>
109
+ </div>
110
+ </div>
111
+
112
+ <div className="bg-white p-4 rounded-lg border border-gray-200">
113
+ <div className="flex items-center">
114
+ <Database className="w-8 h-8 text-orange-600" />
115
+ <div className="ml-3">
116
+ <p className="text-sm font-medium text-gray-600">Vector Chunks</p>
117
+ <p className="text-2xl font-bold text-gray-900">{stats.vector_store_chunks}</p>
118
+ </div>
119
+ </div>
120
+ </div>
121
+ </div>
122
+ )}
123
+
124
+ {/* Documents List */}
125
+ <div>
126
+ <h3 className="text-lg font-medium text-gray-900 mb-4">Uploaded Documents</h3>
127
+
128
+ {documents.length === 0 ? (
129
+ <div className="text-center py-8">
130
+ <FileText className="w-12 h-12 mx-auto text-gray-400 mb-4" />
131
+ <p className="text-gray-500">No documents uploaded yet</p>
132
+ <p className="text-sm text-gray-400">Upload your first PDF document to get started</p>
133
+ </div>
134
+ ) : (
135
+ <div className="bg-white border border-gray-200 rounded-lg overflow-hidden">
136
+ <div className="overflow-x-auto">
137
+ <table className="min-w-full divide-y divide-gray-200">
138
+ <thead className="bg-gray-50">
139
+ <tr>
140
+ <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
141
+ Document
142
+ </th>
143
+ <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
144
+ Size
145
+ </th>
146
+ <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
147
+ Status
148
+ </th>
149
+ <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
150
+ Uploaded
151
+ </th>
152
+ <th className="px-6 py-3 text-right text-xs font-medium text-gray-500 uppercase tracking-wider">
153
+ Actions
154
+ </th>
155
+ </tr>
156
+ </thead>
157
+ <tbody className="bg-white divide-y divide-gray-200">
158
+ {documents.map((document) => (
159
+ <tr key={document.id} className="hover:bg-gray-50">
160
+ <td className="px-6 py-4 whitespace-nowrap">
161
+ <div className="flex items-center">
162
+ <FileText className="w-5 h-5 text-gray-400 mr-3" />
163
+ <div>
164
+ <div className="text-sm font-medium text-gray-900">
165
+ {document.original_filename}
166
+ </div>
167
+ <div className="text-sm text-gray-500">
168
+ ID: {document.id}
169
+ </div>
170
+ </div>
171
+ </div>
172
+ </td>
173
+ <td className="px-6 py-4 whitespace-nowrap text-sm text-gray-900">
174
+ {formatFileSize(document.file_size)}
175
+ </td>
176
+ <td className="px-6 py-4 whitespace-nowrap">
177
+ <span className={`inline-flex px-2 py-1 text-xs font-semibold rounded-full ${
178
+ document.processed
179
+ ? 'bg-green-100 text-green-800'
180
+ : 'bg-yellow-100 text-yellow-800'
181
+ }`}>
182
+ {document.processed ? 'Processed' : 'Processing'}
183
+ </span>
184
+ </td>
185
+ <td className="px-6 py-4 whitespace-nowrap text-sm text-gray-500">
186
+ <div className="flex items-center">
187
+ <Calendar className="w-4 h-4 mr-1" />
188
+ {formatDate(document.created_at)}
189
+ </div>
190
+ </td>
191
+ <td className="px-6 py-4 whitespace-nowrap text-right text-sm font-medium">
192
+ <button
193
+ onClick={() => handleDelete(document.id)}
194
+ className="text-red-600 hover:text-red-900 p-1"
195
+ title="Delete document"
196
+ >
197
+ <Trash2 className="w-4 h-4" />
198
+ </button>
199
+ </td>
200
+ </tr>
201
+ ))}
202
+ </tbody>
203
+ </table>
204
+ </div>
205
+ </div>
206
+ )}
207
+ </div>
208
+ </div>
209
+ )
210
+ }
211
+
212
+ // Placeholder components for missing icons
213
+ const CheckCircle = ({ className }: { className?: string }) => (
214
+ <svg className={className} fill="none" stroke="currentColor" viewBox="0 0 24 24">
215
+ <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
216
+ </svg>
217
+ )
218
+
219
+ const Database = ({ className }: { className?: string }) => (
220
+ <svg className={className} fill="none" stroke="currentColor" viewBox="0 0 24 24">
221
+ <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 7v10c0 2.21 3.582 4 8 4s8-1.79 8-4V7M4 7c0 2.21 3.582 4 8 4s8-1.79 8-4M4 7c0-2.21 3.582-4 8-4s8 1.79 8 4" />
222
+ </svg>
223
+ )
frontend/components/DocumentUpload.tsx ADDED
@@ -0,0 +1,199 @@
1
+ 'use client'
2
+
3
+ import { useState, useCallback } from 'react'
4
+ import { useDropzone } from 'react-dropzone'
5
+ import { Upload, FileText, X, CheckCircle, AlertCircle } from 'lucide-react'
6
+ import { apiService } from '@/lib/api'
7
+
8
+ interface UploadStatus {
9
+ file: File
10
+ status: 'uploading' | 'success' | 'error'
11
+ message?: string
12
+ }
13
+
14
+ interface DocumentUploadProps {
15
+ disabled?: boolean
16
+ onDocumentChange?: () => void
17
+ }
18
+
19
+ export default function DocumentUpload({ disabled, onDocumentChange }: DocumentUploadProps) {
20
+ const [uploadStatuses, setUploadStatuses] = useState<UploadStatus[]>([])
21
+ const [infoMessage, setInfoMessage] = useState<string | null>(null)
22
+
23
+ const onDrop = useCallback(async (acceptedFiles: File[]) => {
24
+ if (disabled) return;
25
+ // Fetch current document count
26
+ let currentCount = 0;
27
+ try {
28
+ const response = await apiService.getDocuments();
29
+ currentCount = response.documents.length;
30
+ } catch (e) {
31
+ currentCount = 0;
32
+ }
33
+ // Only allow up to 3 documents total
34
+ const allowed = Math.max(0, 3 - currentCount);
35
+ const filesToUpload = acceptedFiles.slice(0, allowed);
36
+ const ignoredCount = acceptedFiles.length - filesToUpload.length;
37
+ if (ignoredCount > 0) {
38
+ setInfoMessage(`Only the first ${allowed} file(s) were uploaded. The rest were ignored to stay within the 3-document limit.`);
39
+ setTimeout(() => setInfoMessage(null), 4000);
40
+ }
41
+ if (filesToUpload.length === 0) return;
42
+ const newUploads = filesToUpload.map(file => ({
43
+ file,
44
+ status: 'uploading' as const,
45
+ }))
46
+ setUploadStatuses(prev => [...prev, ...newUploads])
47
+ for (const file of filesToUpload) {
48
+ try {
49
+ const response = await apiService.uploadDocument(file)
50
+ setUploadStatuses(prev =>
51
+ prev.map(upload =>
52
+ upload.file === file
53
+ ? { ...upload, status: 'success', message: response.message }
54
+ : upload
55
+ )
56
+ )
57
+ if (onDocumentChange) onDocumentChange();
58
+ } catch (error: any) {
59
+ setUploadStatuses(prev =>
60
+ prev.map(upload =>
61
+ upload.file === file
62
+ ? { ...upload, status: 'error', message: error.response?.data?.detail || 'Upload failed' }
63
+ : upload
64
+ )
65
+ )
66
+ }
67
+ }
68
+ }, [disabled, onDocumentChange])
69
+
70
+ const { getRootProps, getInputProps, isDragActive } = useDropzone({
71
+ onDrop,
72
+ accept: {
73
+ 'application/pdf': ['.pdf']
74
+ },
75
+ multiple: true,
76
+ maxSize: 10 * 1024 * 1024, // 10MB
77
+ disabled,
78
+ })
79
+
80
+ const removeUpload = async (file: File) => {
81
+ // If the upload was successful, delete the document from the backend as well
82
+ const upload = uploadStatuses.find(u => u.file === file)
83
+ if (upload && upload.status === 'success') {
84
+ try {
85
+ // Fetch all documents and find the one with matching original_filename
86
+ const response = await apiService.getDocuments()
87
+ const doc = response.documents.find((d: any) => d.original_filename === file.name)
88
+ if (doc) {
89
+ await apiService.deleteDocument(doc.id)
90
+ }
91
+ } catch (e) {
92
+ // Optionally show error, but always remove from UI
93
+ }
94
+ }
95
+ setUploadStatuses(prev => prev.filter(upload => upload.file !== file))
96
+ if (onDocumentChange) onDocumentChange();
97
+ }
98
+
99
+ const formatFileSize = (bytes: number) => {
100
+ if (bytes === 0) return '0 Bytes'
101
+ const k = 1024
102
+ const sizes = ['Bytes', 'KB', 'MB', 'GB']
103
+ const i = Math.floor(Math.log(bytes) / Math.log(k))
104
+ return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i]
105
+ }
106
+
107
+ return (
108
+ <div className="space-y-4">
109
+ <div
110
+ {...getRootProps()}
111
+ className={`border-2 border-dashed rounded-lg p-8 text-center transition-colors ${
112
+ disabled
113
+ ? 'border-gray-200 bg-gray-100 cursor-not-allowed opacity-60'
114
+ : isDragActive
115
+ ? 'border-blue-500 bg-blue-50 cursor-pointer'
116
+ : 'border-gray-300 hover:border-gray-400 cursor-pointer'
117
+ }`}
118
+ >
119
+ <input {...getInputProps()} disabled={disabled} />
120
+ <Upload className="w-12 h-12 mx-auto mb-4 text-gray-400" />
121
+ {infoMessage && (
122
+ <p className="text-sm text-yellow-600 mb-2">{infoMessage}</p>
123
+ )}
124
+ {disabled ? (
125
+ <p className="text-lg font-medium text-gray-400">Maximum 3 documents uploaded</p>
126
+ ) : isDragActive ? (
127
+ <p className="text-lg font-medium text-blue-600">Drop the PDF files here...</p>
128
+ ) : (
129
+ <div>
130
+ <p className="text-lg font-medium text-gray-900 mb-2">
131
+ Upload PDF Documents
132
+ </p>
133
+ <p className="text-sm text-gray-600 mb-4">
134
+ Drag and drop PDF files here, or click to select files
135
+ </p>
136
+ <p className="text-xs text-gray-500">
137
+ Maximum file size: 10MB • Supported format: PDF
138
+ </p>
139
+ </div>
140
+ )}
141
+ </div>
142
+
143
+ {/* Upload Status */}
144
+ {uploadStatuses.length > 0 && (
145
+ <div className="space-y-2">
146
+ <h3 className="text-sm font-medium text-gray-900">Upload Status</h3>
147
+ {uploadStatuses.map((upload, index) => (
148
+ <div
149
+ key={index}
150
+ className="flex items-center justify-between p-3 bg-gray-50 rounded-lg"
151
+ >
152
+ <div className="flex items-center space-x-3">
153
+ <FileText className="w-5 h-5 text-gray-400" />
154
+ <div>
155
+ <p className="text-sm font-medium text-gray-900">
156
+ {upload.file.name}
157
+ </p>
158
+ <p className="text-xs text-gray-500">
159
+ {formatFileSize(upload.file.size)}
160
+ </p>
161
+ </div>
162
+ </div>
163
+
164
+ <div className="flex items-center space-x-2">
165
+ {upload.status === 'uploading' && (
166
+ <div className="flex items-center space-x-2">
167
+ <div className="w-4 h-4 border-2 border-blue-500 border-t-transparent rounded-full animate-spin" />
168
+ <span className="text-sm text-blue-600">Uploading...</span>
169
+ </div>
170
+ )}
171
+
172
+ {upload.status === 'success' && (
173
+ <div className="flex items-center space-x-2">
174
+ <CheckCircle className="w-4 h-4 text-green-500" />
175
+ <span className="text-sm text-green-600">Success</span>
176
+ </div>
177
+ )}
178
+
179
+ {upload.status === 'error' && (
180
+ <div className="flex items-center space-x-2">
181
+ <AlertCircle className="w-4 h-4 text-red-500" />
182
+ <span className="text-sm text-red-600">Error</span>
183
+ </div>
184
+ )}
185
+
186
+ <button
187
+ onClick={() => removeUpload(upload.file)}
188
+ className="p-1 hover:bg-gray-200 rounded"
189
+ >
190
+ <X className="w-4 h-4 text-gray-400" />
191
+ </button>
192
+ </div>
193
+ </div>
194
+ ))}
195
+ </div>
196
+ )}
197
+ </div>
198
+ )
199
+ }
frontend/lib/api.ts ADDED
@@ -0,0 +1,109 @@
1
+ import axios from 'axios'
2
+
3
+ const API_BASE_URL = process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000'
4
+
5
+ const api = axios.create({
6
+ baseURL: API_BASE_URL,
7
+ headers: {
8
+ 'Content-Type': 'application/json',
9
+ },
10
+ })
11
+
12
+ export interface Document {
13
+ id: number
14
+ filename: string
15
+ original_filename: string
16
+ file_size: number
17
+ content?: string
18
+ processed: boolean
19
+ created_at: string
20
+ updated_at?: string
21
+ }
22
+
23
+ export interface ChatRequest {
24
+ question: string
25
+ session_id: string
26
+ model?: string
27
+ }
28
+
29
+ export interface ChatResponse {
30
+ success: boolean
31
+ answer: string
32
+ model?: string
33
+ sources: string[]
34
+ session_id: string
35
+ message_id?: number
36
+ }
37
+
38
+ export interface UploadResponse {
39
+ success: boolean
40
+ document?: Document
41
+ message: string
42
+ }
43
+
44
+ export const apiService = {
45
+ // Document endpoints
46
+ uploadDocument: async (file: File): Promise<UploadResponse> => {
47
+ const formData = new FormData()
48
+ formData.append('file', file)
49
+
50
+ const response = await api.post('/api/v1/documents/upload', formData, {
51
+ headers: {
52
+ 'Content-Type': 'multipart/form-data',
53
+ },
54
+ })
55
+ return response.data
56
+ },
57
+
58
+ getDocuments: async (): Promise<{ documents: Document[]; total: number }> => {
59
+ const response = await api.get('/api/v1/documents/')
60
+ return response.data
61
+ },
62
+
63
+ deleteDocument: async (id: number): Promise<{ success: boolean; message: string }> => {
64
+ const response = await api.delete(`/api/v1/documents/${id}`)
65
+ return response.data
66
+ },
67
+
68
+ getDocumentStats: async (): Promise<any> => {
69
+ const response = await api.get('/api/v1/documents/stats/summary')
70
+ return response.data
71
+ },
72
+
73
+ // Chat endpoints
74
+ sendMessage: async (request: ChatRequest): Promise<ChatResponse> => {
75
+ const response = await api.post('/api/v1/chat/', request)
76
+ return response.data
77
+ },
78
+
79
+ getChatHistory: async (sessionId: string): Promise<any> => {
80
+ const response = await api.get(`/api/v1/chat/history/${sessionId}`)
81
+ return response.data
82
+ },
83
+
84
+ createSession: async (): Promise<{ session_id: string }> => {
85
+ const response = await api.post('/api/v1/chat/session/new')
86
+ return response.data
87
+ },
88
+
89
+ getSessions: async (): Promise<any[]> => {
90
+ const response = await api.get('/api/v1/chat/sessions')
91
+ return response.data
92
+ },
93
+
94
+ deleteSession: async (sessionId: string): Promise<{ success: boolean; message: string }> => {
95
+ const response = await api.delete(`/api/v1/chat/session/${sessionId}`)
96
+ return response.data
97
+ },
98
+
99
+ getAvailableModels: async (): Promise<{ available_models: string[]; is_configured: boolean }> => {
100
+ const response = await api.get('/api/v1/chat/models/available')
101
+ return response.data
102
+ },
103
+
104
+ // Health check
105
+ healthCheck: async (): Promise<any> => {
106
+ const response = await api.get('/health')
107
+ return response.data
108
+ },
109
+ }
frontend/lib/store.ts ADDED
@@ -0,0 +1,50 @@
1
+ import { create } from 'zustand'
2
+ import { v4 as uuidv4 } from 'uuid'
3
+
4
+ export interface ChatMessage {
5
+ id: string
6
+ content: string
7
+ type: 'user' | 'assistant'
8
+ timestamp: Date
9
+ sources?: string[]
10
+ }
11
+
12
+ interface ChatStore {
13
+ sessionId: string | null
14
+ messages: ChatMessage[]
15
+ isLoading: boolean
16
+ createNewSession: () => void
17
+ addMessage: (message: Omit<ChatMessage, 'id' | 'timestamp'>) => void
18
+ setLoading: (loading: boolean) => void
19
+ clearMessages: () => void
20
+ }
21
+
22
+ export const useChatStore = create<ChatStore>((set, get) => ({
23
+ sessionId: null,
24
+ messages: [],
25
+ isLoading: false,
26
+
27
+ createNewSession: () => {
28
+ const sessionId = uuidv4()
29
+ set({ sessionId, messages: [] })
30
+ },
31
+
32
+ addMessage: (message) => {
33
+ const newMessage: ChatMessage = {
34
+ ...message,
35
+ id: uuidv4(),
36
+ timestamp: new Date(),
37
+ }
38
+ set((state) => ({
39
+ messages: [...state.messages, newMessage]
40
+ }))
41
+ },
42
+
43
+ setLoading: (loading) => {
44
+ set({ isLoading: loading })
45
+ },
46
+
47
+ clearMessages: () => {
48
+ set({ messages: [] })
49
+ },
50
+ }))
frontend/next-env.d.ts ADDED
@@ -0,0 +1,5 @@
1
+ /// <reference types="next" />
2
+ /// <reference types="next/image-types/global" />
3
+
4
+ // NOTE: This file should not be edited
5
+ // see https://nextjs.org/docs/basic-features/typescript for more information.
frontend/next.config.js ADDED
@@ -0,0 +1,19 @@
1
+ /** @type {import('next').NextConfig} */
2
+ const nextConfig = {
3
+ experimental: {
4
+ appDir: true,
5
+ },
6
+ images: {
7
+ domains: ['localhost'],
8
+ },
9
+ async rewrites() {
10
+ return [
11
+ {
12
+ source: '/api/:path*',
13
+ destination: 'http://localhost:8000/api/:path*',
14
+ },
15
+ ];
16
+ },
17
+ };
18
+
19
+ module.exports = nextConfig;
frontend/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
frontend/package.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "name": "pdf-chatbot-frontend",
3
+ "version": "0.1.0",
4
+ "private": true,
5
+ "scripts": {
6
+ "dev": "next dev",
7
+ "build": "next build",
8
+ "start": "next start",
9
+ "lint": "next lint"
10
+ },
11
+ "dependencies": {
12
+ "next": "14.0.4",
13
+ "react": "^18",
14
+ "react-dom": "^18",
15
+ "@types/node": "^20",
16
+ "@types/react": "^18",
17
+ "@types/react-dom": "^18",
18
+ "typescript": "^5",
19
+ "tailwindcss": "^3.3.0",
20
+ "autoprefixer": "^10.0.1",
21
+ "postcss": "^8",
22
+ "lucide-react": "^0.294.0",
23
+ "class-variance-authority": "^0.7.0",
24
+ "clsx": "^2.0.0",
25
+ "tailwind-merge": "^2.0.0",
26
+ "zustand": "^4.4.7",
27
+ "react-hook-form": "^7.48.2",
28
+ "@hookform/resolvers": "^3.3.2",
29
+ "zod": "^3.22.4",
30
+ "axios": "^1.6.2",
31
+ "react-dropzone": "^14.2.3",
32
+ "react-markdown": "^9.0.1",
33
+ "remark-gfm": "^4.0.0",
34
+ "react-syntax-highlighter": "^15.5.0",
35
+ "@types/react-syntax-highlighter": "^15.5.11",
36
+ "uuid": "^9.0.1",
37
+ "@types/uuid": "^9.0.7",
38
+ "tailwindcss-animate": "^1.0.7"
39
+ },
40
+ "devDependencies": {
41
+ "eslint": "^8",
42
+ "eslint-config-next": "14.0.4"
43
+ }
44
+ }
frontend/postcss.config.js ADDED
@@ -0,0 +1,6 @@
1
+ module.exports = {
2
+ plugins: {
3
+ tailwindcss: {},
4
+ autoprefixer: {},
5
+ },
6
+ }
frontend/tailwind.config.js ADDED
@@ -0,0 +1,76 @@
1
+ /** @type {import('tailwindcss').Config} */
2
+ module.exports = {
3
+ darkMode: ["class"],
4
+ content: [
5
+ './pages/**/*.{ts,tsx}',
6
+ './components/**/*.{ts,tsx}',
7
+ './app/**/*.{ts,tsx}',
8
+ './src/**/*.{ts,tsx}',
9
+ ],
10
+ theme: {
11
+ container: {
12
+ center: true,
13
+ padding: "2rem",
14
+ screens: {
15
+ "2xl": "1400px",
16
+ },
17
+ },
18
+ extend: {
19
+ colors: {
20
+ border: "hsl(var(--border))",
21
+ input: "hsl(var(--input))",
22
+ ring: "hsl(var(--ring))",
23
+ background: "hsl(var(--background))",
24
+ foreground: "hsl(var(--foreground))",
25
+ primary: {
26
+ DEFAULT: "hsl(var(--primary))",
27
+ foreground: "hsl(var(--primary-foreground))",
28
+ },
29
+ secondary: {
30
+ DEFAULT: "hsl(var(--secondary))",
31
+ foreground: "hsl(var(--secondary-foreground))",
32
+ },
33
+ destructive: {
34
+ DEFAULT: "hsl(var(--destructive))",
35
+ foreground: "hsl(var(--destructive-foreground))",
36
+ },
37
+ muted: {
38
+ DEFAULT: "hsl(var(--muted))",
39
+ foreground: "hsl(var(--muted-foreground))",
40
+ },
41
+ accent: {
42
+ DEFAULT: "hsl(var(--accent))",
43
+ foreground: "hsl(var(--accent-foreground))",
44
+ },
45
+ popover: {
46
+ DEFAULT: "hsl(var(--popover))",
47
+ foreground: "hsl(var(--popover-foreground))",
48
+ },
49
+ card: {
50
+ DEFAULT: "hsl(var(--card))",
51
+ foreground: "hsl(var(--card-foreground))",
52
+ },
53
+ },
54
+ borderRadius: {
55
+ lg: "var(--radius)",
56
+ md: "calc(var(--radius) - 2px)",
57
+ sm: "calc(var(--radius) - 4px)",
58
+ },
59
+ keyframes: {
60
+ "accordion-down": {
61
+ from: { height: 0 },
62
+ to: { height: "var(--radix-accordion-content-height)" },
63
+ },
64
+ "accordion-up": {
65
+ from: { height: "var(--radix-accordion-content-height)" },
66
+ to: { height: 0 },
67
+ },
68
+ },
69
+ animation: {
70
+ "accordion-down": "accordion-down 0.2s ease-out",
71
+ "accordion-up": "accordion-up 0.2s ease-out",
72
+ },
73
+ },
74
+ },
75
+ plugins: [require("tailwindcss-animate")],
76
+ }
frontend/tsconfig.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "compilerOptions": {
3
+ "target": "es5",
4
+ "lib": ["dom", "dom.iterable", "es6"],
5
+ "allowJs": true,
6
+ "skipLibCheck": true,
7
+ "strict": true,
8
+ "noEmit": true,
9
+ "esModuleInterop": true,
10
+ "module": "esnext",
11
+ "moduleResolution": "bundler",
12
+ "resolveJsonModule": true,
13
+ "isolatedModules": true,
14
+ "jsx": "preserve",
15
+ "incremental": true,
16
+ "plugins": [
17
+ {
18
+ "name": "next"
19
+ }
20
+ ],
21
+ "baseUrl": ".",
22
+ "paths": {
23
+ "@/*": ["./*"]
24
+ }
25
+ },
26
+ "include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
27
+ "exclude": ["node_modules"]
28
+ }
setup.ps1 ADDED
@@ -0,0 +1,96 @@
1
+ # PowerShell setup script for PDF Q&A Chatbot System
2
+
3
+ Write-Host "🚀 Setting up PDF Q&A Chatbot System..." -ForegroundColor Green
4
+
5
+ # Check if Python is installed
6
+ try {
7
+ $pythonVersion = python --version 2>&1
8
+ Write-Host "✅ Python found: $pythonVersion" -ForegroundColor Green
9
+ } catch {
10
+ Write-Host "❌ Python is required but not installed. Please install Python 3.8+ and try again." -ForegroundColor Red
11
+ exit 1
12
+ }
13
+
14
+ # Check if Node.js is installed
15
+ try {
16
+ $nodeVersion = node --version 2>&1
17
+ Write-Host "✅ Node.js found: $nodeVersion" -ForegroundColor Green
18
+ } catch {
19
+ Write-Host "❌ Node.js is required but not installed. Please install Node.js 18+ and try again." -ForegroundColor Red
20
+ exit 1
21
+ }
22
+
23
+ # Check if npm is installed
24
+ try {
25
+ $npmVersion = npm --version 2>&1
26
+ Write-Host "✅ npm found: $npmVersion" -ForegroundColor Green
27
+ } catch {
28
+ Write-Host "❌ npm is required but not installed. Please install npm and try again." -ForegroundColor Red
29
+ exit 1
30
+ }
31
+
32
+ Write-Host "✅ Prerequisites check passed" -ForegroundColor Green
33
+
34
+ # Backend setup
35
+ Write-Host "📦 Setting up backend..." -ForegroundColor Yellow
36
+ Set-Location backend
37
+
38
+ # Create virtual environment
39
+ Write-Host "Creating Python virtual environment..." -ForegroundColor Yellow
40
+ python -m venv venv
41
+
42
+ # Activate virtual environment
43
+ Write-Host "Activating virtual environment..." -ForegroundColor Yellow
44
+ .\venv\Scripts\Activate.ps1
45
+
46
+ # Install dependencies
47
+ Write-Host "Installing Python dependencies..." -ForegroundColor Yellow
48
+ pip install -r requirements.txt
49
+
50
+ # Create .env file if it doesn't exist
51
+ if (-not (Test-Path .env)) {
52
+ Write-Host "Creating .env file..." -ForegroundColor Yellow
53
+ Copy-Item .env.example .env
54
+ Write-Host "⚠️ Please edit backend/.env and add your API keys (OpenAI or Anthropic)" -ForegroundColor Yellow
55
+ }
56
+
57
+ Set-Location ..
58
+
59
+ # Frontend setup
60
+ Write-Host "📦 Setting up frontend..." -ForegroundColor Yellow
61
+ Set-Location frontend
62
+
63
+ # Install dependencies
64
+ Write-Host "Installing Node.js dependencies..." -ForegroundColor Yellow
65
+ npm install
66
+
67
+ # Create .env file if it doesn't exist
68
+ if (-not (Test-Path .env)) {
69
+ Write-Host "Creating .env file..." -ForegroundColor Yellow
70
+ Copy-Item .env.example .env
71
+ }
72
+
73
+ Set-Location ..
74
+
75
+ Write-Host ""
76
+ Write-Host "🎉 Setup completed successfully!" -ForegroundColor Green
77
+ Write-Host ""
78
+ Write-Host "📋 Next steps:" -ForegroundColor Cyan
79
+ Write-Host "1. Edit backend/.env and add your API keys:" -ForegroundColor White
80
+ Write-Host " - OPENAI_API_KEY or ANTHROPIC_API_KEY" -ForegroundColor White
81
+ Write-Host ""
82
+ Write-Host "2. Start the backend server:" -ForegroundColor White
83
+ Write-Host " cd backend" -ForegroundColor White
84
+ Write-Host " .\venv\Scripts\Activate.ps1" -ForegroundColor White
85
+ Write-Host " uvicorn main:app --reload" -ForegroundColor White
86
+ Write-Host ""
87
+ Write-Host "3. Start the frontend server (in a new terminal):" -ForegroundColor White
88
+ Write-Host " cd frontend" -ForegroundColor White
89
+ Write-Host " npm run dev" -ForegroundColor White
90
+ Write-Host ""
91
+ Write-Host "4. Open your browser and go to: http://localhost:3000" -ForegroundColor White
92
+ Write-Host ""
93
+ Write-Host "🐳 Alternatively, you can use Docker:" -ForegroundColor White
94
+ Write-Host " docker-compose up --build" -ForegroundColor White
95
+ Write-Host ""
96
+ Write-Host "📚 For more information, see the README.md file" -ForegroundColor White
setup.sh ADDED
@@ -0,0 +1,90 @@
1
+ #!/bin/bash
2
+
3
+ echo "🚀 Setting up PDF Q&A Chatbot System..."
4
+
5
+ # Check if Python is installed
6
+ if ! command -v python3 &> /dev/null; then
7
+ echo "❌ Python 3 is required but not installed. Please install Python 3.8+ and try again."
8
+ exit 1
9
+ fi
10
+
11
+ # Check if Node.js is installed
12
+ if ! command -v node &> /dev/null; then
13
+ echo "❌ Node.js is required but not installed. Please install Node.js 18+ and try again."
14
+ exit 1
15
+ fi
16
+
17
+ # Check if npm is installed
18
+ if ! command -v npm &> /dev/null; then
19
+ echo "❌ npm is required but not installed. Please install npm and try again."
20
+ exit 1
21
+ fi
22
+
23
+ echo "✅ Prerequisites check passed"
24
+
25
+ # Backend setup
26
+ echo "📦 Setting up backend..."
27
+ cd backend
28
+
29
+ # Create virtual environment
30
+ echo "Creating Python virtual environment..."
31
+ python3 -m venv venv
32
+
33
+ # Activate virtual environment
34
+ if [[ "$OSTYPE" == "msys" || "$OSTYPE" == "win32" ]]; then
35
+ source venv/Scripts/activate
36
+ else
37
+ source venv/bin/activate
38
+ fi
39
+
40
+ # Install dependencies
41
+ echo "Installing Python dependencies..."
42
+ pip install -r requirements.txt
43
+
44
+ # Create .env file if it doesn't exist
45
+ if [ ! -f .env ]; then
46
+ echo "Creating .env file..."
47
+ cp .env.example .env
48
+ echo "⚠️ Please edit backend/.env and add your API keys (OpenAI or Anthropic)"
49
+ fi
50
+
51
+ cd ..
52
+
53
+ # Frontend setup
54
+ echo "📦 Setting up frontend..."
55
+ cd frontend
56
+
57
+ # Install dependencies
58
+ echo "Installing Node.js dependencies..."
59
+ npm install
60
+
61
+ # Create .env file if it doesn't exist
62
+ if [ ! -f .env ]; then
63
+ echo "Creating .env file..."
64
+ cp .env.example .env
65
+ fi
66
+
67
+ cd ..
68
+
69
+ echo ""
70
+ echo "🎉 Setup completed successfully!"
71
+ echo ""
72
+ echo "📋 Next steps:"
73
+ echo "1. Edit backend/.env and add your API keys:"
74
+ echo " - OPENAI_API_KEY or ANTHROPIC_API_KEY"
75
+ echo ""
76
+ echo "2. Start the backend server:"
77
+ echo " cd backend"
78
+ echo " source venv/bin/activate # On Windows: venv\\Scripts\\activate"
79
+ echo " uvicorn main:app --reload"
80
+ echo ""
81
+ echo "3. Start the frontend server (in a new terminal):"
82
+ echo " cd frontend"
83
+ echo " npm run dev"
84
+ echo ""
85
+ echo "4. Open your browser and go to: http://localhost:3000"
86
+ echo ""
87
+ echo "🐳 Alternatively, you can use Docker:"
88
+ echo " docker-compose up --build"
89
+ echo ""
90
+ echo "📚 For more information, see the README.md file"