---
title: PDF QA Chatbot
emoji: "📄🤖"
colorFrom: "blue"
colorTo: "purple"
sdk: docker
app_file: app.py
pinned: true
---
# PDF-Based Q&A Chatbot System
An end-to-end Q&A chatbot system that processes uploaded PDF documents and lets users ask natural-language questions, returning context-aware answers with source references.
## Features
- **PDF Processing**: Extract text and metadata from uploaded PDF documents
- **Vector Storage**: Store document embeddings in ChromaDB for efficient retrieval
- **AI-Powered Q&A**: Use OpenAI/Claude for intelligent question answering
- **Modern UI**: Clean, responsive interface built with Next.js and Tailwind CSS
- **Real-time Chat**: Interactive chat interface with conversation history
- **File Management**: Upload, view, and manage multiple PDF documents
- **Context Awareness**: Maintain conversation context and document references
## Tech Stack
### Backend
- **FastAPI**: High-performance web framework
- **PyPDF2**: PDF text extraction
- **ChromaDB**: Vector database for embeddings
- **OpenAI/Claude**: AI language models for Q&A
- **SQLAlchemy**: Database ORM
- **Pydantic**: Data validation
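The PDF-processing and vector-storage pieces listed above fit together roughly as sketched below. This is a minimal illustration, not the project's actual service code: the collection name, chunk size, and overlap are assumptions, and ChromaDB's default embedding function is used for brevity.

```python
# Minimal ingest sketch: extract text with PyPDF2, chunk it, and store the
# chunks in a persistent ChromaDB collection. Collection name, chunk size,
# and overlap are illustrative assumptions, not the project's settings.
from PyPDF2 import PdfReader
import chromadb

def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows."""
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

client = chromadb.PersistentClient(path="./chroma_db")  # matches CHROMA_PERSIST_DIRECTORY
collection = client.get_or_create_collection("pdf_chunks")

chunks = chunk_text(extract_text("example.pdf"))
collection.add(
    documents=chunks,
    ids=[f"example.pdf-{i}" for i in range(len(chunks))],
    metadatas=[{"source": "example.pdf", "chunk": i} for i in range(len(chunks))],
)
```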
### Frontend
- **Next.js 14**: React framework with App Router
- **TypeScript**: Type-safe development
- **Tailwind CSS**: Utility-first styling
- **Shadcn/ui**: Modern UI components
- **React Hook Form**: Form handling
- **Zustand**: State management
## Project Structure
```
ChatbotCursor/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── core/
│   │   ├── models/
│   │   ├── services/
│   │   └── utils/
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   ├── app/
│   ├── components/
│   ├── lib/
│   └── package.json
├── docker-compose.yml
└── README.md
```
## Quick Start
### Option 1: Automated Setup (Recommended)
**For Linux/macOS:**
```bash
chmod +x setup.sh
./setup.sh
```
**For Windows:**
```powershell
.\setup.ps1
```
### Option 2: Manual Setup
1. **Clone and Setup**
```bash
git clone <repository-url>
cd ChatbotCursor
```
2. **Backend Setup**
```bash
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys
```
3. **Frontend Setup**
```bash
cd frontend
npm install
cp .env.example .env
```
4. **Environment Variables**
- Edit `backend/.env` and add your API keys:
- `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
- The frontend `.env` should work with defaults
5. **Run the Application**
```bash
# Backend (Terminal 1)
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
uvicorn main:app --reload
# Frontend (Terminal 2)
cd frontend
npm run dev
```
### Option 3: Docker Setup
```bash
# Build and run with Docker Compose
docker-compose up --build
# Or run services individually
docker-compose up backend
docker-compose up frontend
```
### Access the Application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
## Usage
### Getting Started
1. **Upload Documents**
- Navigate to the "Documents" tab
- Drag and drop PDF files or click to select
- Wait for processing (text extraction and vector embedding)
- View upload status and document statistics
2. **Start Chatting**
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get AI-powered answers with source references
- View conversation history
3. **Document Management**
- View all uploaded documents with metadata
- Delete documents when no longer needed
- Monitor processing status and file sizes
### Features
- **Smart Document Processing**: Automatic text extraction and chunking
- **Vector Search**: Semantic similarity search for relevant content
- **AI-Powered Q&A**: Context-aware answers using OpenAI or Claude
- **Source Citations**: See which documents and sections were referenced
- **Conversation History**: Persistent chat sessions
- **File Management**: Upload, view, and delete documents
- **Real-time Processing**: Live status updates during uploads
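At answer time the flow is retrieve-then-generate: the question is matched against the stored chunks in ChromaDB, and the closest chunks are passed to the language model as context along with the question. The sketch below illustrates that flow with the OpenAI SDK; the model name, prompt wording, and number of retrieved chunks are assumptions rather than the project's exact configuration.

```python
# Illustrative retrieve-then-generate step (not the project's exact code).
# Assumes the "pdf_chunks" collection from the ingest sketch above.
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("pdf_chunks")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, n_results: int = 4) -> str:
    # Semantic similarity search over the stored chunks.
    hits = collection.query(query_texts=[question], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])
    sources = [m["source"] for m in hits["metadatas"][0]]
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return f"{response.choices[0].message.content}\n\nSources: {sorted(set(sources))}"
```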
### Supported File Types
- **PDF Documents**: Standard PDF files; text is extracted with PyPDF2, so scanned or image-only PDFs without a text layer will yield no content
- **Maximum Size**: 10MB per file
- **Processing**: Automatic text extraction and metadata parsing
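A server-side check for these limits could look like the snippet below. The 10 MB value matches `MAX_FILE_SIZE=10485760` in the backend `.env`; the function itself is only illustrative, not the project's actual validator.

```python
# Hypothetical upload validation matching the limits above.
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, same value as MAX_FILE_SIZE=10485760

def validate_upload(filename: str, data: bytes) -> None:
    """Reject non-PDF files and files over the configured size limit."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("Only PDF files are supported")
    if len(data) > MAX_FILE_SIZE:
        raise ValueError("File exceeds the 10 MB limit")
```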
## API Endpoints
### Document Management
- `POST /api/v1/documents/upload`: Upload PDF documents
- `GET /api/v1/documents/`: List all documents
- `GET /api/v1/documents/{id}`: Get specific document
- `DELETE /api/v1/documents/{id}`: Delete a document
- `GET /api/v1/documents/stats/summary`: Get document statistics
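For scripted use, the document endpoints can be called directly. The sketch below uses the `requests` library; the multipart field name `file` is an assumption, so confirm the exact request schema against the interactive docs at `/docs`.

```python
# Hypothetical client calls for the document endpoints; the multipart field
# name ("file") is an assumption -- confirm it against /docs.
import requests

with open("example.pdf", "rb") as fh:
    resp = requests.post(
        "http://localhost:8000/api/v1/documents/upload",
        files={"file": ("example.pdf", fh, "application/pdf")},
    )
resp.raise_for_status()
print(resp.json())  # e.g. document id and processing status

# List the documents uploaded so far.
print(requests.get("http://localhost:8000/api/v1/documents/").json())
```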
### Chat & Q&A
- `POST /api/v1/chat/`: Send questions and get answers
- `GET /api/v1/chat/history/{session_id}`: Get chat history
- `POST /api/v1/chat/session/new`: Create new chat session
- `GET /api/v1/chat/sessions`: List all sessions
- `DELETE /api/v1/chat/session/{session_id}`: Delete session
- `GET /api/v1/chat/models/available`: Get available AI models
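Similarly, questions can be posted to the chat endpoint from a script. The JSON field names below (`question`, `session_id`) are assumptions; the exact request and response schemas are documented at `/docs`.

```python
# Hypothetical chat client; JSON field names are assumptions -- see /docs.
import requests

BASE = "http://localhost:8000/api/v1/chat"

session = requests.post(f"{BASE}/session/new").json()  # create a new session
session_id = session.get("session_id")

answer = requests.post(
    f"{BASE}/",
    json={"question": "What does the uploaded document say about refunds?",
          "session_id": session_id},
).json()
print(answer)

# Retrieve the conversation history for the session.
print(requests.get(f"{BASE}/history/{session_id}").json())
```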
### System
- `GET /health`: Health check
- `GET /docs`: Interactive API documentation (Swagger UI)
- `GET /redoc`: Alternative API documentation
## Configuration
### Environment Variables
**Backend (.env):**
```env
# Required: Set at least one AI provider
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
# Optional: Customize settings
DATABASE_URL=sqlite:///./pdf_chatbot.db
CHROMA_PERSIST_DIRECTORY=./chroma_db
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=10485760
```
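Since Pydantic is part of the backend stack, these variables are typically loaded into a typed settings object. The sketch below shows one way to do that with `pydantic-settings`; the class and module are illustrative assumptions, not the project's actual code.

```python
# Illustrative settings loader using pydantic-settings; field names mirror
# the .env keys above, but this class is not the project's actual module.
from typing import Optional
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    database_url: str = "sqlite:///./pdf_chatbot.db"
    chroma_persist_directory: str = "./chroma_db"
    upload_dir: str = "./uploads"
    max_file_size: int = 10_485_760  # 10 MB

    class Config:
        env_file = ".env"  # values in .env override the defaults above

settings = Settings()
```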
**Frontend (.env):**
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```
### AI Provider Setup
1. **OpenAI**: Get API key from [OpenAI Platform](https://platform.openai.com/)
2. **Anthropic**: Get API key from [Anthropic Console](https://console.anthropic.com/)
## Development
### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000
```
### Frontend Development
```bash
cd frontend
npm run dev
```
### Testing
```bash
# Backend tests
cd backend
pytest
# Frontend tests
cd frontend
npm test
```
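If no backend tests exist yet, a minimal smoke test against the health endpoint could look like the sketch below. It assumes the FastAPI app object is importable as `main.app`, which matches the `uvicorn main:app` command used above; the test file location is an assumption.

```python
# backend/tests/test_health.py -- minimal smoke test (assumed file location).
from fastapi.testclient import TestClient

from main import app  # the same app object that `uvicorn main:app` serves

client = TestClient(app)

def test_health():
    # The /health endpoint should respond with HTTP 200 when the app is up.
    response = client.get("/health")
    assert response.status_code == 200
```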
## Troubleshooting
### Common Issues
1. **API Key Not Configured**
- Ensure you've added your API key to `backend/.env`
- Restart the backend server after changing environment variables
2. **Upload Fails**
- Check file size (max 10MB)
- Ensure file is a valid PDF
- Check backend logs for detailed error messages
3. **Chat Not Working**
- Verify AI service is configured and working
- Check if documents are properly processed
- Review browser console for frontend errors
4. **Docker Issues**
- Ensure Docker and Docker Compose are installed
- Check if ports 3000 and 8000 are available
- Use `docker-compose logs` to view service logs
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with [FastAPI](https://fastapi.tiangolo.com/) and [Next.js](https://nextjs.org/)
- Vector storage powered by [ChromaDB](https://www.trychroma.com/)
- AI capabilities provided by [OpenAI](https://openai.com/) and [Anthropic](https://www.anthropic.com/)
- UI components from [Tailwind CSS](https://tailwindcss.com/) and [Lucide React](https://lucide.dev/)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference