MedChatBot / README.md
tmt3103's picture
first commit
dc235a0
metadata
title: MedChatBot
emoji: πŸ’Š
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

MedChatBot

A medical chatbot application that uses RAG (Retrieval-Augmented Generation) architecture to answer medical questions based on medical literature. The system combines Google Gemini 2.5 Pro as the language model with Pinecone vector database for efficient document retrieval.

Technology Stack

  • Backend: Flask
  • Language Model: Google Gemini 2.5 Pro
  • Vector Database: Pinecone
  • Embeddings: HuggingFace sentence-transformers (all-MiniLM-L6-v2)
  • Document Processing: LangChain, PyPDF
  • Frontend: HTML/CSS/JavaScript

Installation & Setup

Step 1: Clone the Repository

git clone https://github.com/TMTien31/MedChatBot.git
cd MedChatBot

Step 2: Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate

# On macOS/Linux:
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Get API Keys

Google Gemini API Key:

  1. Go to Google AI Studio
  2. Create a new API key
  3. Copy the generated key

Pinecone API Key:

  1. Sign up at Pinecone
  2. Go to your dashboard
  3. Copy your API key from the "API Keys" section

Step 5: Create Environment File

Create a .env file in the project root directory:

# Create .env file
touch .env  # On macOS/Linux
# or create manually on Windows

Add your API keys to the .env file:

PINECONE_API_KEY=your_pinecone_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here

Step 6: Prepare Medical Documents

  • Place your PDF medical documents in the Data/ folder
  • The project includes "Gale Encyclopedia of Medicine Vol. 1 (A-B).pdf" by default
  • You can add more medical PDFs to expand the knowledge base

Step 7: Create Vector Index (Run Once)

Important: This step only needs to be run once initially, or whenever you add new documents to the Data/ folder.

python store_index.py

This script will:

  • Read all PDF files from the Data/ directory
  • Split text into 500-character chunks with 20-character overlap
  • Generate embeddings using sentence-transformers
  • Create and populate a Pinecone index named "medchatbot"

Note: This process may take several minutes depending on the size of your documents.

Running the Application

Start the Flask Server

python app.py

Access the Application

  1. Open your web browser
  2. Navigate to: http://0.0.0.0:8080 or http://localhost:8080
  3. You should see the medical chatbot interface