---
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - medical
  - chatbot
  - rag
  - gemini
  - streamlit
---

# Medical Chatbot 🏥

A medical question-answering chatbot built on retrieval-augmented generation (RAG): Sentence Transformers embeds the query, Pinecone retrieves similar records, and Gemini 1.5 Flash generates the answer from them.

## Features

- 🤖 Powered by Gemini 1.5 Flash for response generation
- 📊 Uses Sentence Transformers to embed text for semantic search
- 🔍 Retrieves relevant medical information from a Pinecone vector database
- 📚 Provides citations with source attribution
- 🎯 Confidence scoring for each response
- 🌐 Beautiful Streamlit interface
- ⚠️ Important disclaimers for medical advice

## Prerequisites

1. Python 3.8 or higher
2. Pinecone account (https://www.pinecone.io/)
3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
4. Hugging Face account (optional, for accessing datasets)

## Installation

**For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md)**

1. Clone or download this repository

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the root directory:
```env
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=us-east1
GOOGLE_API_KEY=your_google_api_key_here
```

4. Set up the database:
```bash
python setup_database.py
```

This downloads medical data from Hugging Face, embeds it, and uploads the resulting vectors to Pinecone.

## Usage

Run the Streamlit application:
```bash
streamlit run app.py
```

Open the URL shown in the terminal (typically http://localhost:8501) in your browser.

**Quick Start Guide:** [QUICK_START.md](QUICK_START.md)

## How It Works

1. **Data Loading**: Medical questions and answers are loaded from Hugging Face datasets
2. **Embedding**: Texts are converted to embeddings using Sentence Transformers
3. **Vector Storage**: Embeddings are stored in Pinecone for fast similarity search
4. **Query Processing**: User queries are embedded and searched against the database
5. **Response Generation**: Gemini 1.5 Flash generates responses based on retrieved context
6. **Citation**: Sources are tracked and displayed with confidence scores
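
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the app's actual code: the word-count `embed` function stands in for a Sentence Transformers model, the in-memory `INDEX` list stands in for Pinecone, the sample records are made up, and the names `retrieve` and `build_prompt` are hypothetical. The final call to Gemini is omitted; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (word counts); the real app uses Sentence Transformers."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up sample records; the real app loads a Hugging Face dataset.
SAMPLE_DOCS = [
    {"source": "sample-1", "text": "A fever is a body temperature above the normal range."},
    {"source": "sample-2", "text": "Dehydration happens when the body loses more fluid than it takes in."},
    {"source": "sample-3", "text": "A sprain is a stretched or torn ligament."},
]

# Stand-in for the Pinecone index: (embedding, record) pairs held in memory.
INDEX = [(embed(doc["text"]), doc) for doc in SAMPLE_DOCS]


def retrieve(query, top_k=2):
    """Embed the query and return the top_k (score, record) pairs."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), doc) for vec, doc in INDEX]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, matches):
    """Assemble the context-grounded prompt that would be sent to the LLM."""
    context = "\n".join(f"[{doc['source']}] {doc['text']}" for _, doc in matches)
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


matches = retrieve("What is a fever?")
print(build_prompt("What is a fever?", matches))
```

Keeping the source label next to each retrieved passage in the prompt is what allows the generated answer to cite it afterwards.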

## Important Disclaimers

- ⚠️ **This is not medical advice**
- ⚠️ **Not a substitute for professional healthcare**
- ⚠️ **Always consult healthcare professionals for medical decisions**
- ⚠️ **Confidence scores indicate data quality, not medical accuracy**

## Configuration

Edit `config.py` to customize:
- Embedding model
- Number of retrieved documents (TOP_K)
- Similarity threshold
- Dataset selection
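
As a rough idea of what these settings might look like, here is a hypothetical `config.py` sketch. Only `TOP_K` and `SIMILARITY_THRESHOLD` are names taken from this README; `EMBEDDING_MODEL`, `DATASET_NAME`, the `check_config` helper, and every value shown are assumptions for illustration, so check the actual file before copying anything.

```python
# Hypothetical config.py sketch; values are illustrative, not the project's defaults.

EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # assumed name; any Sentence Transformers model id
TOP_K = 5                             # number of documents retrieved per query
SIMILARITY_THRESHOLD = 0.5            # minimum similarity score to keep a match
DATASET_NAME = "your-org/your-medical-qa-dataset"  # placeholder dataset id


def check_config(top_k, threshold):
    """Return a list of problems with the retrieval settings (empty if none)."""
    problems = []
    if top_k < 1:
        problems.append("TOP_K must be at least 1")
    # Assumes cosine similarity scores, which fall between -1 and 1.
    if not -1.0 <= threshold <= 1.0:
        problems.append("SIMILARITY_THRESHOLD must be between -1 and 1")
    return problems


print(check_config(TOP_K, SIMILARITY_THRESHOLD))
```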

## Troubleshooting

### "API Key not found"
- Ensure your `.env` file exists and contains valid API keys
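
A quick way to narrow this down is to check which variables are actually visible to Python. The sketch below uses only the standard library and the three variable names from the `.env` example in the Installation section; the `missing_keys` helper is hypothetical. Note that a `.env` file is not read automatically: the app presumably loads it with a package such as python-dotenv (an assumption), so run this check in the same way the app is started.

```python
import os

# Variable names taken from the .env example in the Installation section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_keys()
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All required variables are set.")
```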

### "Index not found"
- Run `python setup_database.py` to create the Pinecone index

### "No results found"
- The similarity threshold might be too high
- Adjust `SIMILARITY_THRESHOLD` in `config.py`
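
To see why a high threshold produces empty results, consider this small sketch. It assumes matches arrive as dictionaries with an `id` and a similarity `score`, and that the app drops anything below the threshold before keeping the top `TOP_K`; the `filter_matches` helper and the scores are made up for illustration.

```python
def filter_matches(matches, threshold, top_k):
    """Keep matches scoring at or above threshold, best first, at most top_k."""
    kept = [m for m in matches if m["score"] >= threshold]
    kept.sort(key=lambda m: m["score"], reverse=True)
    return kept[:top_k]


# Made-up scores for three retrieved records.
matches = [
    {"id": "a", "score": 0.62},
    {"id": "b", "score": 0.48},
    {"id": "c", "score": 0.31},
]

print(len(filter_matches(matches, threshold=0.8, top_k=5)))  # 0: nothing clears 0.8
print(len(filter_matches(matches, threshold=0.4, top_k=5)))  # 2: "a" and "b" remain
```

If even the best match scores below the threshold, the chatbot has no context to answer from, so lowering the value gradually is usually safer than removing the filter.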

## License

Released under the Apache 2.0 license, as declared in the Space metadata. This project is for educational purposes only; verify any medical information with healthcare professionals.