---
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - medical
  - chatbot
  - rag
  - gemini
  - streamlit
---

# Medical Chatbot 🏥

An intelligent medical question-answering chatbot that uses retrieval-augmented generation (RAG) with Gemini 1.5 Flash, Sentence Transformers, and Pinecone DB.

## Features

- 🤖 Powered by Gemini 1.5 Flash for natural language understanding
- 📊 Uses Sentence Transformers for semantic search
- 🔍 Retrieves relevant medical information from a vector database
- 📚 Provides citations with source attribution
- 🎯 Confidence scoring for each response
- 🌐 Beautiful Streamlit interface
- ⚠️ Important disclaimers for medical advice

## Prerequisites

1. Python 3.8 or higher
2. Pinecone account (https://www.pinecone.io/)
3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
4. Hugging Face account (optional, for accessing datasets)

## Installation

**For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md)**

1. Clone or download this repository

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the root directory:
```env
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=us-east1
GOOGLE_API_KEY=your_google_api_key_here
```

4. Set up the database:
```bash
python setup_database.py
```

This downloads medical data from Hugging Face, embeds it, and uploads the resulting vectors to Pinecone.

## Usage

Run the Streamlit application:
```bash
streamlit run app.py
```

Then open the URL shown in the terminal (typically http://localhost:8501) in your browser.

**Quick Start Guide:** [QUICK_START.md](QUICK_START.md)

## How It Works

1. **Data Loading**: Medical questions and answers are loaded from Hugging Face datasets
2. **Embedding**: Texts are converted to embeddings using Sentence Transformers
3. **Vector Storage**: Embeddings are stored in Pinecone for fast similarity search
4. **Query Processing**: User queries are embedded and searched against the database
5. **Response Generation**: Gemini 1.5 Flash generates responses based on retrieved context
6. **Citation**: Sources are tracked and displayed with confidence scores
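
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the code in `app.py`: the bag-of-words `embed` function stands in for a Sentence Transformers model, the in-memory `retrieve` function stands in for a Pinecone query, and the prompt is only built, not sent to Gemini. All function names, the sample documents, and the prompt wording are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a Sentence Transformers model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Rank documents by similarity to the query (stand-in for a vector database query)."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc["text"])), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, hits):
    """Assemble a prompt that numbers each retrieved passage so the answer can cite it."""
    context = "\n".join(
        f"[{i}] ({doc['source']}) {doc['text']}"
        for i, (_score, doc) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample data; the real app loads its data from Hugging Face datasets.
DOCS = [
    {"source": "doc-1", "text": "Ibuprofen is used to reduce fever and pain"},
    {"source": "doc-2", "text": "Insulin regulates blood sugar levels"},
    {"source": "doc-3", "text": "Regular exercise lowers blood pressure"},
]

question = "what regulates blood sugar"
hits = retrieve(question, DOCS)
prompt = build_prompt(question, hits)
print(prompt)
```

In this sketch the similarity score of each hit doubles as a rough confidence value; how the app derives its displayed confidence scores may differ.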

## Important Disclaimers

- ⚠️ **This is not medical advice**
- ⚠️ **Not a substitute for professional healthcare**
- ⚠️ **Always consult healthcare professionals for medical decisions**
- ⚠️ **Confidence scores indicate data quality, not medical accuracy**

## Configuration

Edit `config.py` to customize:
- Embedding model
- Number of retrieved documents (TOP_K)
- Similarity threshold
- Dataset selection
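
As a rough guide, a `config.py` covering these settings might look like the sketch below. Only `TOP_K` and `SIMILARITY_THRESHOLD` are names taken from this README; `EMBEDDING_MODEL`, `DATASET_NAME`, and every value shown are assumptions for illustration, so check the actual file before copying anything.

```python
# Hypothetical layout for config.py; names and values are illustrative assumptions,
# except TOP_K and SIMILARITY_THRESHOLD, which this README mentions by name.

# Sentence Transformers model used for embeddings (assumed; a commonly used small model).
EMBEDDING_MODEL = "all-MiniLM-L6-v2"

# Number of documents retrieved per query (assumed value).
TOP_K = 5

# Minimum similarity a match needs to be used as context (assumed value, 0.0 to 1.0).
SIMILARITY_THRESHOLD = 0.7

# Identifier of the Hugging Face dataset to load (placeholder, not a real dataset ID).
DATASET_NAME = "your-dataset-id"
```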

## Troubleshooting

### "API Key not found"
- Ensure your `.env` file exists and contains valid API keys
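
To narrow down which key is the problem, a small check like the following can help. It is a sketch, not part of the app: the helper name `missing_keys` is hypothetical, and it assumes the variables from `.env` have already been loaded into the process environment (for example by a loader such as python-dotenv, if that is what the app uses).

```python
import os

# Variable names taken from the .env example in the Installation section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset, blank, or still a 'your_...' placeholder."""
    env = os.environ if env is None else env
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    problems = missing_keys()
    if problems:
        print("Missing or placeholder values:", ", ".join(problems))
    else:
        print("All required keys are set.")
```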

### "Index not found"
- Run `python setup_database.py` to create the Pinecone index

### "No results found"
- The similarity threshold might be too high
- Adjust `SIMILARITY_THRESHOLD` in `config.py`
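
The effect of the threshold is easy to see with a small example. The sketch below assumes matches are filtered with a simple `score >= threshold` comparison; the `filter_matches` helper and the sample scores are hypothetical, and the app's actual filtering logic may differ.

```python
def filter_matches(matches, threshold):
    """Keep only matches whose similarity score meets the threshold."""
    return [match for match in matches if match["score"] >= threshold]


# Hypothetical retrieval results for a single query.
matches = [
    {"id": "a", "score": 0.62},
    {"id": "b", "score": 0.48},
    {"id": "c", "score": 0.31},
]

strict = filter_matches(matches, 0.7)   # nothing passes, so the app has no context to use
relaxed = filter_matches(matches, 0.4)  # the two strongest matches pass

print(len(strict), [match["id"] for match in relaxed])
```

Lowering the threshold returns more context at the cost of weaker matches, so it is worth reducing it gradually rather than setting it to zero.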

## License

This project is for educational purposes only. Medical information should be verified with healthcare professionals.