Deva1211 committed
Commit 37244c4 · 1 Parent(s): ba8e123

Modified files for deployment

Files changed (7)
  1. .gitignore +53 -0
  2. README.md +56 -6
  3. api_server.py +177 -0
  4. app.py +264 -36
  5. combined_app.py +36 -0
  6. config.py +84 -0
  7. requirements.txt +8 -4
.gitignore ADDED
@@ -0,0 +1,53 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyTorch
+ *.pth
+ *.pt
+ *.bin
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Environment variables
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Model cache
+ .cache/
+ huggingface_hub/
README.md CHANGED
@@ -1,12 +1,62 @@
 ---
- title: Medical Model
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
+ title: MedLLaMA2 Medical Chatbot
+ emoji: 🏥
+ colorFrom: blue
+ colorTo: green
 sdk: gradio
- sdk_version: 5.0.1
+ sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+ license: apache-2.0
 ---

- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+ # MedLLaMA2 Medical Chatbot 🏥
+
+ A medical AI assistant powered by MedLLaMA2 (7B parameters) designed to provide helpful medical information and guidance.
+
+ ## Features
+
+ - **Medical-focused responses**: Trained on medical literature and datasets
+ - **Safety-first approach**: Always recommends consulting healthcare professionals
+ - **Optimized for Hugging Face Spaces**: Uses 4-bit quantization for efficient memory usage
+ - **Interactive chat interface**: Built with Gradio for easy interaction
+
+ ## Usage
+
+ 1. Type your medical question or concern in the chat interface
+ 2. Adjust parameters like temperature and max tokens if needed
+ 3. The model will provide informative responses while emphasizing professional medical consultation
+
+ ## Important Disclaimer
+
+ ⚠️ **This chatbot is for educational and informational purposes only.**
+
+ - It should NOT be used as a substitute for professional medical advice
+ - Always consult with qualified healthcare professionals for medical concerns
+ - In case of medical emergencies, contact emergency services immediately
+
+ ## Technical Details
+
+ - **Model**: MedLLaMA2 7B (or compatible medical language model)
+ - **Framework**: Transformers, PyTorch
+ - **Interface**: Gradio ChatInterface
+ - **Optimization**: 4-bit quantization with BitsAndBytes
+ - **Hardware**: CPU Basic (16GB RAM)
+
+ ## Examples
+
+ Try asking questions like:
+ - "What are the symptoms of diabetes?"
+ - "How can I maintain a healthy heart?"
+ - "What should I know about blood pressure?"
+ - "Tell me about the importance of regular exercise."
+
+ ## Development
+
+ This space uses:
+ - Python 3.10+
+ - Transformers library for model loading
+ - Gradio for the web interface
+ - BitsAndBytes for model quantization
+
+ For more information about the underlying technology, see the [Transformers documentation](https://huggingface.co/docs/transformers/index).
api_server.py ADDED
@@ -0,0 +1,177 @@
+ """
+ API Server for MedLLaMA2 Medical Chatbot
+ This file provides REST API endpoints that can be used by external applications
+ while the main app.py provides the Gradio interface.
+ """
+
+ import os
+ import threading
+ from flask import Flask, request, jsonify, Response
+ from flask_cors import CORS
+ import json
+ import time
+ import re
+
+ # Import the model and functions from the main app
+ from app import load_model, generate_response, get_model_info
+ from config import GENERATION_DEFAULTS
+
+ # Initialize Flask app
+ app = Flask(__name__)
+ CORS(app)  # Enable CORS for all routes
+
+ # Initialize model in a separate thread
+ def init_model():
+     print("🔄 Loading model in API server...")
+     load_model()
+     print("✅ Model loaded in API server")
+
+ # Start model loading
+ model_thread = threading.Thread(target=init_model)
+ model_thread.start()
+
+ @app.route('/health', methods=['GET'])
+ def health_check():
+     """Health check endpoint"""
+     return jsonify({
+         'status': 'ok',
+         'model_loaded': get_model_info() != "No model loaded",
+         'model_info': get_model_info(),
+         'timestamp': time.time()
+     })
+
+ @app.route('/chat', methods=['POST'])
+ def chat_endpoint():
+     """Main chat endpoint for medical questions"""
+     try:
+         data = request.get_json()
+
+         if not data or 'message' not in data:
+             return jsonify({'error': 'No message provided'}), 400
+
+         message = data['message'].strip()
+         if not message:
+             return jsonify({'error': 'Empty message'}), 400
+
+         # Get optional parameters
+         max_tokens = data.get('max_tokens', GENERATION_DEFAULTS['max_new_tokens'])
+         temperature = data.get('temperature', GENERATION_DEFAULTS['temperature'])
+         top_p = data.get('top_p', GENERATION_DEFAULTS['top_p'])
+
+         # Check for non-medical topics
+         non_medical_patterns = [
+             r'\b(java|javascript|python|c\+\+|c#|programming|coding|computer|software)\b',
+             r'\b(cook|recipe|food recipe|baking)\b',
+             r'\b(math problem|finance|stock market|weather|movie|book|travel)\b'
+         ]
+
+         is_non_medical = any(re.search(pattern, message, re.IGNORECASE) for pattern in non_medical_patterns)
+
+         # Medical exceptions
+         medical_exceptions = [
+             r'medical (history|coding|program|software|algorithm)',
+             r'health (history|software|recipe)',
+             r'(food allergy|diet recipe|patient story|medical story)'
+         ]
+
+         is_medical_exception = any(re.search(pattern, message, re.IGNORECASE) for pattern in medical_exceptions)
+
+         if is_non_medical and not is_medical_exception:
+             return jsonify({
+                 'response': "I'm a medical assistant designed to provide health-related information. I'm not able to help with programming, cooking, or other non-medical topics. If you have any questions about health, medicine, symptoms, or wellness, I'd be happy to assist you! 😊",
+                 'timestamp': time.time()
+             })
+
+         # Generate medical response
+         response = generate_response(
+             message,
+             max_tokens=int(max_tokens),
+             temperature=float(temperature),
+             top_p=float(top_p)
+         )
+
+         # Return the response
+         return jsonify({
+             'response': response,
+             'timestamp': time.time(),
+             'model_info': get_model_info()
+         })
+
+     except Exception as e:
+         print(f"Error in chat endpoint: {str(e)}")
+         return jsonify({
+             'error': 'Internal server error',
+             'details': str(e)
+         }), 500
+
+ @app.route('/stream', methods=['POST'])
+ def stream_chat():
+     """Streaming chat endpoint"""
+     try:
+         data = request.get_json()
+
+         if not data or 'message' not in data:
+             return jsonify({'error': 'No message provided'}), 400
+
+         message = data['message'].strip()
+         if not message:
+             return jsonify({'error': 'Empty message'}), 400
+
+         def generate_stream():
+             try:
+                 # Get parameters
+                 max_tokens = data.get('max_tokens', GENERATION_DEFAULTS['max_new_tokens'])
+                 temperature = data.get('temperature', GENERATION_DEFAULTS['temperature'])
+                 top_p = data.get('top_p', GENERATION_DEFAULTS['top_p'])
+
+                 # Generate response in chunks
+                 response = generate_response(
+                     message,
+                     max_tokens=int(max_tokens),
+                     temperature=float(temperature),
+                     top_p=float(top_p)
+                 )
+
+                 # Stream the response word by word
+                 words = response.split()
+                 for i, word in enumerate(words):
+                     chunk_data = {
+                         'chunk': word + (' ' if i < len(words) - 1 else ''),
+                         'status': 'streaming'
+                     }
+                     yield f"data: {json.dumps(chunk_data)}\n\n"
+                     time.sleep(0.05)  # Small delay for streaming effect
+
+                 # Send completion signal
+                 end_data = {
+                     'complete': True,
+                     'fullResponse': response
+                 }
+                 yield f"event: end\ndata: {json.dumps(end_data)}\n\n"
+
+             except Exception as e:
+                 error_data = {
+                     'error': 'Stream error',
+                     'details': str(e)
+                 }
+                 yield f"event: error\ndata: {json.dumps(error_data)}\n\n"
+
+         return Response(
+             generate_stream(),
+             content_type='text/event-stream',
+             headers={
+                 'Cache-Control': 'no-cache',
+                 'Connection': 'keep-alive',
+                 'Access-Control-Allow-Origin': '*',
+                 'Access-Control-Allow-Headers': 'Content-Type, Authorization'
+             }
+         )
+
+     except Exception as e:
+         return jsonify({'error': str(e)}), 500
+
+ if __name__ == "__main__":
+     # For local development
+     port = int(os.environ.get("API_PORT", 8000))
+     print(f"🚀 Starting API server on port {port}")
+     app.run(host="0.0.0.0", port=port, debug=False)
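
For reference, a minimal client sketch for the endpoints above (not part of this commit). It assumes the API server is running locally on its default port 8000 and that the `requests` package is installed, which requirements.txt does not include.

# Rough usage sketch for /health and /chat (assumptions: local server on port 8000, `requests` installed).
import requests

BASE_URL = "http://localhost:8000"

# Check whether the model has finished loading in the background thread
print(requests.get(f"{BASE_URL}/health").json())

# Ask a medical question; max_tokens/temperature/top_p are optional and
# fall back to GENERATION_DEFAULTS on the server side
payload = {"message": "What are the symptoms of diabetes?", "max_tokens": 256}
reply = requests.post(f"{BASE_URL}/chat", json=payload, timeout=300).json()
print(reply["response"])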
app.py CHANGED
@@ -1,11 +1,162 @@
import gradio as gr
- from huggingface_hub import InferenceClient
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ import logging
+ import gc
+ import warnings
+ import os
+ from config import MODEL_CONFIGS, DEFAULT_MODEL, MODEL_SETTINGS, GENERATION_DEFAULTS, MEDICAL_SYSTEM_PROMPT, UI_CONFIG

- """
- For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
- """
- client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
+ # Suppress warnings
+ warnings.filterwarnings("ignore")
+ logging.getLogger("transformers").setLevel(logging.ERROR)

+ # Global variables for model and tokenizer
+ model = None
+ tokenizer = None
+ current_model_name = None
+
+ def load_model(model_key=None):
+     """Load the specified medical model with optimizations for Hugging Face Spaces"""
+     global model, tokenizer, current_model_name
+
+     if model_key is None:
+         model_key = DEFAULT_MODEL
+
+     # Try to load models in order of preference
+     model_keys_to_try = [model_key, "meditron", "dialogpt_medium", "dialogpt_small"]
+
+     for key in model_keys_to_try:
+         if key not in MODEL_CONFIGS:
+             continue
+
+         try:
+             model_config = MODEL_CONFIGS[key]
+             model_name = model_config["name"]
+             print(f"Attempting to load model: {model_name} ({model_config['description']})")
+
+             # Load tokenizer first
+             print("Loading tokenizer...")
+             tokenizer = AutoTokenizer.from_pretrained(
+                 model_name,
+                 trust_remote_code=MODEL_SETTINGS["trust_remote_code"],
+                 padding_side="left"
+             )
+
+             # Add pad token if it doesn't exist
+             if tokenizer.pad_token is None:
+                 tokenizer.pad_token = tokenizer.eos_token
+
+             # Configure quantization for memory efficiency (only for larger models)
+             model_kwargs = {
+                 "trust_remote_code": MODEL_SETTINGS["trust_remote_code"],
+                 "low_cpu_mem_usage": MODEL_SETTINGS["low_cpu_mem_usage"]
+             }
+
+             # Add quantization for larger models
+             if MODEL_SETTINGS["use_quantization"] and key in ["medllama2", "meditron", "clinical_camel"]:
+                 quantization_config = BitsAndBytesConfig(
+                     load_in_4bit=True,
+                     bnb_4bit_compute_dtype=torch.float16,
+                     bnb_4bit_quant_type="nf4",
+                     bnb_4bit_use_double_quant=True,
+                 )
+                 model_kwargs["quantization_config"] = quantization_config
+                 model_kwargs["torch_dtype"] = torch.float16
+                 model_kwargs["device_map"] = MODEL_SETTINGS["device_map"]
+             else:
+                 # For smaller models, use regular loading
+                 if torch.cuda.is_available():
+                     model_kwargs["torch_dtype"] = torch.float16
+                     model_kwargs["device_map"] = "auto"
+
+             print("Loading model...")
+             model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
+
+             current_model_name = model_name
+             print(f"✅ Model loaded successfully: {model_name}")
+             return True
+
+         except Exception as e:
+             print(f"❌ Failed to load {key}: {str(e)}")
+             # Clean up on failure
+             model = None
+             tokenizer = None
+             continue
+
+     print("❌ All model loading attempts failed")
+     return False
+
+ def generate_response(prompt, max_tokens=None, temperature=None, top_p=None):
+     """Generate response using the loaded model"""
+     global model, tokenizer, current_model_name
+
+     if model is None or tokenizer is None:
+         return "❌ Model not loaded. Please wait for initialization or try restarting the space."
+
+     # Use defaults if not specified
+     max_tokens = max_tokens or GENERATION_DEFAULTS["max_new_tokens"]
+     temperature = temperature or GENERATION_DEFAULTS["temperature"]
+     top_p = top_p or GENERATION_DEFAULTS["top_p"]
+
+     try:
+         # Use the medical system prompt
+         full_prompt = f"{MEDICAL_SYSTEM_PROMPT}\n\nPatient/User: {prompt}\nMedical Assistant:"
+
+         # Tokenize input with proper truncation
+         inputs = tokenizer(
+             full_prompt,
+             return_tensors="pt",
+             truncation=True,
+             max_length=1024,
+             padding=True
+         )
+
+         # Move to appropriate device
+         device = next(model.parameters()).device
+         inputs = {k: v.to(device) for k, v in inputs.items()}
+
+         # Generation parameters
+         generation_kwargs = {
+             "max_new_tokens": min(max_tokens, 1024),  # Cap at 1024 for safety
+             "temperature": temperature,
+             "top_p": top_p,
+             "do_sample": GENERATION_DEFAULTS["do_sample"],
+             "pad_token_id": tokenizer.eos_token_id,
+             "repetition_penalty": GENERATION_DEFAULTS["repetition_penalty"],
+             "no_repeat_ngram_size": GENERATION_DEFAULTS["no_repeat_ngram_size"]
+         }
+
+         # Generate response
+         print(f"Generating response with {current_model_name}...")
+         with torch.no_grad():
+             outputs = model.generate(**inputs, **generation_kwargs)
+
+         # Decode response
+         full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+         # Extract only the new generated text
+         if "Medical Assistant:" in full_response:
+             response = full_response.split("Medical Assistant:")[-1].strip()
+         else:
+             # Fallback extraction
+             response = full_response[len(full_prompt):].strip()
+
+         # Clean up response
+         if not response or len(response.strip()) < 10:
+             response = "I understand you're asking about a medical topic. While I'd like to help, I recommend consulting with a qualified healthcare professional who can provide personalized advice based on your specific situation."
+
+         # Clean up memory
+         del inputs, outputs
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+
+         return response
+
+     except Exception as e:
+         error_msg = f"Error generating response: {str(e)}"
+         print(error_msg)
+         return f"⚠️ I encountered a technical issue while processing your request. Please try again or rephrase your question. If the problem persists, consider consulting a healthcare professional directly."

def respond(
    message,
@@ -15,50 +166,127 @@ def respond(
    temperature,
    top_p,
):
-     messages = [{"role": "system", "content": system_message}]
-
-     for val in history:
-         if val[0]:
-             messages.append({"role": "user", "content": val[0]})
-         if val[1]:
-             messages.append({"role": "assistant", "content": val[1]})
-
-     messages.append({"role": "user", "content": message})
-
-     response = ""
-
-     for message in client.chat_completion(
-         messages,
-         max_tokens=max_tokens,
-         stream=True,
-         temperature=temperature,
-         top_p=top_p,
-     ):
-         token = message.choices[0].delta.content
-
-         response += token
-         yield response
-
- """
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
- """
+     """Main response function for Gradio ChatInterface"""
+     if not message or not message.strip():
+         return "Please enter a medical question or concern."
+
+     # Add a disclaimer for first-time users
+     disclaimer = "\n\n⚠️ **Medical Disclaimer**: This AI provides general health information only. Always consult healthcare professionals for medical advice, diagnosis, or treatment."
+
+     try:
+         # Generate response
+         response = generate_response(
+             message.strip(),
+             max_tokens=int(max_tokens),
+             temperature=float(temperature),
+             top_p=float(top_p)
+         )
+
+         # Add disclaimer to response
+         if "disclaimer" not in response.lower() and "consult" not in response.lower():
+             response += disclaimer
+
+         return response
+
+     except Exception as e:
+         error_msg = f"System error: {str(e)}"
+         print(error_msg)
+         return f"⚠️ System temporarily unavailable. Please try again later or consult a healthcare professional directly.{disclaimer}"
+
+ def get_model_info():
+     """Get information about the currently loaded model"""
+     if current_model_name:
+         return f"Currently using: {current_model_name}"
+     return "No model loaded"
+
+ def respond(
+     message,
+     history: list[tuple[str, str]],
+     system_message,
+     max_tokens,
+     temperature,
+     top_p,
+ ):
+     """Main response function for Gradio ChatInterface"""
+     if not message or not message.strip():
+         return "Please enter a medical question or concern."
+
+     # Add a disclaimer for first-time users
+     disclaimer = "\n\n⚠️ **Medical Disclaimer**: This AI provides general health information only. Always consult healthcare professionals for medical advice, diagnosis, or treatment."
+
+     try:
+         # Generate response
+         response = generate_response(
+             message.strip(),
+             max_tokens=int(max_tokens),
+             temperature=float(temperature),
+             top_p=float(top_p)
+         )
+
+         # Add disclaimer to response
+         if "disclaimer" not in response.lower() and "consult" not in response.lower():
+             response += disclaimer
+
+         return response
+
+     except Exception as e:
+         error_msg = f"System error: {str(e)}"
+         print(error_msg)
+         return f"⚠️ System temporarily unavailable. Please try again later or consult a healthcare professional directly.{disclaimer}"
+
+ # Load model on startup
+ print("🏥 Initializing MedLLaMA2 Medical Chatbot...")
+ print("📋 Loading medical language model...")
+ model_loaded = load_model()
+
+ if model_loaded:
+     print(f"✅ Ready! {get_model_info()}")
+ else:
+     print("⚠️ WARNING: Model failed to load. The app will run but responses may be limited.")
+
+ # Create Gradio interface with configuration
demo = gr.ChatInterface(
    respond,
+     title=UI_CONFIG["title"],
+     description=UI_CONFIG["description"],
    additional_inputs=[
-         gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-         gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-         gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
+         gr.Textbox(
+             value=MEDICAL_SYSTEM_PROMPT,
+             label="System Instructions",
+             lines=4,
+             interactive=False  # Make it read-only to prevent tampering
+         ),
+         gr.Slider(
+             minimum=UI_CONFIG["max_tokens_range"][0],
+             maximum=UI_CONFIG["max_tokens_range"][1],
+             value=GENERATION_DEFAULTS["max_new_tokens"],
+             step=10,
+             label="Max new tokens"
+         ),
+         gr.Slider(
+             minimum=UI_CONFIG["temperature_range"][0],
+             maximum=UI_CONFIG["temperature_range"][1],
+             value=GENERATION_DEFAULTS["temperature"],
+             step=0.1,
+             label="Temperature (creativity)"
+         ),
        gr.Slider(
-             minimum=0.1,
-             maximum=1.0,
-             value=0.95,
+             minimum=UI_CONFIG["top_p_range"][0],
+             maximum=UI_CONFIG["top_p_range"][1],
+             value=GENERATION_DEFAULTS["top_p"],
            step=0.05,
-             label="Top-p (nucleus sampling)",
+             label="Top-p (focus)",
        ),
    ],
+     examples=[[example] for example in UI_CONFIG["examples"]],
+     cache_examples=False,
+     theme=gr.themes.Soft(),
+     css=".gradio-container {max-width: 900px; margin: auto;}"
)

+ # Add model info to the interface
+ with demo:
+     gr.HTML(f"<p style='text-align: center; color: #666; font-size: 0.9em;'>Model Status: {get_model_info()}</p>")

if __name__ == "__main__":
-     demo.launch()
+     demo.launch(server_name="0.0.0.0", server_port=7860, show_error=True)
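
Since api_server.py reuses `load_model`, `generate_response`, and `get_model_info` from this module, a quick local smoke test might look like the sketch below (not part of the commit). Note that importing `app` runs the module-level `load_model()` call, so this assumes the dependencies are installed and enough memory is available for whichever model ends up being loaded.

# Minimal smoke test of the functions api_server.py imports from app.py (sketch only).
from app import generate_response, get_model_info

print(get_model_info())
print(generate_response("What should I know about blood pressure?", max_tokens=128))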
combined_app.py ADDED
@@ -0,0 +1,36 @@
+ """
+ Combined App for MedLLaMA2 Medical Chatbot
+ This file runs both the Gradio interface and Flask API server
+ """
+
+ import threading
+ import time
+ import subprocess
+ import sys
+ import os
+
+ def run_gradio():
+     """Run the Gradio app"""
+     subprocess.run([sys.executable, "app.py"])
+
+ def run_api():
+     """Run the API server"""
+     # Wait a bit for the model to load in Gradio
+     time.sleep(10)
+     subprocess.run([sys.executable, "api_server.py"])
+
+ if __name__ == "__main__":
+     print("🚀 Starting MedLLaMA2 Combined Server...")
+     print("📊 This will start both Gradio UI (port 7860) and API server (port 8000)")
+
+     # Create threads for both servers
+     gradio_thread = threading.Thread(target=run_gradio)
+     api_thread = threading.Thread(target=run_api)
+
+     # Start both threads
+     gradio_thread.start()
+     api_thread.start()
+
+     # Wait for both to complete
+     gradio_thread.join()
+     api_thread.join()
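
With combined_app.py running, the Gradio UI listens on port 7860 and the Flask API on port 8000. A sketch of a client consuming the `/stream` SSE endpoint follows (not part of the commit; assumes the servers are up locally and the `requests` package is available).

# Sketch of a streaming client for the /stream endpoint (assumptions: local server on port 8000, `requests` installed).
import json
import requests

with requests.post(
    "http://localhost:8000/stream",
    json={"message": "How can I improve my sleep quality?"},
    stream=True,
    timeout=300,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # The server emits lines such as "data: {...}", "event: end", and "event: error"
        if line and line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            if "chunk" in event:
                print(event["chunk"], end="", flush=True)
            elif event.get("complete"):
                print()  # full response received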
config.py ADDED
@@ -0,0 +1,84 @@
+ # Configuration file for MedLLaMA2 model hosting
+
+ # Model configurations
+ MODEL_CONFIGS = {
+     # Primary medical models (replace with actual MedLLaMA2 when available)
+     "medllama2": {
+         "name": "medllama2:latest",  # Replace with actual MedLLaMA2 model ID
+         "description": "MedLLaMA2 7B medical language model"
+     },
+
+     # Alternative medical models
+     "meditron": {
+         "name": "epfl-llm/meditron-7b",
+         "description": "Meditron 7B medical language model"
+     },
+
+     "clinical_camel": {
+         "name": "wanglab/ClinicalCamel-70B",  # Note: This is very large, might not fit
+         "description": "Clinical Camel medical model"
+     },
+
+     # Fallback models (smaller, more reliable)
+     "dialogpt_medium": {
+         "name": "microsoft/DialoGPT-medium",
+         "description": "DialoGPT Medium (fallback)"
+     },
+
+     "dialogpt_small": {
+         "name": "microsoft/DialoGPT-small",
+         "description": "DialoGPT Small (lightweight fallback)"
+     }
+ }
+
+ # Default model to use
+ DEFAULT_MODEL = "medllama2"
+
+ # Model loading settings
+ MODEL_SETTINGS = {
+     "use_quantization": True,
+     "quantization_bits": 4,
+     "torch_dtype": "float16",
+     "trust_remote_code": True,
+     "low_cpu_mem_usage": True,
+     "device_map": "auto"
+ }
+
+ # Generation settings
+ GENERATION_DEFAULTS = {
+     "max_new_tokens": 512,
+     "temperature": 0.7,
+     "top_p": 0.9,
+     "do_sample": True,
+     "repetition_penalty": 1.1,
+     "no_repeat_ngram_size": 3
+ }
+
+ # Medical prompt template
+ MEDICAL_SYSTEM_PROMPT = """You are a helpful medical AI assistant designed to provide accurate medical information and guidance.
+
+ Key guidelines:
+ 1. Provide factual, evidence-based medical information
+ 2. Always emphasize the importance of consulting healthcare professionals
+ 3. Never provide specific diagnoses or treatment recommendations
+ 4. Encourage users to seek immediate medical attention for serious symptoms
+ 5. Be empathetic and supportive while maintaining medical accuracy
+
+ Remember: This information is for educational purposes only and should not replace professional medical advice."""
+
+ # UI settings
+ UI_CONFIG = {
+     "title": "🏥 MedLLaMA2 Medical Chatbot",
+     "description": "A medical AI assistant powered by MedLLaMA2. Please note: This is for educational purposes only and should not replace professional medical advice.",
+     "examples": [
+         "What are the symptoms of diabetes?",
+         "How can I maintain a healthy heart?",
+         "What should I know about blood pressure?",
+         "Tell me about the importance of regular exercise.",
+         "What are the side effects of common pain medications?",
+         "How can I improve my sleep quality?"
+     ],
+     "max_tokens_range": (50, 1024),
+     "temperature_range": (0.1, 1.0),
+     "top_p_range": (0.1, 1.0)
+ }
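
One maintenance note (an observation, not part of the commit): "medllama2:latest" looks like an Ollama-style tag rather than a Hugging Face Hub repo id, so `load_model()` in app.py will most likely fail on that entry and fall back to "meditron". A hypothetical edit to the entry once a real repo id is chosen might look like the sketch below, where "your-org/medllama2-7b" is a placeholder, not an actual repository.

# Hypothetical replacement for the "medllama2" entry in config.py (placeholder repo id).
MODEL_CONFIGS["medllama2"] = {
    "name": "your-org/medllama2-7b",  # substitute a real Hugging Face Hub repo id here
    "description": "MedLLaMA2 7B medical language model"
}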
requirements.txt CHANGED
@@ -1,4 +1,8 @@
- huggingface_hub==0.25.2
- transformers
- torch
- accelerate
+ gradio>=4.0.0
+ transformers>=4.35.0
+ torch>=2.0.0
+ accelerate>=0.20.0
+ bitsandbytes>=0.41.0
+ sentencepiece>=0.1.99
+ protobuf>=3.20.0
+ huggingface_hub>=0.17.0