Abid Ali Awan committed on
Commit
abfac44
·
1 Parent(s): c92ea65

refactor: Remove legacy files and consolidate functionality into a streamlined Gradio application with enhanced MCP client integration, improved file handling, and updated AI orchestration for a better user experience.

Browse files
Files changed (6) hide show
  1. README.md +0 -193
  2. ai_orchestrator.py +0 -303
  3. app.py +238 -310
  4. config.py +0 -87
  5. file_handler.py +0 -296
  6. mcp_client.py +0 -200
README.md CHANGED
@@ -13,196 +13,3 @@ tags:
13
  - mcp-in-action-track-customer
14
  ---
15
 
16
-
17
- # AI-Driven MLOps Agent 🤖
18
-
19
- [![Hugging Face](https://img.shields.io/badge/🤗-HuggingFace-yellow?style=flat-square&logo=huggingface)](https://huggingface.co)
20
- [![Python](https://img.shields.io/badge/Python-3.8+-blue?style=flat-square&logo=python)](https://python.org)
21
- [![License](https://img.shields.io/badge/License-Apache%202.0-green?style=flat-square)](LICENSE)
22
- [![OpenAI](https://img.shields.io/badge/OpenAI-GPT--5--mini-red?style=flat-square&logo=openai)](https://openai.com)
23
-
24
- > Chat interface for automated ML pipeline operations: analyze, train, deploy, and test an API, all in seconds.
25
-
26
- ## 🚀 Overview
27
-
28
- AI-Driven MLOps Agent is an intelligent chat-based application that automates machine learning workflows using Hugging Face's MCP (Model Context Protocol) servers. This agent can analyze datasets, train models, deploy them to production, and provide comprehensive insights - all through natural language conversations.
29
-
30
- ## ✨ Features
31
-
32
- - **🤖 Intelligent Chat Interface**: Natural language interaction with AI-powered ML operations
33
- - **📊 Automated Data Analysis**: Instant CSV analysis with data quality insights
34
- - **🎯 Smart Intent Recognition**: AI understands your ML goals and selects appropriate tools
35
- - **🔧 MCP Integration**: Seamlessly connects to Hugging Face's MCP servers for ML operations
36
- - **📁 File Upload & Processing**: Automatic CSV file handling with URL generation
37
- - **🔄 Streaming Responses**: Real-time feedback during long-running operations
38
- - **🎨 Modern UI**: Clean, intuitive Gradio interface with responsive design
39
-
40
- ## 🛠️ Available Tools
41
-
42
- The agent integrates with these powerful MCP tools:
43
-
44
- - **📊 Data Analysis**: Comprehensive dataset analysis and insights
45
- - **🚈 Model Training**: Automated ML model training with optimized parameters
46
- - **🚀 Model Deployment**: One-click deployment to production APIs
47
- - **⚡ Auto-Deploy Pipeline**: End-to-end ML workflow automation
48
-
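
For orientation, here is a minimal sketch of invoking one of these tools directly with `fastmcp` (the server URL and tool name come from this repository; the CSV URL is a placeholder):

```python
import asyncio
from fastmcp import Client

SERVER_URL = "https://mcp-1st-birthday-auto-deployer.hf.space/gradio_api/mcp/"

async def analyze(csv_url: str) -> None:
    # Open a session against the remote MCP server and call one tool.
    async with Client(SERVER_URL) as client:
        result = await client.call_tool(
            "Auto_Deployer_analyze_data_tool",
            {"file_path": csv_url},  # must be a publicly reachable CSV URL
        )
        print(result)

asyncio.run(analyze("https://example.com/data.csv"))  # placeholder URL
```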
49
- ## 📋 Prerequisites
50
-
51
- - Python 3.8 or higher
52
- - OpenAI API key (set as `OPENAI_API_KEY` environment variable)
53
- - Access to Hugging Face's MCP server
54
-
55
- ## 🔧 Installation
56
-
57
- 1. **Clone the repository**:
58
- ```bash
59
- git clone <repository-url>
60
- cd mlops-agent
61
- ```
62
-
63
- 2. **Install dependencies**:
64
- ```bash
65
- pip install -r requirements.txt
66
- ```
67
-
68
- 3. **Set up environment variables**:
69
- ```bash
70
- export OPENAI_API_KEY="your-openai-api-key-here"
71
- ```
72
-
73
- 4. **Run the application**:
74
- ```bash
75
- python app.py
76
- ```
77
-
78
- ## 🎯 Usage
79
-
80
- ### Quick Start
81
-
82
- 1. **Upload a CSV file** using the file upload component
83
- 2. **Start chatting** with the AI about your ML goals:
84
- - "Analyze my data"
85
- - "Train a classification model"
86
- - "Deploy this to production"
87
- - "Run the complete pipeline"
88
-
89
- ### Example Interactions
90
-
91
- #### Data Analysis
92
- ```
93
- User: "Can you analyze my dataset?"
94
- Agent: "I'll analyze your CSV file and provide insights about data quality, patterns, and recommendations."
95
- ```
96
-
97
- #### Model Training
98
- ```
99
- User: "I want to train a model to predict customer churn"
100
- Agent: "I'll use your uploaded data to train a classification model optimized for churn prediction."
101
- ```
102
-
103
- #### Auto-Deploy
104
- ```
105
- User: "Handle the complete ML pipeline for me"
106
- Agent: "I'll analyze your data, train an optimized model, and deploy it to a live API endpoint."
107
- ```
108
-
109
- ## 🏗️ Architecture
110
-
111
- ```
112
- mlops-agent/
113
- ├── app.py # Main Gradio application
114
- ├── mcp_client.py # MCP server connection and tool execution
115
- ├── ai_orchestrator.py # GPT-5-mini integration and reasoning
116
- ├── file_handler.py # File upload and URL generation
117
- ├── config.py # Configuration and environment variables
118
- ├── requirements.txt # Python dependencies
119
- └── README.md # Project documentation
120
- ```
121
-
122
- ### Core Components
123
-
124
- - **MCPClientManager**: Handles connections to Hugging Face MCP servers
125
- - **AIOrchestrator**: Uses GPT-5-mini for intent recognition and response generation
126
- - **FileHandler**: Processes CSV uploads and generates accessible URLs
127
- - **ChatbaseMCPApp**: Main application orchestrating all components
128
-
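
A hypothetical wiring sketch showing how these components fit together (the API key and file URL are placeholders; the modules are the ones listed above):

```python
import asyncio
from ai_orchestrator import AIOrchestrator
from file_handler import FileHandler

async def demo() -> None:
    handler = FileHandler()                       # saves uploads, builds analyses
    orchestrator = AIOrchestrator(openai_api_key="sk-placeholder")
    await orchestrator.initialize()               # connects the MCP client
    reply, steps = await orchestrator.process_message(
        "Analyze my data",
        file_url="file:///tmp/data.csv",          # placeholder file URL
    )
    print(steps)
    print(reply)

asyncio.run(demo())
```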
129
- ## ⚙️ Configuration
130
-
131
- ### Environment Variables
132
-
133
- | Variable | Description | Required |
134
- |----------|-------------|----------|
135
- | `OPENAI_API_KEY` | Your OpenAI API key | Yes |
136
-
137
- ### Configuration Options
138
-
139
- The `config.py` file contains customizable parameters:
140
-
141
- - **MCP_SERVER_URL**: MCP server endpoint (default: Hugging Face Auto-Deployer)
142
- - **OPENAI_MODEL**: OpenAI model to use (default: gpt-5-mini)
143
- - **ALLOWED_FILE_TYPES**: Supported file types (default: CSV only)
144
- - **MAX_FILE_SIZE**: Maximum upload size (default: 100MB)
145
-
146
- ## 🔒 Security Features
147
-
148
- - **Input Validation**: File type and size restrictions
149
- - **API Key Protection**: Secure environment variable handling
150
- - **Safe URL Generation**: Secure file access mechanisms
151
- - **Error Handling**: Graceful failure handling with user-friendly messages
152
-
153
- ## 🚀 Deployment
154
-
155
- ### Local Development
156
-
157
- ```bash
158
- python app.py
159
- ```
160
-
161
- The application will start on `http://localhost:7860`.
162
-
163
- ### Hugging Face Spaces
164
-
165
- This application is designed for deployment on Hugging Face Spaces:
166
-
167
- 1. Set `OPENAI_API_KEY` in Space secrets
168
- 2. Use the included `README.md` configuration
169
- 3. Deploy automatically
170
-
171
- ## 🧪 Testing
172
-
173
- ### Manual Testing
174
-
175
- 1. Upload a sample CSV file
176
- 2. Test various intents:
177
- - "analyze my data"
178
- - "train a model"
179
- - "deploy this"
180
- - "auto deploy the pipeline"
181
-
182
- ### Sample CSV for Testing
183
-
184
- ```csv
185
- age,income,education,target
186
- 25,50000,Bachelor,0
187
- 35,75000,Master,1
188
- 45,90000,PhD,0
189
- 55,120000,Bachelor,1
190
- ```
191
-
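
To sanity-check this sample locally before uploading, a small sketch (assumes `pandas` is installed):

```python
import io
import pandas as pd

SAMPLE = """age,income,education,target
25,50000,Bachelor,0
35,75000,Master,1
45,90000,PhD,0
55,120000,Bachelor,1
"""

df = pd.read_csv(io.StringIO(SAMPLE))
print(df.dtypes)                      # column types the analyzer will see
print(df["target"].value_counts())   # quick look at class balance
```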
192
- ## 🐛 Troubleshooting
193
-
194
- ### Common Issues
195
-
196
- 1. **OpenAI API Key Error**:
197
- - Ensure `OPENAI_API_KEY` is set correctly
198
- - Check API key permissions and billing
199
-
200
- 2. **MCP Connection Issues**:
201
- - Verify internet connection
202
- - Check Hugging Face MCP server status
203
-
204
- 3. **File Upload Problems**:
205
- - Ensure the file is in CSV format
206
- - Check file size limit (100MB)
207
-
208
-
 
13
  - mcp-in-action-track-customer
14
  ---
15
 
ai_orchestrator.py DELETED
@@ -1,303 +0,0 @@
1
- """
2
- AI Orchestrator using GPT-5-mini for intent recognition, tool selection, and response generation
3
- Integrates with MCP client to provide intelligent tool execution
4
- """
5
- import asyncio
6
- import json
7
- import logging
8
- import re
9
- from typing import Dict, Any, List, Optional, Tuple
10
- from openai import AsyncOpenAI
11
- from mcp_client import MCPClient
12
-
13
- # Configure logging
14
- logging.basicConfig(level=logging.INFO)
15
- logger = logging.getLogger(__name__)
16
-
17
- class AIOrchestrator:
18
- """AI-powered orchestrator for tool selection and execution"""
19
-
20
- def __init__(self, openai_api_key: str, model: str = "gpt-5-mini"):
21
- self.client = AsyncOpenAI(api_key=openai_api_key)
22
- self.model = model
23
- self.mcp_client: Optional[MCPClient] = None
24
- self.conversation_history: List[Dict[str, str]] = []
25
- self.file_context: Dict[str, Any] = {}
26
-
27
- async def initialize(self):
28
- """Initialize the orchestrator and connect to MCP client"""
29
- try:
30
- self.mcp_client = MCPClient()
31
- success = await self.mcp_client.connect()
32
- if not success:
33
- raise RuntimeError("Failed to connect to MCP client")
34
- logger.info("AI Orchestrator initialized successfully")
35
- except Exception as e:
36
- logger.error(f"Failed to initialize AI Orchestrator: {str(e)}")
37
- raise
38
-
39
- async def process_message(
40
- self,
41
- message: str,
42
- file_url: Optional[str] = None,
43
- stream: bool = False
44
- ) -> Tuple[str, List[str]]:
45
- """Process user message and return response with tool execution steps"""
46
- try:
47
- # Add file context if available
48
- if file_url:
49
- self.file_context['file_url'] = file_url
50
- self.file_context['has_file'] = True
51
- else:
52
- self.file_context['has_file'] = False
53
-
54
- # Add user message to conversation history
55
- self.conversation_history.append({
56
- "role": "user",
57
- "content": message
58
- })
59
-
60
- # Get available tools from MCP client
61
- available_tools = self.mcp_client.get_available_tools()
62
-
63
- # Analyze user intent and select tools
64
- tool_plan = await self._analyze_intent_and_plan_tools(message, available_tools)
65
-
66
- if not tool_plan['tools']:
67
- # No tools needed, respond directly
68
- response = await self._generate_direct_response(message)
69
- return response, []
70
-
71
- # Execute tools and generate response
72
- execution_steps = []
73
- final_response = await self._execute_tools_and_respond(
74
- tool_plan, message, execution_steps, stream
75
- )
76
-
77
- return final_response, execution_steps
78
-
79
- except Exception as e:
80
- logger.error(f"Error processing message: {str(e)}")
81
- error_response = f"❌ An error occurred: {str(e)}\n\nPlease try again or rephrase your request."
82
- return error_response, []
83
-
84
- async def _analyze_intent_and_plan_tools(
85
- self,
86
- message: str,
87
- available_tools: Dict[str, str]
88
- ) -> Dict[str, Any]:
89
- """Analyze user intent and create tool execution plan"""
90
-
91
- system_prompt = """You are an AI assistant that analyzes user requests and determines which ML tools to use.
92
-
93
- Available tools:
94
- 1. Auto_Deployer_analyze_data_tool - Analyze CSV datasets and provide statistical metadata
95
- Parameters: file_path (URL to CSV file)
96
-
97
- 2. Auto_Deployer_train_model_tool - Train ML models on CSV data
98
- Parameters: file_path (URL to CSV file), target_column (string), task_type (classification/regression/time_series)
99
-
100
- 3. Auto_Deployer_deploy_model_tool - Deploy trained models to production API
101
- Parameters: model_id (string from training result)
102
-
103
- 4. Auto_Deployer_auto_deploy_tool - Complete ML pipeline (analyze → train → deploy)
104
- Parameters: file_path (URL to CSV file), target_column (string), task_type (classification/regression/time_series)
105
-
106
- File context: {file_context}
107
-
108
- Analyze the user's message and determine:
109
- 1. What the user wants to accomplish
110
- 2. Which tools are needed
111
- 3. What parameters are required
112
- 4. The order of tool execution
113
-
114
- Respond with JSON format:
115
- {{
116
- "intent": "brief description of user's goal",
117
- "tools": [
118
- {{
119
- "tool_name": "tool name",
120
- "parameters": {{
121
- "param1": "value1",
122
- "param2": "value2"
123
- }},
124
- "reasoning": "why this tool is needed"
125
- }}
126
- ],
127
- "missing_info": ["list of missing required information if any"]
128
- }}
129
-
130
- If no tools are needed, set tools to empty array and provide helpful response directly."""
131
-
132
- user_prompt = f"""User message: "{message}"
133
-
134
- File context: {json.dumps(self.file_context, indent=2)}"""
135
-
136
- try:
137
- response = await self.client.chat.completions.create(
138
- model=self.model,
139
- messages=[
140
- {"role": "system", "content": system_prompt},
141
- {"role": "user", "content": user_prompt}
142
- ],
143
- temperature=0.1,
144
- max_tokens=1000
145
- )
146
-
147
- content = response.choices[0].message.content
148
-
149
- # Extract JSON from response
150
- json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
151
- if json_match:
152
- tool_plan = json.loads(json_match.group(1))
153
- else:
154
- # Try to parse JSON directly
155
- tool_plan = json.loads(content)
156
-
157
- logger.info(f"Tool plan generated: {tool_plan}")
158
- return tool_plan
159
-
160
- except Exception as e:
161
- logger.error(f"Error analyzing intent: {str(e)}")
162
- return {"intent": "error", "tools": [], "missing_info": []}
163
-
164
- async def _execute_tools_and_respond(
165
- self,
166
- tool_plan: Dict[str, Any],
167
- original_message: str,
168
- execution_steps: List[str],
169
- stream: bool
170
- ) -> str:
171
- """Execute tools according to plan and generate response"""
172
-
173
- if tool_plan.get('missing_info'):
174
- # Ask for missing information
175
- missing_info = "\n".join([f"• {info}" for info in tool_plan['missing_info']])
176
- return f"I need some additional information to help you:\n\n{missing_info}\n\nPlease provide the missing details and I'll proceed with your request."
177
-
178
- response_parts = []
179
- results = {}
180
-
181
- # Execute each tool in sequence
182
- for i, tool_info in enumerate(tool_plan['tools']):
183
- tool_name = tool_info['tool_name']
184
- parameters = tool_info['parameters']
185
- reasoning = tool_info.get('reasoning', '')
186
-
187
- # Add reasoning to response
188
- if reasoning:
189
- response_parts.append(f"🤔 **Reasoning**: {reasoning}\n")
190
-
191
- # Add execution step
192
- step_info = f"Step {i+1}: Executing {tool_name}"
193
- execution_steps.append(step_info)
194
-
195
- response_parts.append(f"🔄 **{step_info}**\n")
196
-
197
- try:
198
- # Add file URL if available and not already in parameters
199
- if (self.file_context.get('has_file') and
200
- 'file_path' not in parameters and
201
- 'file_path' in str(parameters)):
202
- parameters['file_path'] = self.file_context['file_url']
203
-
204
- # Execute the tool
205
- result = await self.mcp_client.call_tool(tool_name, parameters)
206
- results[tool_name] = result
207
-
208
- response_parts.append("✅ **Tool execution completed successfully**\n")
209
-
210
- # Store model_id from training result for potential deployment
211
- if tool_name == "Auto_Deployer_train_model_tool" and isinstance(result, dict):
212
- if 'model_id' in result:
213
- self.file_context['model_id'] = result['model_id']
214
- elif 'data' in result and isinstance(result['data'], dict) and 'model_id' in result['data']:
215
- self.file_context['model_id'] = result['data']['model_id']
216
-
217
- except Exception as e:
218
- error_msg = f"❌ Error executing {tool_name}: {str(e)}"
219
- response_parts.append(error_msg)
220
- logger.error(f"Tool execution error: {str(e)}")
221
-
222
- # Generate final response with results
223
- if results:
224
- response_parts.append("\n📊 **Results:**\n")
225
- for tool_name, result in results.items():
226
- response_parts.append(f"\n**{tool_name} Output:**\n")
227
- response_parts.append(f"```json\n{json.dumps(result, indent=2, default=str)}\n```\n")
228
-
229
- # Add follow-up suggestions
230
- if len(results) > 0:
231
- last_tool = tool_plan['tools'][-1]['tool_name']
232
- if last_tool == "Auto_Deployer_analyze_data_tool":
233
- response_parts.append("\n💡 **Next Steps:**\n")
234
- response_parts.append("• I can help you train a machine learning model on this data\n")
235
- response_parts.append("• Just specify which column you'd like to predict and the type of task\n")
236
- elif last_tool == "Auto_Deployer_train_model_tool":
237
- response_parts.append("\n💡 **Next Steps:**\n")
238
- response_parts.append("• I can deploy this model to a production API endpoint\n")
239
- response_parts.append("• Just ask me to deploy the trained model\n")
240
-
241
- return "".join(response_parts)
242
-
243
- async def _generate_direct_response(self, message: str) -> str:
244
- """Generate direct response without using tools"""
245
-
246
- system_prompt = """You are a helpful AI assistant for machine learning and data science tasks.
247
- You have access to ML tools but the current conversation doesn't require tool usage.
248
-
249
- Respond helpfully about:
250
- - ML concepts and best practices
251
- - Data analysis suggestions
252
- - Model selection guidance
253
- - General ML workflow advice
254
-
255
- Be concise but informative."""
256
-
257
- try:
258
- response = await self.client.chat.completions.create(
259
- model=self.model,
260
- messages=[
261
- {"role": "system", "content": system_prompt},
262
- {"role": "user", "content": message}
263
- ],
264
- temperature=0.7,
265
- max_tokens=500
266
- )
267
-
268
- assistant_response = response.choices[0].message.content
269
-
270
- # Add assistant response to conversation history
271
- self.conversation_history.append({
272
- "role": "assistant",
273
- "content": assistant_response
274
- })
275
-
276
- return assistant_response
277
-
278
- except Exception as e:
279
- logger.error(f"Error generating direct response: {str(e)}")
280
- return "I apologize, but I'm having trouble generating a response. Please try again."
281
-
282
- def get_tool_descriptions(self) -> str:
283
- """Get formatted descriptions of available tools"""
284
- if not self.mcp_client:
285
- return "Tools not available"
286
-
287
- descriptions = ["**Available ML Tools:**\n"]
288
-
289
- for tool_name, description in self.mcp_client.get_available_tools().items():
290
- descriptions.append(f"• **{tool_name}**: {description}\n")
291
-
292
- return "".join(descriptions)
293
-
294
- def clear_conversation_history(self):
295
- """Clear conversation history"""
296
- self.conversation_history = []
297
- self.file_context = {}
298
- logger.info("Conversation history cleared")
299
-
300
- async def __del__(self):
301
- """Cleanup when orchestrator is destroyed"""
302
- if self.mcp_client:
303
- await self.mcp_client.disconnect()
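
For reference, the tool plan that `_analyze_intent_and_plan_tools` asks the model to emit parses into a structure like this (values are illustrative):

```python
import json

# Illustrative tool plan matching the JSON schema in the system prompt above.
plan = json.loads("""
{
  "intent": "train a churn classifier on the uploaded CSV",
  "tools": [
    {
      "tool_name": "Auto_Deployer_train_model_tool",
      "parameters": {
        "file_path": "https://example.com/churn.csv",
        "target_column": "churn",
        "task_type": "classification"
      },
      "reasoning": "the user asked to train a model on the uploaded data"
    }
  ],
  "missing_info": []
}
""")
assert plan["tools"][0]["parameters"]["task_type"] == "classification"
```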
 
app.py CHANGED
@@ -1,324 +1,252 @@
1
  """
2
- Gradio Interface for Chatbase MCP Client
3
- Provides web-based chat interface with file upload and tool execution
4
  """
 
5
  import os
6
- import asyncio
7
- import logging
8
- from typing import Dict, Any, List, Tuple, Optional, Generator
9
  import gradio as gr
10
- from ai_orchestrator import AIOrchestrator
11
- from file_handler import FileHandler
12
- import tempfile
13
-
14
- # Configure logging
15
- logging.basicConfig(level=logging.INFO)
16
- logger = logging.getLogger(__name__)
17
-
18
- class ChatbaseGradioApp:
19
- """Main application class for the Chatbase Gradio interface"""
20
-
21
- def __init__(self):
22
- self.orchestrator: Optional[AIOrchestrator] = None
23
- self.file_handler = FileHandler()
24
- self.current_file_path: Optional[str] = None
25
- self.current_file_analysis: Optional[Dict[str, Any]] = None
26
- self.public_file_url: Optional[str] = None
27
-
28
- async def initialize_app(self):
29
- """Initialize the application components"""
30
-
31
- # Get OpenAI API key from environment
32
- api_key = os.getenv("OPENAI_API_KEY")
33
- if not api_key:
34
- raise ValueError("OPENAI_API_KEY environment variable is required")
35
-
36
- # Initialize AI Orchestrator
37
- self.orchestrator = AIOrchestrator(openai_api_key=api_key)
38
- await self.orchestrator.initialize()
39
-
40
- logger.info("Application initialized successfully")
41
-
42
- def handle_file_upload(self, file_obj) -> Tuple[str, str]:
43
- """Handle file upload and generate file analysis"""
44
-
45
- if file_obj is None:
46
- return "No file uploaded", ""
47
-
48
- try:
49
- # Run async file handling
50
- loop = asyncio.new_event_loop()
51
- asyncio.set_event_loop(loop)
52
-
53
- file_path, analysis = loop.run_until_complete(
54
- self.file_handler.save_upload_file(file_obj, os.path.basename(file_obj.name))
55
- )
56
-
57
- self.current_file_path = file_path
58
-
59
- # Generate public URL (for demo purposes, use file path)
60
- # In production, this would upload to a public service
61
- self.public_file_url = f"file://{file_path}"
62
-
63
- # Format file analysis for display
64
- file_summary = self.file_handler.format_file_summary(analysis)
65
-
66
- loop.close()
67
-
68
- return f"✅ File uploaded successfully: {os.path.basename(file_path)}", file_summary
69
-
70
- except Exception as e:
71
- logger.error(f"Error handling file upload: {str(e)}")
72
- return f"❌ Error uploading file: {str(e)}", ""
73
-
74
- def handle_message(
75
- self,
76
- message: str,
77
- history: List[Tuple[str, str]]
78
- ) -> Generator[str, None, None]:
79
- """Handle user message with streaming response"""
80
-
81
- if not message.strip():
82
- yield ""
83
- return
84
-
85
- if not self.orchestrator:
86
- yield "❌ AI Orchestrator not initialized. Please check configuration."
87
- return
88
-
89
- try:
90
- # Create new event loop for async processing
91
- loop = asyncio.new_event_loop()
92
- asyncio.set_event_loop(loop)
93
-
94
- # Start with initial message
95
- yield "🤔 Analyzing your request..."
96
-
97
- # Process the message with the orchestrator
98
- response, execution_steps = loop.run_until_complete(
99
- self.orchestrator.process_message(
100
- message=message,
101
- file_url=self.public_file_url,
102
- stream=True
103
- )
104
- )
105
-
106
- # Show execution steps if any
107
- if execution_steps:
108
- steps_text = "\n🔄 **Execution Steps:**\n" + "\n".join([f"• {step}" for step in execution_steps]) + "\n\n"
109
- yield steps_text
110
-
111
- # Show final response
112
- yield response
113
-
114
- loop.close()
115
-
116
- except Exception as e:
117
- logger.error(f"Error processing message: {str(e)}")
118
- yield f"❌ An error occurred: {str(e)}\n\nPlease try again or rephrase your request."
119
-
120
- def clear_conversation(self):
121
- """Clear conversation history and reset state"""
122
-
123
- try:
124
- if self.orchestrator:
125
- self.orchestrator.clear_conversation_history()
126
-
127
- # Clear file state
128
- self.current_file_path = None
129
- self.current_file_analysis = None
130
- self.public_file_url = None
131
-
132
- # Clean up files
133
- self.file_handler.cleanup_all_files()
134
-
135
- return [], "Conversation cleared. Ready for new chat!", ""
136
-
137
- except Exception as e:
138
- logger.error(f"Error clearing conversation: {str(e)}")
139
- return [], f"Error clearing conversation: {str(e)}", ""
140
-
141
- def get_system_info(self) -> str:
142
- """Get system information and available tools"""
143
-
144
- if not self.orchestrator:
145
- return "System not initialized"
146
-
147
- try:
148
- tools_info = self.orchestrator.get_tool_descriptions()
149
-
150
- system_info = """
151
- 🤖 **Chatbase ML Assistant**
152
-
153
- I can help you with machine learning tasks including:
154
-
155
- 📊 **Data Analysis:** Analyze CSV datasets and get statistical insights
156
- 🤖 **Model Training:** Train ML models for classification, regression, or time series
157
- 🚀 **Model Deployment:** Deploy trained models to production APIs
158
- ⚡ **Complete Pipeline:** End-to-end ML workflow from data to deployed model
159
-
160
- **How to use:**
161
- 1. Upload a CSV file (optional, but recommended for best results)
162
- 2. Ask questions in natural language
163
- 3. I'll automatically select and execute the appropriate tools
164
-
165
- **Example requests:**
166
- • "Analyze this dataset and tell me about it"
167
- • "Train a model to predict the 'price' column"
168
- • "Deploy a classification model for the 'category' column"
169
- • "Run the complete ML pipeline on my data"
170
-
171
- """
172
- system_info += tools_info
173
-
174
- return system_info
175
-
176
- except Exception as e:
177
- logger.error(f"Error getting system info: {str(e)}")
178
- return "Error retrieving system information"
179
-
180
- def create_interface(self) -> gr.Interface:
181
- """Create the Gradio interface"""
182
-
183
- # Custom CSS for better appearance
184
- css = """
185
- .gradio-container {
186
- max-width: 1200px !important;
187
- margin: auto !important;
188
- }
189
- .message.user {
190
- background-color: #e3f2fd !important;
191
- }
192
- .message.assistant {
193
- background-color: #f5f5f5 !important;
194
- }
195
- """
196
-
197
- with gr.Blocks(
198
- title="Chatbase ML Assistant",
199
- theme=gr.themes.Soft(),
200
- css=css
201
- ) as interface:
202
-
203
- gr.Markdown("# 🤖 Chatbase ML Assistant")
204
- gr.Markdown("Upload your CSV dataset and chat with AI to perform ML tasks automatically!")
205
-
206
- with gr.Row():
207
- with gr.Column(scale=2):
208
- # File upload section
209
- file_upload = gr.File(
210
- label="📁 Upload CSV File",
211
- file_types=[".csv"],
212
- type="binary"
213
- )
214
-
215
- file_status = gr.Textbox(
216
- label="File Status",
217
- interactive=False,
218
- placeholder="No file uploaded"
219
- )
220
-
221
- file_analysis = gr.Markdown(
222
- label="📊 File Analysis",
223
- value="Upload a CSV file to see analysis..."
224
- )
225
-
226
- with gr.Column(scale=3):
227
- # Chat interface
228
- chatbot = gr.Chatbot(
229
- label="💬 Chat",
230
- height=500,
231
- bubble_full_width=False
232
- )
233
-
234
- msg = gr.Textbox(
235
- label="Type your message...",
236
- placeholder="Ask me to analyze data, train models, or deploy ML pipelines...",
237
- lines=2
238
  )
239
-
240
- with gr.Row():
241
- submit_btn = gr.Button("💬 Send", variant="primary")
242
- clear_btn = gr.Button("🗑️ Clear Chat")
243
-
244
- # System info display
245
- with gr.Accordion("ℹ️ Available Tools & Help", open=False):
246
- gr.Markdown(self.get_system_info())
247
-
248
- # Event handlers
249
- file_upload.upload(
250
- fn=self.handle_file_upload,
251
- inputs=[file_upload],
252
- outputs=[file_status, file_analysis]
 
253
  )
254
 
255
- msg.submit(
256
- fn=self.handle_message,
257
- inputs=[msg, chatbot],
258
- outputs=[chatbot]
 
259
  )
260
-
261
- submit_btn.click(
262
- fn=self.handle_message,
263
- inputs=[msg, chatbot],
264
- outputs=[chatbot]
265
  )
266
-
267
- clear_btn.click(
268
- fn=self.clear_conversation,
269
- outputs=[chatbot, file_status, file_analysis]
 
 
 
270
  )
271
 
272
- # Add examples
273
- gr.Examples(
274
- examples=[
275
- ["Analyze this dataset and tell me about the main insights"],
276
- ["Train a classification model to predict the target column"],
277
- ["What are the key statistics and patterns in this data?"],
278
- ["Train and deploy a machine learning model for this dataset"],
279
- ["What type of ML task would be best suited for this data?"]
280
- ],
281
- inputs=[msg],
282
- label="💡 Example Messages"
283
  )
284
-
285
- return interface
286
-
287
- async def run(self):
288
- """Run the application"""
289
-
290
- try:
291
- # Initialize components
292
- await self.initialize_app()
293
-
294
- # Create and launch interface
295
- interface = self.create_interface()
296
-
297
- logger.info("Starting Gradio interface...")
298
- interface.launch(
299
- server_name="0.0.0.0",
300
- server_port=7860,
301
- share=True,
302
- show_error=True
303
- )
304
-
305
- except Exception as e:
306
- logger.error(f"Error running application: {str(e)}")
307
- raise
308
-
309
- def main():
310
- """Main entry point"""
311
-
312
- app = ChatbaseGradioApp()
313
-
314
- # Run the app
315
- try:
316
- asyncio.run(app.run())
317
- except KeyboardInterrupt:
318
- logger.info("Application stopped by user")
319
- except Exception as e:
320
- logger.error(f"Fatal error: {str(e)}")
321
- raise
 
 
322
 
323
  if __name__ == "__main__":
324
- main()
 
1
  """
2
+ Gradio MCP Client for Remote MCP Server - With File Upload
 
3
  """
4
+
5
  import os
6
+ import json
 
 
7
  import gradio as gr
8
+ from contextlib import asynccontextmanager
9
+
10
+ from openai import OpenAI
11
+ from fastmcp import Client
12
+ from fastmcp.client.transports import StreamableHttpTransport
13
+
14
+ # Configuration
15
+ MCP_SERVER_URL = "https://mcp-1st-birthday-auto-deployer.hf.space/gradio_api/mcp/"
16
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
17
+ MODEL = "gpt-5-mini"
18
+
19
+
20
+ class MCPClientManager:
21
+ def __init__(self, server_url: str):
22
+ self.server_url = server_url
23
+
24
+ @asynccontextmanager
25
+ async def get_client(self):
26
+ transport = StreamableHttpTransport(self.server_url)
27
+ async with Client(transport) as client:
28
+ yield client
29
+
30
+ async def get_tools(self) -> list:
31
+ async with self.get_client() as client:
32
+ return await client.list_tools()
33
+
34
+ async def call_tool(self, tool_name: str, arguments: dict) -> str:
35
+ async with self.get_client() as client:
36
+ result = await client.call_tool(tool_name, arguments)
37
+ if hasattr(result, "content"):
38
+ if isinstance(result.content, list):
39
+ return "\n".join(
40
+ str(item.text) if hasattr(item, "text") else str(item)
41
+ for item in result.content
42
  )
43
+ return str(result.content)
44
+ return str(result)
45
+
46
+ def to_openai_tools(self, tools: list) -> list:
47
+ return [
48
+ {
49
+ "type": "function",
50
+ "function": {
51
+ "name": tool.name,
52
+ "description": tool.description or "",
53
+ "parameters": {
54
+ "type": "object",
55
+ "properties": tool.inputSchema.get("properties", {})
56
+ if tool.inputSchema
57
+ else {},
58
+ "required": tool.inputSchema.get("required", [])
59
+ if tool.inputSchema
60
+ else [],
61
+ },
62
+ },
63
+ }
64
+ for tool in tools
65
+ ]
66
+
67
+
68
+ mcp = MCPClientManager(MCP_SERVER_URL)
69
+ openai_client = OpenAI(api_key=OPENAI_API_KEY)
70
+
71
+ SYSTEM_PROMPT = """You are a helpful ML assistant with access to Auto Deployer tools.
72
+
73
+ IMPORTANT: When calling tools with file_path parameter:
74
+ - If the user uploaded a file, use the provided file URL directly
75
+ - Pass ONLY the raw URL (e.g., "https://...")
76
+ - Never add prefixes like "Gradio File Input - "
77
+
78
+ Always pass URLs directly without any prefix."""
79
+
80
+
81
+ async def chat(message: str, history: list, file_url: str):
82
+ """Process chat with optional file URL"""
83
+ tools = await mcp.get_tools()
84
+ openai_tools = mcp.to_openai_tools(tools)
85
+
86
+ messages = [{"role": "system", "content": SYSTEM_PROMPT}]
87
+
88
+ # Add file context if available
89
+ user_content = message
90
+ if file_url:
91
+ user_content = f"[Uploaded CSV file URL: {file_url}]\n\n{message}"
92
+
93
+ # Build history - handle both old tuple format and new dict format
94
+ for item in history:
95
+ if isinstance(item, dict):
96
+ messages.append({"role": item["role"], "content": item["content"]})
97
+ elif isinstance(item, (list, tuple)) and len(item) == 2:
98
+ user_msg, assistant_msg = item
99
+ messages.append({"role": "user", "content": user_msg})
100
+ if assistant_msg:
101
+ messages.append({"role": "assistant", "content": assistant_msg})
102
+
103
+ messages.append({"role": "user", "content": user_content})
104
+
105
+ # First call
106
+ response = openai_client.chat.completions.create(
107
+ model=MODEL,
108
+ messages=messages,
109
+ tools=openai_tools,
110
+ tool_choice="auto",
111
+ )
112
+
113
+ assistant_message = response.choices[0].message
114
+
115
+ # Handle tool calls
116
+ while assistant_message.tool_calls:
117
+ messages.append(assistant_message)
118
+
119
+ yield "🔧 Calling tools...\n\n"
120
+
121
+ for tool_call in assistant_message.tool_calls:
122
+ tool_name = tool_call.function.name
123
+ arguments = json.loads(tool_call.function.arguments)
124
+
125
+ # Clean file_path
126
+ if "file_path" in arguments:
127
+ fp = arguments["file_path"]
128
+ if fp.startswith("Gradio File Input - "):
129
+ arguments["file_path"] = fp.replace("Gradio File Input - ", "")
130
+
131
+ yield f"⚙️ Running `{tool_name}`...\n\n"
132
+
133
+ try:
134
+ tool_result = await mcp.call_tool(tool_name, arguments)
135
+ except Exception as e:
136
+ tool_result = f"Error: {e}"
137
+
138
+ messages.append(
139
+ {
140
+ "role": "tool",
141
+ "tool_call_id": tool_call.id,
142
+ "content": tool_result,
143
+ }
144
  )
145
 
146
+ response = openai_client.chat.completions.create(
147
+ model=MODEL,
148
+ messages=messages,
149
+ tools=openai_tools,
150
+ tool_choice="auto",
151
+ )
152
+ assistant_message = response.choices[0].message
153
+
154
+ # Stream final response
155
+ stream = openai_client.chat.completions.create(
156
+ model=MODEL,
157
+ messages=messages,
158
+ stream=True,
159
+ )
160
+
161
+ partial_response = ""
162
+ for chunk in stream:
163
+ if chunk.choices[0].delta.content:
164
+ partial_response += chunk.choices[0].delta.content
165
+ yield partial_response
166
+
167
+
168
+ def handle_file_upload(file):
169
+ """Return the file path when uploaded"""
170
+ if file is None:
171
+ return ""
172
+ return file
173
+
174
+
175
+ with gr.Blocks(title="Auto Deployer MCP Client") as demo:
176
+ gr.Markdown(
177
+ """
178
+ # 🤖 Auto Deployer MCP Client
179
+ Upload a CSV file and chat about ML tasks: analyze, train, deploy models.
180
+ """
181
+ )
182
+
183
+ with gr.Row():
184
+ with gr.Column(scale=1):
185
+ file_input = gr.File(
186
+ label="📁 Upload CSV File",
187
+ file_types=[".csv"],
188
+ type="filepath",
189
  )
190
+ file_url = gr.Textbox(
191
+ label="File URL",
192
+ placeholder="Upload a file or paste a URL",
193
+ interactive=True,
 
194
  )
195
+ gr.Markdown(
196
+ """
197
+ ### 💡 Tips
198
+ - Upload a CSV file above
199
+ - Or paste a direct URL to a CSV
200
+ - Then ask questions in the chat
201
+ """
202
  )
203
 
204
+ with gr.Column(scale=3):
205
+ chatbot = gr.Chatbot(label="Chat", height=400)
206
+ msg = gr.Textbox(
207
+ label="Message",
208
+ placeholder="e.g., Analyze this dataset and train a model",
209
  )
210
+ with gr.Row():
211
+ submit = gr.Button("Send", variant="primary")
212
+ clear = gr.Button("Clear")
213
+
214
+ gr.Examples(
215
+ examples=[
216
+ "What tools do you have available?",
217
+ "Analyze this CSV file",
218
+ "Train a classification model with 'species' as the target column",
219
+ "Auto deploy a model predicting 'Survived'",
220
+ ],
221
+ inputs=msg,
222
+ )
223
+
224
+ # Update file URL when file is uploaded
225
+ file_input.change(fn=handle_file_upload, inputs=file_input, outputs=file_url)
226
+
227
+ # Chat handlers
228
+ async def respond(message, history, url):
229
+ history = history or []
230
+ history.append([message, ""])
231
+ response = ""
232
+ async for chunk in chat(message, history[:-1], url):
233
+ response = chunk
234
+ history[-1][1] = response
235
+ yield history
236
+
237
+ submit.click(
238
+ fn=respond,
239
+ inputs=[msg, chatbot, file_url],
240
+ outputs=chatbot,
241
+ ).then(lambda: "", outputs=msg)
242
+
243
+ msg.submit(
244
+ fn=respond,
245
+ inputs=[msg, chatbot, file_url],
246
+ outputs=chatbot,
247
+ ).then(lambda: "", outputs=msg)
248
+
249
+ clear.click(lambda: ([], "", None), outputs=[chatbot, file_url, file_input])
250
 
251
  if __name__ == "__main__":
252
+ demo.launch(allowed_paths=["/tmp"])
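
A minimal smoke-test sketch for the `chat` handler above (assumes it runs alongside the definitions above, with `OPENAI_API_KEY` set and the MCP server reachable; the question is arbitrary):

```python
import asyncio

async def smoke_test() -> None:
    # Drive the async generator with an empty history and no file URL,
    # printing each streamed chunk as it arrives.
    async for chunk in chat("What tools do you have available?", [], ""):
        print(chunk)

asyncio.run(smoke_test())
```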
config.py DELETED
@@ -1,87 +0,0 @@
1
- """
2
- Configuration settings for Chatbase Gradio MCP Client
3
- """
4
- import os
5
- from typing import Optional
6
-
7
- class Config:
8
- """Configuration class for the application"""
9
-
10
- # MCP Server Settings
11
- MCP_SERVER_URL = os.getenv(
12
- "MCP_SERVER_URL",
13
- "https://mcp-1st-birthday-auto-deployer.hf.space/gradio_api/mcp/"
14
- )
15
-
16
- # OpenAI Settings
17
- OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
18
- OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-5-mini")
19
-
20
- # Gradio Settings
21
- GRADIO_SERVER_NAME = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
22
- GRADIO_SERVER_PORT = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
23
- GRADIO_SHARE = os.getenv("GRADIO_SHARE", "true").lower() == "true"
24
-
25
- # File Settings
26
- UPLOAD_DIR = os.getenv("UPLOAD_DIR", "uploads")
27
- MAX_FILE_SIZE = int(os.getenv("MAX_FILE_SIZE", "100")) * 1024 * 1024 # 100MB
28
- ALLOWED_EXTENSIONS = [".csv"]
29
-
30
- # Logging Settings
31
- LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
32
- LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
33
-
34
- # AI Orchestrator Settings
35
- MAX_CONVERSATION_HISTORY = int(os.getenv("MAX_CONVERSATION_HISTORY", "20"))
36
- DEFAULT_TEMPERATURE = float(os.getenv("DEFAULT_TEMPERATURE", "0.1"))
37
- MAX_TOKENS = int(os.getenv("MAX_TOKENS", "1000"))
38
-
39
- @classmethod
40
- def validate(cls) -> bool:
41
- """Validate that required configuration is present"""
42
-
43
- if not cls.OPENAI_API_KEY:
44
- print("❌ OPENAI_API_KEY environment variable is required")
45
- return False
46
-
47
- if not cls.MCP_SERVER_URL:
48
- print("❌ MCP_SERVER_URL is required")
49
- return False
50
-
51
- print("✅ Configuration is valid")
52
- return True
53
-
54
- @classmethod
55
- def get_info(cls) -> str:
56
- """Get configuration information for display"""
57
-
58
- info = """
59
- **Configuration Status:**
60
- • MCP Server: {mcp_server}
61
- • OpenAI Model: {openai_model}
62
- • Gradio Server: {server_name}:{server_port}
63
- • Upload Directory: {upload_dir}
64
- • Max File Size: {max_file_size}MB
65
- • Log Level: {log_level}
66
-
67
- **Required Environment Variables:**
68
- • OPENAI_API_KEY: {"✅ Set" if cls.OPENAI_API_KEY else "❌ Missing"}
69
- • MCP_SERVER_URL: {"✅ Set" if cls.MCP_SERVER_URL else "❌ Missing"}
70
-
71
- **Optional Environment Variables:**
72
- • GRADIO_SERVER_NAME: {server_name}
73
- • GRADIO_SERVER_PORT: {server_port}
74
- • GRADIO_SHARE: {share}
75
- • LOG_LEVEL: {log_level}
76
- """.format(
77
- mcp_server=cls.MCP_SERVER_URL,
78
- openai_model=cls.OPENAI_MODEL,
79
- server_name=cls.GRADIO_SERVER_NAME,
80
- server_port=cls.GRADIO_SERVER_PORT,
81
- upload_dir=cls.UPLOAD_DIR,
82
- max_file_size=cls.MAX_FILE_SIZE // (1024 * 1024),
83
- log_level=cls.LOG_LEVEL,
84
- share=cls.GRADIO_SHARE
85
- )
86
-
87
- return info
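
A short sketch of the intended startup check, assuming the module above is still importable:

```python
from config import Config  # the configuration module removed by this commit

if not Config.validate():  # prints which required settings are missing
    raise SystemExit("Set OPENAI_API_KEY before starting the app")
```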
file_handler.py DELETED
@@ -1,296 +0,0 @@
1
- """
2
- File Handler for CSV processing and public URL generation
3
- Supports file upload, validation, and preparation for MCP tools
4
- """
5
- import os
6
- import tempfile
7
- import uuid
8
- import logging
9
- from typing import Optional, Dict, Any, Tuple
10
- import pandas as pd
11
- from pathlib import Path
12
-
13
- # Configure logging
14
- logging.basicConfig(level=logging.INFO)
15
- logger = logging.getLogger(__name__)
16
-
17
- class FileHandler:
18
- """Handles file operations for the ML pipeline"""
19
-
20
- def __init__(self, upload_dir: str = "uploads"):
21
- self.upload_dir = Path(upload_dir)
22
- self.upload_dir.mkdir(exist_ok=True)
23
- self.temp_files = {}
24
- self.file_analyses = {}
25
-
26
- async def save_upload_file(
27
- self,
28
- file_obj,
29
- filename: Optional[str] = None
30
- ) -> Tuple[str, Dict[str, Any]]:
31
- """
32
- Save uploaded file and return file path and analysis
33
-
34
- Args:
35
- file_obj: File-like object (from Gradio upload)
36
- filename: Original filename
37
-
38
- Returns:
39
- Tuple of (file_path, file_analysis)
40
- """
41
- try:
42
- # Generate unique filename
43
- if not filename:
44
- filename = str(uuid.uuid4())
45
-
46
- # Ensure .csv extension for CSV files
47
- if not filename.lower().endswith('.csv'):
48
- filename = f"{filename}.csv"
49
-
50
- file_path = self.upload_dir / filename
51
-
52
- # Save the file
53
- with open(file_path, 'wb') as f:
54
- content = file_obj.read()
55
- f.write(content)
56
-
57
- logger.info(f"File saved: {file_path}")
58
-
59
- # Analyze the file
60
- analysis = await self._analyze_file(file_path)
61
-
62
- # Store file info for cleanup
63
- self.temp_files[str(file_path)] = file_path
64
-
65
- return str(file_path), analysis
66
-
67
- except Exception as e:
68
- logger.error(f"Error saving upload file: {str(e)}")
69
- raise
70
-
71
- async def _analyze_file(self, file_path: Path) -> Dict[str, Any]:
72
- """Analyze uploaded file and return metadata"""
73
-
74
- try:
75
- # Read CSV file
76
- df = pd.read_csv(file_path)
77
-
78
- analysis = {
79
- 'filename': file_path.name,
80
- 'file_size': file_path.stat().st_size,
81
- 'rows': len(df),
82
- 'columns': len(df.columns),
83
- 'column_names': list(df.columns),
84
- 'column_types': {col: str(dtype) for col, dtype in df.dtypes.items()},
85
- 'missing_values': df.isnull().sum().to_dict(),
86
- 'sample_data': df.head().to_dict('records'),
87
- 'numeric_columns': list(df.select_dtypes(include=['number']).columns),
88
- 'categorical_columns': list(df.select_dtypes(include=['object']).columns),
89
- 'datetime_columns': list(df.select_dtypes(include=['datetime']).columns),
90
- 'file_path': str(file_path)
91
- }
92
-
93
- # Additional analysis for better ML insights
94
- analysis['target_column_suggestions'] = self._suggest_target_columns(df)
95
- analysis['task_type_suggestions'] = self._suggest_task_types(df)
96
- analysis['data_quality_issues'] = self._check_data_quality(df)
97
-
98
- # Store analysis for later use
99
- self.file_analyses[str(file_path)] = analysis
100
-
101
- logger.info(f"File analysis completed: {analysis['rows']} rows, {analysis['columns']} columns")
102
- return analysis
103
-
104
- except Exception as e:
105
- logger.error(f"Error analyzing file: {str(e)}")
106
- raise ValueError(f"Failed to analyze file: {str(e)}")
107
-
108
- def _suggest_target_columns(self, df: pd.DataFrame) -> Dict[str, str]:
109
- """Suggest potential target columns for ML tasks"""
110
-
111
- suggestions = {}
112
-
113
- # Numeric columns for regression
114
- numeric_cols = df.select_dtypes(include=['number']).columns
115
- for col in numeric_cols:
116
- if df[col].nunique() > 10: # Likely continuous
117
- suggestions[col] = "regression (continuous target)"
118
- elif df[col].nunique() <= 10: # Likely classification
119
- suggestions[col] = "classification (discrete target)"
120
-
121
- # Categorical columns for classification
122
- categorical_cols = df.select_dtypes(include=['object']).columns
123
- for col in categorical_cols:
124
- unique_count = df[col].nunique()
125
- if 2 <= unique_count <= 20: # Good for classification
126
- suggestions[col] = f"classification ({unique_count} classes)"
127
-
128
- return suggestions
129
-
130
- def _suggest_task_types(self, df: pd.DataFrame) -> Dict[str, Any]:
131
- """Suggest ML task types based on data characteristics"""
132
-
133
- suggestions = {}
134
-
135
- # Check for datetime columns (potential time series)
136
- datetime_cols = df.select_dtypes(include=['datetime']).columns
137
- if len(datetime_cols) > 0:
138
- suggestions['time_series_possible'] = True
139
- suggestions['time_series_columns'] = list(datetime_cols)
140
-
141
- # Data size considerations
142
- if len(df) < 1000:
143
- suggestions['size_category'] = 'small'
144
- suggestions['recommendation'] = 'Good for quick experimentation and prototyping'
145
- elif len(df) < 10000:
146
- suggestions['size_category'] = 'medium'
147
- suggestions['recommendation'] = 'Suitable for most ML tasks'
148
- else:
149
- suggestions['size_category'] = 'large'
150
- suggestions['recommendation'] = 'Consider sampling for initial experiments'
151
-
152
- # Check for imbalance in categorical columns
153
- categorical_cols = df.select_dtypes(include=['object']).columns
154
- for col in categorical_cols:
155
- value_counts = df[col].value_counts()
156
- if len(value_counts) > 1:
157
- ratio = value_counts.iloc[0] / value_counts.iloc[1]
158
- if ratio > 3: # Significant imbalance
159
- suggestions[f'{col}_imbalance'] = True
160
- suggestions[f'{col}_imbalance_ratio'] = ratio
161
-
162
- return suggestions
163
-
164
- def _check_data_quality(self, df: pd.DataFrame) -> Dict[str, Any]:
165
- """Check for common data quality issues"""
166
-
167
- issues = {}
168
-
169
- # Missing values
170
- missing_percentage = (df.isnull().sum() / len(df) * 100).to_dict()
171
- high_missing = [col for col, pct in missing_percentage.items() if pct > 50]
172
- if high_missing:
173
- issues['high_missing_columns'] = high_missing
174
-
175
- # Duplicate rows
176
- duplicate_count = df.duplicated().sum()
177
- if duplicate_count > 0:
178
- issues['duplicate_rows'] = int(duplicate_count)
179
- issues['duplicate_percentage'] = round(duplicate_count / len(df) * 100, 2)
180
-
181
- # Constant columns
182
- constant_cols = [col for col in df.columns if df[col].nunique() == 1]
183
- if constant_cols:
184
- issues['constant_columns'] = constant_cols
185
-
186
- # Potential ID columns
187
- id_candidates = []
188
- for col in df.columns:
189
- if (df[col].dtype == 'object' and
190
- df[col].nunique() == len(df) and
191
- not df[col].str.contains(' ', regex=False).any()):
192
- id_candidates.append(col)
193
- if id_candidates:
194
- issues['potential_id_columns'] = id_candidates
195
-
196
- return issues
197
-
198
- def format_file_summary(self, analysis: Dict[str, Any]) -> str:
199
- """Format file analysis into human-readable summary"""
200
-
201
- summary_parts = [
202
- f"📄 **File Analysis: {analysis['filename']}**\n",
203
- f"📊 **Dataset Info:**\n",
204
- f"• Rows: {analysis['rows']:,}\n",
205
- f"• Columns: {analysis['columns']}\n",
206
- f"• File Size: {analysis['file_size']:,} bytes\n\n",
207
- f"📋 **Columns:**\n"
208
- ]
209
-
210
- # Add column information
211
- for col in analysis['column_names'][:10]: # Limit to first 10 columns
212
- col_type = analysis['column_types'][col]
213
- missing = analysis['missing_values'].get(col, 0)
214
- missing_pct = round(missing / analysis['rows'] * 100, 1)
215
- summary_parts.append(f"• {col} ({col_type}) - {missing_pct}% missing\n")
216
-
217
- if len(analysis['column_names']) > 10:
218
- summary_parts.append(f"• ... and {len(analysis['column_names']) - 10} more columns\n")
219
-
220
- # Add ML suggestions
221
- if analysis['target_column_suggestions']:
222
- summary_parts.append(f"\n🎯 **Suggested Target Columns:**\n")
223
- for col, suggestion in list(analysis['target_column_suggestions'].items())[:5]:
224
- summary_parts.append(f"• {col}: {suggestion}\n")
225
-
226
- # Add task type suggestions
227
- if analysis['task_type_suggestions']:
228
- suggestions = analysis['task_type_suggestions']
229
- summary_parts.append(f"\n💡 **ML Task Insights:**\n")
230
-
231
- if 'size_category' in suggestions:
232
- summary_parts.append(f"• Dataset size: {suggestions['size_category']}\n")
233
- summary_parts.append(f"• {suggestions['recommendation']}\n")
234
-
235
- if suggestions.get('time_series_possible'):
236
- summary_parts.append(f"• Time series analysis possible with: {suggestions['time_series_columns']}\n")
237
-
238
- # Add data quality warnings
239
- if analysis['data_quality_issues']:
240
- issues = analysis['data_quality_issues']
241
- summary_parts.append(f"\n⚠️ **Data Quality Notes:**\n")
242
-
243
- if 'duplicate_rows' in issues:
244
- summary_parts.append(f"• {issues['duplicate_rows']:,} duplicate rows ({issues['duplicate_percentage']}%)\n")
245
-
246
- if 'constant_columns' in issues:
247
- summary_parts.append(f"• Constant columns: {issues['constant_columns']}\n")
248
-
249
- if 'high_missing_columns' in issues:
250
- summary_parts.append(f"• High missing values in: {issues['high_missing_columns']}\n")
251
-
252
- return "".join(summary_parts)
253
-
254
- def cleanup_file(self, file_path: str):
255
- """Remove uploaded file and associated data"""
256
-
257
- try:
258
- # Remove file
259
- path = Path(file_path)
260
- if path.exists():
261
- path.unlink()
262
- logger.info(f"Removed file: {file_path}")
263
-
264
- # Remove from tracking
265
- self.temp_files.pop(file_path, None)
266
- self.file_analyses.pop(file_path, None)
267
-
268
- except Exception as e:
269
- logger.error(f"Error cleaning up file {file_path}: {str(e)}")
270
-
271
- def cleanup_all_files(self):
272
- """Remove all temporary files"""
273
-
274
- for file_path in list(self.temp_files.keys()):
275
- self.cleanup_file(file_path)
276
-
277
- logger.info("All temporary files cleaned up")
278
-
279
- def get_file_analysis(self, file_path: str) -> Optional[Dict[str, Any]]:
280
- """Get previously computed file analysis"""
281
-
282
- return self.file_analyses.get(file_path)
283
-
284
- def is_valid_csv(self, file_path: str) -> bool:
285
- """Check if file is a valid CSV"""
286
-
287
- try:
288
- df = pd.read_csv(file_path)
289
- return len(df) > 0 and len(df.columns) > 0
290
- except Exception:
291
- return False
292
-
293
- async def __del__(self):
294
- """Cleanup when handler is destroyed"""
295
-
296
- self.cleanup_all_files()
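
A sketch of the intended usage of the removed `FileHandler` (assumes a local `sample.csv` exists):

```python
import asyncio
from file_handler import FileHandler

async def demo() -> None:
    handler = FileHandler(upload_dir="uploads")
    with open("sample.csv", "rb") as f:            # any local CSV file
        path, analysis = await handler.save_upload_file(f, "sample.csv")
    print(handler.format_file_summary(analysis))   # human-readable report
    handler.cleanup_all_files()                    # remove the saved copy

asyncio.run(demo())
```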
 
mcp_client.py DELETED
@@ -1,200 +0,0 @@
1
- """
2
- MCP Client for connecting to remote MCP server
3
- Handles tool discovery, execution, and streaming responses
4
- """
5
- import asyncio
6
- import json
7
- import logging
8
- from typing import Dict, Any, Optional, AsyncGenerator
9
- import aiohttp
10
- from fastmcp import Client
11
-
12
- # Configure logging
13
- logging.basicConfig(level=logging.INFO)
14
- logger = logging.getLogger(__name__)
15
-
16
- class MCPClient:
17
- """Client for interacting with remote MCP server"""
18
-
19
- def __init__(self, server_url: str = "https://mcp-1st-birthday-auto-deployer.hf.space/gradio_api/mcp/"):
20
- self.server_url = server_url
21
- self.client: Optional[Client] = None
22
- self.available_tools = {}
23
-
24
- async def connect(self):
25
- """Establish connection to MCP server"""
26
- try:
27
- logger.info(f"Connecting to MCP server at {self.server_url}")
28
- self.client = Client(self.server_url)
29
- await self.client.__aenter__()
30
-
31
- # Discover available tools
32
- await self.discover_tools()
33
- logger.info(f"Connected successfully. Found {len(self.available_tools)} tools.")
34
- return True
35
-
36
- except Exception as e:
37
- logger.error(f"Failed to connect to MCP server: {str(e)}")
38
- return False
39
-
40
- async def discover_tools(self):
41
- """Discover available tools on the MCP server"""
42
- if not self.client:
43
- raise RuntimeError("Client not connected")
44
-
45
- try:
46
- # Get list of available tools
47
- tools_result = await self.client.list_tools()
48
-
49
- for tool in tools_result.tools:
50
- self.available_tools[tool.name] = {
51
- 'name': tool.name,
52
- 'description': tool.description,
53
- 'input_schema': tool.inputSchema
54
- }
55
-
56
- logger.info(f"Discovered tools: {list(self.available_tools.keys())}")
57
-
58
- except Exception as e:
59
- logger.error(f"Failed to discover tools: {str(e)}")
60
- raise
61
-
62
- async def call_tool(
63
- self,
64
- tool_name: str,
65
- arguments: Dict[str, Any],
66
- stream: bool = False
67
- ) -> Dict[str, Any]:
68
- """Call a specific tool with given arguments"""
69
- if not self.client:
70
- raise RuntimeError("Client not connected")
71
-
72
- if tool_name not in self.available_tools:
73
- raise ValueError(f"Tool '{tool_name}' not available. Available tools: {list(self.available_tools.keys())}")
74
-
75
- try:
76
- logger.info(f"Calling tool '{tool_name}' with arguments: {arguments}")
77
-
78
- # Validate arguments against tool schema if available
79
- tool_info = self.available_tools[tool_name]
80
- if 'input_schema' in tool_info:
81
- self._validate_arguments(tool_info['input_schema'], arguments)
82
-
83
- # Call the tool
84
- result = await self.client.call_tool(tool_name, arguments)
85
-
86
- logger.info(f"Tool '{tool_name}' completed successfully")
87
- return result
88
-
89
- except Exception as e:
90
- logger.error(f"Error calling tool '{tool_name}': {str(e)}")
91
- raise
92
-
93
- async def call_tool_stream(
94
- self,
95
- tool_name: str,
96
- arguments: Dict[str, Any]
97
- ) -> AsyncGenerator[str, None]:
98
- """Call a tool with streaming response"""
99
- if not self.client:
100
- raise RuntimeError("Client not connected")
101
-
102
- if tool_name not in self.available_tools:
103
- raise ValueError(f"Tool '{tool_name}' not available")
104
-
105
- try:
106
- logger.info(f"Calling tool '{tool_name}' with streaming")
107
-
108
- # For streaming, we'll simulate progress updates
109
- # In a real implementation, this would depend on the specific tool
110
- yield f"🔄 Starting tool execution: {tool_name}\n"
111
- yield f"📋 Arguments: {json.dumps(arguments, indent=2)}\n"
112
-
113
- # Call the actual tool
114
- result = await self.call_tool(tool_name, arguments)
115
-
116
- yield "✅ Tool execution completed\n"
117
- yield f"📊 Results:\n{json.dumps(result, indent=2)}\n"
118
-
119
- except Exception as e:
120
- logger.error(f"Error in streaming tool call: {str(e)}")
121
- yield f"❌ Error: {str(e)}\n"
122
-
123
- def _validate_arguments(self, schema: Dict[str, Any], arguments: Dict[str, Any]):
124
- """Validate arguments against tool schema"""
125
- if not schema or 'properties' not in schema:
126
- return
127
-
128
- required = schema.get('required', [])
129
- properties = schema['properties']
130
-
131
- # Check required arguments
132
- for req_arg in required:
133
- if req_arg not in arguments:
134
- raise ValueError(f"Required argument '{req_arg}' is missing")
135
-
136
- # Check argument types (basic validation)
137
- for arg_name, arg_value in arguments.items():
138
- if arg_name in properties:
139
- expected_type = properties[arg_name].get('type')
140
- if expected_type == 'string' and not isinstance(arg_value, str):
141
- raise ValueError(f"Argument '{arg_name}' should be a string")
142
- elif expected_type == 'number' and not isinstance(arg_value, (int, float)):
143
- raise ValueError(f"Argument '{arg_name}' should be a number")
144
- elif expected_type == 'boolean' and not isinstance(arg_value, bool):
145
- raise ValueError(f"Argument '{arg_name}' should be a boolean")
146
-
147
- async def get_tool_description(self, tool_name: str) -> str:
148
- """Get detailed description of a specific tool"""
149
- if tool_name not in self.available_tools:
150
- return f"Tool '{tool_name}' not available"
151
-
152
- tool_info = self.available_tools[tool_name]
153
-
154
- description = f"**{tool_name}**\n"
155
- description += f"Description: {tool_info.get('description', 'No description available')}\n"
156
-
157
- if 'input_schema' in tool_info and 'properties' in tool_info['input_schema']:
158
- description += "\n**Parameters:**\n"
159
- properties = tool_info['input_schema']['properties']
160
- required = tool_info['input_schema'].get('required', [])
161
-
162
- for param_name, param_info in properties.items():
163
- param_type = param_info.get('type', 'unknown')
164
- param_desc = param_info.get('description', 'No description')
165
- is_required = param_name in required
166
-
167
- description += f"- {param_name} ({param_type}) {'[Required]' if is_required else '[Optional]'}: {param_desc}\n"
168
-
169
- return description
170
-
171
- def get_available_tools(self) -> Dict[str, str]:
172
- """Get list of available tools with descriptions"""
173
- return {
174
- name: info.get('description', 'No description available')
175
- for name, info in self.available_tools.items()
176
- }
177
-
178
- async def disconnect(self):
179
- """Close connection to MCP server"""
180
- if self.client:
181
- try:
182
- await self.client.__aexit__(None, None, None)
183
- logger.info("Disconnected from MCP server")
184
- except Exception as e:
185
- logger.error(f"Error during disconnect: {str(e)}")
186
- finally:
187
- self.client = None
188
-
189
- # Singleton instance for reuse
190
- _mcp_client = None
191
-
192
- async def get_mcp_client() -> MCPClient:
193
- """Get or create MCP client instance"""
194
- global _mcp_client
195
-
196
- if _mcp_client is None:
197
- _mcp_client = MCPClient()
198
- await _mcp_client.connect()
199
-
200
- return _mcp_client
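
Finally, a sketch of the intended usage of the removed `MCPClient` singleton (tool name from this repository; the CSV URL is a placeholder):

```python
import asyncio
from mcp_client import get_mcp_client

async def main() -> None:
    client = await get_mcp_client()        # connects on first use
    print(client.get_available_tools())    # {tool_name: description}
    result = await client.call_tool(
        "Auto_Deployer_analyze_data_tool",
        {"file_path": "https://example.com/data.csv"},  # placeholder URL
    )
    print(result)
    await client.disconnect()

asyncio.run(main())
```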