Commit fae4e5b (parent: 97decc0), committed by Mandark-droid

Initial TraceMind-AI setup with MCP client integration
- MCP client module for connecting to TraceMind-mcp-server
- Data loader for HuggingFace datasets
- Minimal Gradio app with leaderboard and cost estimator
- Clean, professional README for hackathon submission
- Auth and navigation utilities from MockTraceMind
- Project structure ready for screen migration
- .env.example +19 -0
- README.md +225 -4
- app.py +215 -0
- data_loader.py +255 -0
- mcp_client/__init__.py +8 -0
- mcp_client/client.py +351 -0
- mcp_client/sync_wrapper.py +131 -0
- requirements.txt +20 -0
- styles/__init__.py +8 -0
- styles/tracemind_theme.py +204 -0
- utils/__init__.py +1 -0
- utils/auth.py +193 -0
- utils/navigation.py +158 -0
.env.example ADDED
@@ -0,0 +1,19 @@
+# HuggingFace Configuration
+HF_TOKEN=your_huggingface_token_here
+
+# TraceMind MCP Server Configuration
+# Use the deployed TraceMind-mcp-server endpoint
+MCP_SERVER_URL=https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/
+
+# After hackathon submission, use:
+# MCP_SERVER_URL=https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
+
+# Dataset Configuration
+LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard
+# Example results/traces repos (will be loaded dynamically from leaderboard)
+# RESULTS_REPO=kshitijthakkar/agent-results-gpt4-20251116
+# TRACES_REPO=kshitijthakkar/agent-traces-gpt4-20251116
+# METRICS_REPO=kshitijthakkar/agent-metrics-gpt4-20251116
+
+# Development Mode (skip authentication for local testing)
+DEV_MODE=true
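The application reads this file at startup via python-dotenv (see `app.py` later in this commit). A minimal sketch, assuming a `.env` copied from the template above; the boolean parsing of `DEV_MODE` is illustrative, since the actual parsing lives in `utils/auth.py`, which is not shown in this diff:

```python
# Minimal sketch: load .env and read the variables defined above.
# Assumes python-dotenv is installed (it is pinned in requirements.txt).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

mcp_url = os.getenv("MCP_SERVER_URL")                        # MCP endpoint
leaderboard = os.getenv("LEADERBOARD_REPO")                  # HF dataset repo
dev_mode = os.getenv("DEV_MODE", "false").lower() == "true"  # string -> bool
```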
README.md CHANGED
@@ -1,10 +1,231 @@
 ---
 title: TraceMind AI
-emoji:
+emoji: 🔍
 colorFrom: indigo
-colorTo:
+colorTo: purple
-sdk:
+sdk: gradio
+sdk_version: "6.0.0"
+app_file: app.py
 pinned: false
+tags:
+- mcp-in-action-track-enterprise
+- agent-evaluation
+- mcp-client
 ---
 
-
+# 🔍 TraceMind-AI
+
+Agent Evaluation Platform with MCP-Powered Intelligence
+
+## Overview
+
+TraceMind-AI is a comprehensive platform for evaluating AI agent performance across different models, providers, and configurations. It provides real-time insights, cost analysis, and detailed trace visualization powered by the Model Context Protocol (MCP).
+
+## Features
+
+- **📊 Real-time Leaderboard**: Live evaluation data from HuggingFace datasets
+- **🤖 MCP Integration**: AI-powered analysis using remote MCP servers
+- **💰 Cost Estimation**: Calculate evaluation costs for different models and configurations
+- **🔍 Trace Visualization**: Detailed OpenTelemetry trace analysis
+- **📈 Performance Metrics**: GPU utilization, CO2 emissions, token usage tracking
+
+## MCP Integration
+
+TraceMind-AI demonstrates enterprise MCP client usage by connecting to [TraceMind-mcp-server](https://huggingface.co/spaces/kshitijthakkar/TraceMind-mcp-server) via the Model Context Protocol.
+
+**MCP Tools Used:**
+- `analyze_leaderboard` - AI-generated insights about evaluation trends
+- `estimate_cost` - Cost estimation with hardware recommendations
+- `debug_trace` - Interactive trace analysis and debugging
+- `compare_runs` - Side-by-side run comparison
+- `analyze_results` - Test case analysis with optimization recommendations
+
+## Quick Start
+
+### Prerequisites
+- Python 3.10+
+- HuggingFace account (for authentication)
+- HuggingFace token (optional, for private datasets)
+
+### Installation
+
+1. Clone the repository:
+```bash
+git clone https://github.com/Mandark-droid/TraceMind-AI.git
+cd TraceMind-AI
+```
+
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+3. Configure environment:
+```bash
+cp .env.example .env
+# Edit .env with your configuration
+```
+
+4. Run the application:
+```bash
+python app.py
+```
+
+Visit http://localhost:7860
+
+## Configuration
+
+Create a `.env` file with the following variables:
+
+```env
+# HuggingFace Configuration
+HF_TOKEN=your_token_here
+
+# MCP Server URL
+MCP_SERVER_URL=https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/
+
+# Dataset Configuration
+LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard
+
+# Development Mode (optional)
+DEV_MODE=true
+```
+
+## Data Sources
+
+TraceMind-AI loads evaluation data from HuggingFace datasets:
+
+- **Leaderboard**: Aggregate statistics for all evaluation runs
+- **Results**: Individual test case results
+- **Traces**: OpenTelemetry trace data
+- **Metrics**: GPU metrics and performance data
+
+## Architecture
+
+### Project Structure
+
+```
+TraceMind-AI/
+├── app.py              # Main Gradio application
+├── data_loader.py      # HuggingFace dataset integration
+├── mcp_client/         # MCP client implementation
+│   ├── client.py       # Async MCP client
+│   └── sync_wrapper.py # Synchronous wrapper
+├── utils/              # Utilities
+│   ├── auth.py         # HuggingFace OAuth
+│   └── navigation.py   # Screen navigation
+├── screens/            # UI screens
+├── components/         # Reusable components
+└── styles/             # Custom CSS
+```
+
+### MCP Client Integration
+
+TraceMind-AI uses the MCP Python SDK to connect to remote MCP servers:
+
+```python
+from mcp_client.sync_wrapper import get_sync_mcp_client
+
+# Initialize MCP client
+mcp_client = get_sync_mcp_client()
+mcp_client.initialize()
+
+# Call MCP tools
+insights = mcp_client.analyze_leaderboard(
+    metric_focus="overall",
+    time_range="last_week",
+    top_n=5
+)
+```
+
+## Usage
+
+### Viewing the Leaderboard
+
+1. Log in with your HuggingFace account
+2. Navigate to the "Leaderboard" tab
+3. Click "Load Leaderboard" to fetch the latest data
+4. View AI-powered insights generated by the MCP server
+
+### Estimating Costs
+
+1. Navigate to the "Cost Estimator" tab
+2. Enter the model name (e.g., `openai/gpt-4`)
+3. Select agent type and number of tests
+4. Click "Estimate Cost" for AI-powered analysis
+
+### Viewing Trace Details
+
+1. Select an evaluation run from the leaderboard
+2. Click on a specific test case
+3. View detailed OpenTelemetry trace visualization
+4. Ask questions about the trace using MCP-powered analysis
+
+## Technology Stack
+
+- **UI Framework**: Gradio 6.0
+- **MCP Protocol**: MCP Python SDK 1.21.0+
+- **Data**: HuggingFace Datasets API
+- **Authentication**: HuggingFace OAuth
+- **AI**: Google Gemini 2.5 Flash (via MCP server)
+
+## Development
+
+### Running Locally
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Set development mode
+export DEV_MODE=true
+
+# Run the app
+python app.py
+```
+
+### Running on HuggingFace Spaces
+
+This application is configured for deployment on HuggingFace Spaces using the Gradio SDK. The `app.py` file serves as the entry point.
+
+## Documentation
+
+For detailed implementation documentation, see:
+- [Data Loader API](data_loader.py) - Dataset loading and caching
+- [MCP Client API](mcp_client/client.py) - MCP protocol integration
+- [Authentication](utils/auth.py) - HuggingFace OAuth integration
+
+## Demo Video
+
+[Link to demo video showing the application in action]
+
+## Social Media
+
+[Link to social media post about this project]
+
+## License
+
+MIT License - See LICENSE file for details
+
+## Contributing
+
+Contributions are welcome! Please open an issue or submit a pull request.
+
+## Acknowledgments
+
+- **MCP Team** - For the Model Context Protocol specification
+- **Gradio Team** - For Gradio 6 with MCP integration
+- **HuggingFace** - For Spaces hosting and dataset infrastructure
+- **Google** - For Gemini API access
+
+## Links
+
+- **Live Demo**: https://huggingface.co/spaces/kshitijthakkar/TraceMind-AI
+- **MCP Server**: https://huggingface.co/spaces/kshitijthakkar/TraceMind-mcp-server
+- **GitHub**: https://github.com/Mandark-droid/TraceMind-AI
+- **MCP Specification**: https://modelcontextprotocol.io
+
+---
+
+**MCP's 1st Birthday Hackathon Submission**
+*Track: MCP in Action - Enterprise*
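As a companion to the leaderboard example in the README above, a minimal sketch of the cost-estimator flow using the same synchronous client; argument values are illustrative, and the signature matches `mcp_client/sync_wrapper.py` later in this commit:

```python
from mcp_client.sync_wrapper import get_sync_mcp_client

client = get_sync_mcp_client()
client.initialize()

# Ask the MCP server to estimate the cost of a 100-test evaluation run
estimate = client.estimate_cost(
    model="openai/gpt-4",  # or e.g. meta-llama/Llama-3.1-8B
    agent_type="both",     # one of: tool, code, both
    num_tests=100,
)
print(estimate)
```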
app.py ADDED
@@ -0,0 +1,215 @@
+"""
+TraceMind-AI - Agent Evaluation Platform
+MCP Client consuming TraceMind-mcp-server for intelligent analysis
+"""
+
+import os
+import gradio as gr
+from dotenv import load_dotenv
+import pandas as pd
+
+# Load environment variables
+load_dotenv()
+
+# Import utilities
+from utils.auth import is_authenticated, get_user_info, create_login_button, create_user_info_display, DEV_MODE
+from utils.navigation import Navigator, Screen
+from data_loader import create_data_loader_from_env
+from styles.tracemind_theme import get_tracemind_css
+from mcp_client.sync_wrapper import get_sync_mcp_client
+
+# Initialize
+data_loader = create_data_loader_from_env()
+navigator = Navigator()
+mcp_client = get_sync_mcp_client()
+
+# Global state
+current_selected_run = None
+
+
+def load_leaderboard_view(token, profile):
+    """Load and display the leaderboard with MCP-powered insights"""
+    if not is_authenticated(token, profile):
+        return "Please log in to view the leaderboard", ""
+
+    try:
+        # Load real data from HuggingFace
+        leaderboard_df = data_loader.load_leaderboard()
+
+        if leaderboard_df.empty:
+            return "No evaluation runs found in the leaderboard", ""
+
+        # Format dataframe for display
+        display_df = leaderboard_df[[
+            'model', 'agent_type', 'success_rate', 'total_tests',
+            'avg_duration_ms', 'total_cost_usd', 'co2_emissions_g'
+        ]].copy()
+
+        # Round numeric columns
+        display_df['success_rate'] = display_df['success_rate'].round(1)
+        display_df['avg_duration_ms'] = display_df['avg_duration_ms'].round(0)
+        display_df['total_cost_usd'] = display_df['total_cost_usd'].round(4)
+        display_df['co2_emissions_g'] = display_df['co2_emissions_g'].round(2)
+
+        # Get MCP-powered insights
+        try:
+            insights = mcp_client.analyze_leaderboard(
+                metric_focus="overall",
+                time_range="all_time",
+                top_n=5,
+                hf_token=os.getenv('HF_TOKEN'),
+                gemini_api_key=os.getenv('GEMINI_API_KEY')
+            )
+        except Exception as e:
+            insights = f"⚠️ MCP analysis unavailable: {str(e)}\n\n(Server may need initialization)"
+
+        return display_df, insights
+
+    except Exception as e:
+        return f"Error loading leaderboard: {e}", ""
+
+
+def estimate_evaluation_cost(model, agent_type, num_tests):
+    """Estimate cost for a new evaluation using MCP server"""
+    try:
+        cost_estimate = mcp_client.estimate_cost(
+            model=model,
+            agent_type=agent_type,
+            num_tests=int(num_tests),
+            hf_token=os.getenv('HF_TOKEN'),
+            gemini_api_key=os.getenv('GEMINI_API_KEY')
+        )
+        return cost_estimate
+    except Exception as e:
+        return f"❌ Error estimating cost: {str(e)}"
+
+
+def build_ui():
+    """Build the Gradio UI"""
+
+    with gr.Blocks(css=get_tracemind_css(), title="TraceMind-AI") as demo:
+        # Header
+        gr.Markdown("""
+        # 🔍 TraceMind-AI
+        ### Agent Evaluation Platform with MCP-Powered Intelligence
+
+        **Powered by:**
+        - 📊 Real data from HuggingFace datasets
+        - 🤖 MCP Server for AI-powered insights ([TraceMind-mcp-server](https://huggingface.co/spaces/kshitijthakkar/TraceMind-mcp-server))
+        - 🧠 Google Gemini 2.5 Flash for analysis
+        """)
+
+        # Authentication
+        with gr.Row():
+            with gr.Column(scale=2):
+                user_display = gr.HTML(create_user_info_display(None))
+            with gr.Column(scale=1):
+                login_btn = create_login_button()
+
+        # Main content (shown when authenticated)
+        with gr.Column(visible=DEV_MODE) as main_content:
+            with gr.Tabs() as tabs:
+                # Tab 1: Leaderboard
+                with gr.Tab("📊 Leaderboard"):
+                    gr.Markdown("### Agent Evaluation Leaderboard")
+                    gr.Markdown("Real-time data from `kshitijthakkar/smoltrace-leaderboard`")
+
+                    load_leaderboard_btn = gr.Button("🔄 Load Leaderboard", variant="primary")
+
+                    with gr.Row():
+                        with gr.Column(scale=2):
+                            leaderboard_table = gr.Dataframe(
+                                headers=["Model", "Agent Type", "Success Rate %", "Total Tests", "Avg Duration (ms)", "Cost ($)", "CO2 (g)"],
+                                label="Evaluation Runs",
+                                interactive=False
+                            )
+                        with gr.Column(scale=1):
+                            leaderboard_insights = gr.Markdown("**MCP Analysis:**\n\nClick 'Load Leaderboard' to see AI-powered insights")
+
+                # Tab 2: Cost Estimator
+                with gr.Tab("💰 Cost Estimator"):
+                    gr.Markdown("### Estimate Evaluation Costs")
+                    gr.Markdown("Uses MCP server to calculate costs for different models and configurations")
+
+                    with gr.Row():
+                        model_input = gr.Textbox(
+                            label="Model",
+                            placeholder="openai/gpt-4 or meta-llama/Llama-3.1-8B",
+                            value="openai/gpt-4"
+                        )
+                        agent_type_input = gr.Dropdown(
+                            ["tool", "code", "both"],
+                            label="Agent Type",
+                            value="both"
+                        )
+                        num_tests_input = gr.Number(
+                            label="Number of Tests",
+                            value=100
+                        )
+
+                    estimate_btn = gr.Button("💵 Estimate Cost", variant="primary")
+                    cost_output = gr.Markdown("**Cost Estimate:**\n\nEnter details and click 'Estimate Cost'")
+
+                # Tab 3: MCP Server Status
+                with gr.Tab("🔧 MCP Status"):
+                    gr.Markdown("### TraceMind MCP Server Connection")
+
+                    mcp_url_display = gr.Textbox(
+                        label="MCP Server URL",
+                        value=os.getenv('MCP_SERVER_URL', 'https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/'),
+                        interactive=False
+                    )
+
+                    test_mcp_btn = gr.Button("🧪 Test MCP Connection", variant="secondary")
+                    mcp_status = gr.Markdown("**Status:** Not tested yet")
+
+        # Event handlers
+        def handle_login(token, profile):
+            user = get_user_info(token, profile)
+            return create_user_info_display(user), gr.update(visible=True)
+
+        login_btn.click(
+            fn=handle_login,
+            inputs=[login_btn, login_btn],  # Gradio provides token/profile automatically
+            outputs=[user_display, main_content]
+        )
+
+        load_leaderboard_btn.click(
+            fn=load_leaderboard_view,
+            inputs=[login_btn, login_btn],
+            outputs=[leaderboard_table, leaderboard_insights]
+        )
+
+        estimate_btn.click(
+            fn=estimate_evaluation_cost,
+            inputs=[model_input, agent_type_input, num_tests_input],
+            outputs=[cost_output]
+        )
+
+        def test_mcp_connection():
+            try:
+                mcp_client.initialize()
+                return "✅ **Connected Successfully!**\n\nMCP server is online and ready"
+            except Exception as e:
+                return f"❌ **Connection Failed**\n\nError: {str(e)}"
+
+        test_mcp_btn.click(
+            fn=test_mcp_connection,
+            outputs=[mcp_status]
+        )
+
+    return demo
+
+
+if __name__ == "__main__":
+    print("🚀 Starting TraceMind-AI...")
+    print(f"📊 Leaderboard: {os.getenv('LEADERBOARD_REPO', 'kshitijthakkar/smoltrace-leaderboard')}")
+    print(f"🤖 MCP Server: {os.getenv('MCP_SERVER_URL', 'https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/')}")
+    print(f"🛠️ Dev Mode: {DEV_MODE}")
+
+    demo = build_ui()
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=False
+    )
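One detail worth flagging in `app.py` above: the click handlers pass `login_btn` twice as `inputs` to supply the OAuth token and profile. Gradio's documented OAuth pattern on Spaces instead injects these via `gr.OAuthToken` / `gr.OAuthProfile` type hints, without listing them in `inputs`. A hedged alternative sketch (the helpers come from `utils/auth.py` in this commit, which this diff does not show, so treat the wiring as an assumption):

```python
import gradio as gr
from utils.auth import get_user_info, create_user_info_display

# Gradio fills OAuth-typed parameters automatically on Spaces; they are
# not listed in the event's `inputs`.
def handle_login(profile: gr.OAuthProfile | None, token: gr.OAuthToken | None):
    user = get_user_info(token, profile)
    return create_user_info_display(user), gr.update(visible=True)
```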
data_loader.py ADDED
@@ -0,0 +1,255 @@
+"""
+Data Loader for TraceMind-AI
+Loads real data from HuggingFace datasets (not mock data)
+"""
+
+import os
+from typing import Optional, Dict, Any, List
+import pandas as pd
+from datasets import load_dataset
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+
+class TraceMindDataLoader:
+    """Loads evaluation data from HuggingFace datasets"""
+
+    def __init__(
+        self,
+        leaderboard_repo: Optional[str] = None,
+        hf_token: Optional[str] = None
+    ):
+        """
+        Initialize data loader
+
+        Args:
+            leaderboard_repo: HuggingFace dataset repo for leaderboard
+            hf_token: HuggingFace API token for private datasets
+        """
+        self.leaderboard_repo = leaderboard_repo or os.getenv(
+            'LEADERBOARD_REPO',
+            'kshitijthakkar/smoltrace-leaderboard'
+        )
+        self.hf_token = hf_token or os.getenv('HF_TOKEN')
+
+        # Cache for loaded datasets
+        self._leaderboard_df: Optional[pd.DataFrame] = None
+        self._results_cache: Dict[str, pd.DataFrame] = {}
+        self._traces_cache: Dict[str, List[Dict]] = {}
+        self._metrics_cache: Dict[str, Dict] = {}
+
+    def load_leaderboard(self, force_refresh: bool = False) -> pd.DataFrame:
+        """
+        Load leaderboard dataset from HuggingFace
+
+        Args:
+            force_refresh: Force reload from HF (ignore cache)
+
+        Returns:
+            DataFrame with leaderboard data
+        """
+        if self._leaderboard_df is not None and not force_refresh:
+            return self._leaderboard_df
+
+        try:
+            print(f"📊 Loading leaderboard from {self.leaderboard_repo}...")
+
+            # Load dataset from HuggingFace
+            dataset = load_dataset(
+                self.leaderboard_repo,
+                split='train',
+                token=self.hf_token
+            )
+
+            # Convert to DataFrame
+            self._leaderboard_df = pd.DataFrame(dataset)
+
+            print(f"✅ Loaded {len(self._leaderboard_df)} evaluation runs")
+            return self._leaderboard_df
+
+        except Exception as e:
+            print(f"❌ Error loading leaderboard: {e}")
+            # Return empty DataFrame with expected columns
+            return pd.DataFrame(columns=[
+                'run_id', 'model', 'agent_type', 'provider',
+                'success_rate', 'total_tests', 'successful_tests', 'failed_tests',
+                'avg_steps', 'avg_duration_ms', 'total_duration_ms',
+                'total_tokens', 'avg_tokens_per_test', 'total_cost_usd', 'avg_cost_per_test_usd',
+                'co2_emissions_g', 'gpu_utilization_avg', 'gpu_memory_max_mib',
+                'results_dataset', 'traces_dataset', 'metrics_dataset',
+                'timestamp', 'submitted_by', 'hf_job_id', 'job_type',
+                'dataset_used', 'smoltrace_version'
+            ])
+
+    def load_results(self, results_repo: str, force_refresh: bool = False) -> pd.DataFrame:
+        """
+        Load results dataset for a specific run
+
+        Args:
+            results_repo: HuggingFace dataset repo for results (e.g., 'user/agent-results-gpt4')
+            force_refresh: Force reload from HF
+
+        Returns:
+            DataFrame with test case results
+        """
+        if results_repo in self._results_cache and not force_refresh:
+            return self._results_cache[results_repo]
+
+        try:
+            print(f"📊 Loading results from {results_repo}...")
+
+            dataset = load_dataset(
+                results_repo,
+                split='train',
+                token=self.hf_token
+            )
+
+            df = pd.DataFrame(dataset)
+            self._results_cache[results_repo] = df
+
+            print(f"✅ Loaded {len(df)} test cases")
+            return df
+
+        except Exception as e:
+            print(f"❌ Error loading results: {e}")
+            return pd.DataFrame(columns=[
+                'run_id', 'task_id', 'test_index',
+                'prompt', 'expected_tool', 'difficulty', 'category',
+                'success', 'response', 'tool_called', 'tool_correct',
+                'expected_keywords', 'keywords_matched',
+                'execution_time_ms', 'total_tokens', 'prompt_tokens', 'completion_tokens', 'cost_usd',
+                'trace_id', 'start_time', 'end_time', 'start_time_unix_nano', 'end_time_unix_nano',
+                'error', 'error_type'
+            ])
+
+    def load_traces(self, traces_repo: str, force_refresh: bool = False) -> List[Dict[str, Any]]:
+        """
+        Load traces dataset for a specific run
+
+        Args:
+            traces_repo: HuggingFace dataset repo for traces
+            force_refresh: Force reload from HF
+
+        Returns:
+            List of trace dictionaries (OpenTelemetry format)
+        """
+        if traces_repo in self._traces_cache and not force_refresh:
+            return self._traces_cache[traces_repo]
+
+        try:
+            print(f"🔍 Loading traces from {traces_repo}...")
+
+            dataset = load_dataset(
+                traces_repo,
+                split='train',
+                token=self.hf_token
+            )
+
+            # Convert to list of dicts
+            traces = [dict(item) for item in dataset]
+            self._traces_cache[traces_repo] = traces
+
+            print(f"✅ Loaded {len(traces)} traces")
+            return traces
+
+        except Exception as e:
+            print(f"❌ Error loading traces: {e}")
+            return []
+
+    def load_metrics(self, metrics_repo: str, force_refresh: bool = False) -> Dict[str, Any]:
+        """
+        Load GPU metrics dataset for a specific run
+
+        Args:
+            metrics_repo: HuggingFace dataset repo for metrics
+            force_refresh: Force reload from HF
+
+        Returns:
+            Metrics data (OpenTelemetry metrics format)
+        """
+        if metrics_repo in self._metrics_cache and not force_refresh:
+            return self._metrics_cache[metrics_repo]
+
+        try:
+            print(f"📈 Loading metrics from {metrics_repo}...")
+
+            dataset = load_dataset(
+                metrics_repo,
+                split='train',
+                token=self.hf_token
+            )
+
+            # Assume metrics dataset has one row with all metrics
+            if len(dataset) > 0:
+                metrics = dict(dataset[0])
+                self._metrics_cache[metrics_repo] = metrics
+                print(f"✅ Loaded metrics data")
+                return metrics
+            else:
+                print(f"⚠️ No metrics data found")
+                return {}
+
+        except Exception as e:
+            print(f"❌ Error loading metrics: {e}")
+            return {}
+
+    def get_run_by_id(self, run_id: str) -> Optional[Dict[str, Any]]:
+        """
+        Get a specific run from the leaderboard by run_id
+
+        Args:
+            run_id: Run ID to fetch
+
+        Returns:
+            Run data as dict, or None if not found
+        """
+        leaderboard_df = self.load_leaderboard()
+
+        run_rows = leaderboard_df[leaderboard_df['run_id'] == run_id]
+
+        if len(run_rows) > 0:
+            return run_rows.iloc[0].to_dict()
+        else:
+            return None
+
+    def get_trace_by_id(self, traces_repo: str, trace_id: str) -> Optional[Dict[str, Any]]:
+        """
+        Get a specific trace by trace_id
+
+        Args:
+            traces_repo: HuggingFace dataset repo for traces
+            trace_id: Trace ID to fetch
+
+        Returns:
+            Trace data as dict, or None if not found
+        """
+        traces = self.load_traces(traces_repo)
+
+        for trace in traces:
+            if trace.get('trace_id') == trace_id or trace.get('traceId') == trace_id:
+                return trace
+
+        return None
+
+    def clear_cache(self):
+        """Clear all cached data"""
+        self._leaderboard_df = None
+        self._results_cache.clear()
+        self._traces_cache.clear()
+        self._metrics_cache.clear()
+        print("🧹 Cache cleared")
+
+
+def create_data_loader_from_env() -> TraceMindDataLoader:
+    """
+    Create a data loader using environment variables
+
+    Returns:
+        TraceMindDataLoader instance
+    """
+    return TraceMindDataLoader(
+        leaderboard_repo=os.getenv('LEADERBOARD_REPO'),
+        hf_token=os.getenv('HF_TOKEN')
+    )
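A short usage sketch for the loader above, tying together the env-driven constructor and the cached accessors (repo names fall back to the `.env.example` defaults; the column names come from the empty-DataFrame schema in `load_leaderboard`):

```python
from data_loader import create_data_loader_from_env

loader = create_data_loader_from_env()

# First call hits HuggingFace; subsequent calls are served from cache
df = loader.load_leaderboard()
print(df[["model", "success_rate", "total_cost_usd"]].head())

# Look up one run, then drill into its per-test results
if not df.empty:
    run = loader.get_run_by_id(df.iloc[0]["run_id"])
    if run:
        results = loader.load_results(run["results_dataset"])
        print(f"{len(results)} test cases for {run['model']}")
```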
mcp_client/__init__.py ADDED
@@ -0,0 +1,8 @@
+"""
+MCP Client for TraceMind-AI
+Connects to the TraceMind-mcp-server to use real MCP tools
+"""
+
+from .client import MCPClient
+
+__all__ = ['MCPClient']
mcp_client/client.py ADDED
@@ -0,0 +1,351 @@
+"""
+MCP Client for connecting to TraceMind-mcp-server
+Uses MCP protocol over HTTP to call remote MCP tools
+"""
+
+import os
+import asyncio
+from typing import Optional, Dict, Any, List
+from mcp import ClientSession, StdioServerParameters
+from mcp.client.sse import sse_client
+import aiohttp
+
+
+class MCPClient:
+    """Client for interacting with TraceMind MCP Server"""
+
+    def __init__(self, server_url: Optional[str] = None):
+        """
+        Initialize MCP Client
+
+        Args:
+            server_url: URL of the TraceMind-mcp-server endpoint
+                If None, uses MCP_SERVER_URL from environment
+        """
+        self.server_url = server_url or os.getenv(
+            'MCP_SERVER_URL',
+            'https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/'
+        )
+        self.session: Optional[ClientSession] = None
+        self._initialized = False
+
+    async def initialize(self):
+        """Initialize connection to MCP server"""
+        if self._initialized:
+            return
+
+        try:
+            # Connect to SSE endpoint
+            async with sse_client(self.server_url) as (read, write):
+                async with ClientSession(read, write) as session:
+                    self.session = session
+                    await session.initialize()
+                    self._initialized = True
+
+                    # List available tools for verification
+                    tools_result = await session.list_tools()
+                    print(f"✅ Connected to TraceMind MCP Server at {self.server_url}")
+                    print(f"📊 Available tools: {len(tools_result.tools)}")
+                    for tool in tools_result.tools:
+                        print(f"  - {tool.name}: {tool.description}")
+
+        except Exception as e:
+            print(f"❌ Failed to connect to MCP server: {e}")
+            raise
+
+    async def analyze_leaderboard(
+        self,
+        leaderboard_repo: str = "kshitijthakkar/smoltrace-leaderboard",
+        metric_focus: str = "overall",
+        time_range: str = "last_week",
+        top_n: int = 5,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """
+        Call the analyze_leaderboard tool on MCP server
+
+        Args:
+            leaderboard_repo: HuggingFace dataset repo for leaderboard
+            metric_focus: Focus metric (overall, accuracy, cost, latency, co2)
+            time_range: Time range filter (last_week, last_month, all_time)
+            top_n: Number of top models to highlight
+            hf_token: HuggingFace API token (optional if public dataset)
+            gemini_api_key: Google Gemini API key (optional, server may have it)
+
+        Returns:
+            AI-generated analysis of the leaderboard
+        """
+        if not self._initialized:
+            await self.initialize()
+
+        try:
+            # Build arguments
+            args = {
+                "leaderboard_repo": leaderboard_repo,
+                "metric_focus": metric_focus,
+                "time_range": time_range,
+                "top_n": top_n
+            }
+
+            # Add optional tokens if provided
+            if hf_token:
+                args["hf_token"] = hf_token
+            if gemini_api_key:
+                args["gemini_api_key"] = gemini_api_key
+
+            # Call MCP tool
+            result = await self.session.call_tool("analyze_leaderboard", arguments=args)
+
+            # Extract text from result
+            if result.content and len(result.content) > 0:
+                return result.content[0].text
+            else:
+                return "No analysis generated"
+
+        except Exception as e:
+            return f"❌ Error calling analyze_leaderboard: {str(e)}"
+
+    async def debug_trace(
+        self,
+        trace_data: Dict[str, Any],
+        question: str,
+        metrics_data: Optional[Dict[str, Any]] = None,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """
+        Call the debug_trace tool on MCP server
+
+        Args:
+            trace_data: OpenTelemetry trace data (dict with spans)
+            question: User question about the trace
+            metrics_data: Optional GPU metrics data
+            hf_token: HuggingFace API token
+            gemini_api_key: Google Gemini API key
+
+        Returns:
+            AI-generated answer to the trace question
+        """
+        if not self._initialized:
+            await self.initialize()
+
+        try:
+            args = {
+                "trace_data": trace_data,
+                "question": question
+            }
+
+            if metrics_data:
+                args["metrics_data"] = metrics_data
+            if hf_token:
+                args["hf_token"] = hf_token
+            if gemini_api_key:
+                args["gemini_api_key"] = gemini_api_key
+
+            result = await self.session.call_tool("debug_trace", arguments=args)
+
+            if result.content and len(result.content) > 0:
+                return result.content[0].text
+            else:
+                return "No answer generated"
+
+        except Exception as e:
+            return f"❌ Error calling debug_trace: {str(e)}"
+
+    async def estimate_cost(
+        self,
+        model: str,
+        agent_type: str = "both",
+        num_tests: int = 100,
+        hardware: Optional[str] = None,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """
+        Call the estimate_cost tool on MCP server
+
+        Args:
+            model: Model name (e.g., 'openai/gpt-4', 'meta-llama/Llama-3.1-8B')
+            agent_type: Agent type (tool, code, both)
+            num_tests: Number of tests to run
+            hardware: Hardware type (cpu, gpu_a10, gpu_h200)
+            hf_token: HuggingFace API token
+            gemini_api_key: Google Gemini API key
+
+        Returns:
+            Cost estimation with breakdown
+        """
+        if not self._initialized:
+            await self.initialize()
+
+        try:
+            args = {
+                "model": model,
+                "agent_type": agent_type,
+                "num_tests": num_tests
+            }
+
+            if hardware:
+                args["hardware"] = hardware
+            if hf_token:
+                args["hf_token"] = hf_token
+            if gemini_api_key:
+                args["gemini_api_key"] = gemini_api_key
+
+            result = await self.session.call_tool("estimate_cost", arguments=args)
+
+            if result.content and len(result.content) > 0:
+                return result.content[0].text
+            else:
+                return "No estimation generated"
+
+        except Exception as e:
+            return f"❌ Error calling estimate_cost: {str(e)}"
+
+    async def compare_runs(
+        self,
+        run_data_list: List[Dict[str, Any]],
+        focus_metrics: Optional[List[str]] = None,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """
+        Call the compare_runs tool on MCP server
+
+        Args:
+            run_data_list: List of run data dicts from leaderboard
+            focus_metrics: List of metrics to focus on
+            hf_token: HuggingFace API token
+            gemini_api_key: Google Gemini API key
+
+        Returns:
+            AI-generated comparison analysis
+        """
+        if not self._initialized:
+            await self.initialize()
+
+        try:
+            args = {
+                "run_data_list": run_data_list
+            }
+
+            if focus_metrics:
+                args["focus_metrics"] = focus_metrics
+            if hf_token:
+                args["hf_token"] = hf_token
+            if gemini_api_key:
+                args["gemini_api_key"] = gemini_api_key
+
+            result = await self.session.call_tool("compare_runs", arguments=args)
+
+            if result.content and len(result.content) > 0:
+                return result.content[0].text
+            else:
+                return "No comparison generated"
+
+        except Exception as e:
+            return f"❌ Error calling compare_runs: {str(e)}"
+
+    async def analyze_results(
+        self,
+        results_data: List[Dict[str, Any]],
+        analysis_focus: str = "optimization",
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """
+        Call the analyze_results tool on MCP server
+
+        Args:
+            results_data: List of test case results
+            analysis_focus: Focus area (optimization, failures, performance, cost)
+            hf_token: HuggingFace API token
+            gemini_api_key: Google Gemini API key
+
+        Returns:
+            AI-generated results analysis with recommendations
+        """
+        if not self._initialized:
+            await self.initialize()
+
+        try:
+            args = {
+                "results_data": results_data,
+                "analysis_focus": analysis_focus
+            }
+
+            if hf_token:
+                args["hf_token"] = hf_token
+            if gemini_api_key:
+                args["gemini_api_key"] = gemini_api_key
+
+            result = await self.session.call_tool("analyze_results", arguments=args)
+
+            if result.content and len(result.content) > 0:
+                return result.content[0].text
+            else:
+                return "No analysis generated"
+
+        except Exception as e:
+            return f"❌ Error calling analyze_results: {str(e)}"
+
+    async def get_dataset_info(
+        self,
+        dataset_repo: str,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """
+        Call the get_dataset tool on MCP server (resource)
+
+        Args:
+            dataset_repo: HuggingFace dataset repo
+            hf_token: HuggingFace API token
+            gemini_api_key: Google Gemini API key
+
+        Returns:
+            Dataset information and structure
+        """
+        if not self._initialized:
+            await self.initialize()
+
+        try:
+            args = {
+                "dataset_repo": dataset_repo
+            }
+
+            if hf_token:
+                args["hf_token"] = hf_token
+            if gemini_api_key:
+                args["gemini_api_key"] = gemini_api_key
+
+            result = await self.session.call_tool("get_dataset", arguments=args)
+
+            if result.content and len(result.content) > 0:
+                return result.content[0].text
+            else:
+                return "No dataset info generated"
+
+        except Exception as e:
+            return f"❌ Error calling get_dataset: {str(e)}"
+
+    async def close(self):
+        """Close the MCP client session"""
+        if self.session:
+            # Note: ClientSession doesn't have an explicit close method
+            # The context manager handles cleanup
+            self.session = None
+            self._initialized = False
+
+
+# Singleton instance for use across the app
+_mcp_client_instance: Optional[MCPClient] = None
+
+
+def get_mcp_client() -> MCPClient:
+    """Get or create the global MCP client instance"""
+    global _mcp_client_instance
+    if _mcp_client_instance is None:
+        _mcp_client_instance = MCPClient()
+    return _mcp_client_instance
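One caveat in `client.py` above: `initialize()` opens the SSE transport and the `ClientSession` inside `async with` blocks, so both are torn down again by the time `self.session` is used by the tool methods. A minimal per-call sketch, using the same `mcp` SDK imports, that keeps the connection open for the duration of the call:

```python
from mcp import ClientSession
from mcp.client.sse import sse_client

async def call_tool_once(server_url: str, tool_name: str, arguments: dict) -> str:
    # Open transport + session, make one tool call, then let the
    # context managers tear the connection down cleanly.
    async with sse_client(server_url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(tool_name, arguments=arguments)
            return result.content[0].text if result.content else ""
```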
mcp_client/sync_wrapper.py ADDED
@@ -0,0 +1,131 @@
+"""
+Synchronous wrapper for MCP Client
+Provides sync interface for Gradio event handlers
+"""
+
+import asyncio
+from typing import Optional, Dict, Any, List
+from .client import MCPClient
+
+
+class SyncMCPClient:
+    """Synchronous wrapper for MCPClient to use in Gradio event handlers"""
+
+    def __init__(self, server_url: Optional[str] = None):
+        self.client = MCPClient(server_url)
+        self._loop = None
+
+    def _get_or_create_event_loop(self):
+        """Get or create an event loop for async operations"""
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_closed():
+                loop = asyncio.new_event_loop()
+                asyncio.set_event_loop(loop)
+        except RuntimeError:
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+        return loop
+
+    def _run_async(self, coro):
+        """Run an async coroutine and return the result"""
+        loop = self._get_or_create_event_loop()
+        return loop.run_until_complete(coro)
+
+    def initialize(self):
+        """Initialize connection to MCP server (sync)"""
+        return self._run_async(self.client.initialize())
+
+    def analyze_leaderboard(
+        self,
+        leaderboard_repo: str = "kshitijthakkar/smoltrace-leaderboard",
+        metric_focus: str = "overall",
+        time_range: str = "last_week",
+        top_n: int = 5,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """Analyze leaderboard (sync wrapper)"""
+        return self._run_async(
+            self.client.analyze_leaderboard(
+                leaderboard_repo, metric_focus, time_range, top_n, hf_token, gemini_api_key
+            )
+        )
+
+    def debug_trace(
+        self,
+        trace_data: Dict[str, Any],
+        question: str,
+        metrics_data: Optional[Dict[str, Any]] = None,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """Debug trace (sync wrapper)"""
+        return self._run_async(
+            self.client.debug_trace(trace_data, question, metrics_data, hf_token, gemini_api_key)
+        )
+
+    def estimate_cost(
+        self,
+        model: str,
+        agent_type: str = "both",
+        num_tests: int = 100,
+        hardware: Optional[str] = None,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """Estimate cost (sync wrapper)"""
+        return self._run_async(
+            self.client.estimate_cost(model, agent_type, num_tests, hardware, hf_token, gemini_api_key)
+        )
+
+    def compare_runs(
+        self,
+        run_data_list: List[Dict[str, Any]],
+        focus_metrics: Optional[List[str]] = None,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """Compare runs (sync wrapper)"""
+        return self._run_async(
+            self.client.compare_runs(run_data_list, focus_metrics, hf_token, gemini_api_key)
+        )
+
+    def analyze_results(
+        self,
+        results_data: List[Dict[str, Any]],
+        analysis_focus: str = "optimization",
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """Analyze results (sync wrapper)"""
+        return self._run_async(
+            self.client.analyze_results(results_data, analysis_focus, hf_token, gemini_api_key)
+        )
+
+    def get_dataset_info(
+        self,
+        dataset_repo: str,
+        hf_token: Optional[str] = None,
+        gemini_api_key: Optional[str] = None
+    ) -> str:
+        """Get dataset info (sync wrapper)"""
+        return self._run_async(
+            self.client.get_dataset_info(dataset_repo, hf_token, gemini_api_key)
+        )
+
+    def close(self):
+        """Close the MCP client (sync wrapper)"""
+        return self._run_async(self.client.close())
+
+
+# Global instance
+_sync_mcp_client: Optional[SyncMCPClient] = None
+
+
+def get_sync_mcp_client() -> SyncMCPClient:
+    """Get or create the global synchronous MCP client"""
+    global _sync_mcp_client
+    if _sync_mcp_client is None:
+        _sync_mcp_client = SyncMCPClient()
+    return _sync_mcp_client
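A small note on the event-loop handling above: `asyncio.get_event_loop()` emits a DeprecationWarning on Python 3.10+ when no loop is running. A hedged, simpler alternative sketch, at the cost of a fresh loop per call:

```python
import asyncio

def run_async(coro):
    # asyncio.run() creates a new event loop, runs the coroutine to
    # completion, and closes the loop: no get/set_event_loop juggling.
    return asyncio.run(coro)
```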
requirements.txt ADDED
@@ -0,0 +1,20 @@
+# Gradio for UI
+gradio>=6.0.0
+
+# MCP Client for connecting to TraceMind-mcp-server
+mcp>=1.21.0
+
+# HuggingFace for dataset loading
+datasets>=2.14.0
+huggingface-hub>=0.20.0
+
+# Data processing
+pandas>=2.0.0
+numpy>=1.24.0
+
+# Utilities
+python-dotenv>=1.0.0
+aiohttp>=3.9.0
+
+# Optional: For enhanced visualizations
+plotly>=5.18.0
styles/__init__.py ADDED
@@ -0,0 +1,8 @@
+"""
+Styles package for TraceMind
+Contains CSS themes and styling utilities
+"""
+
+from .tracemind_theme import get_tracemind_css
+
+__all__ = ['get_tracemind_css']
styles/tracemind_theme.py
ADDED
@@ -0,0 +1,204 @@
"""
TraceMind CSS Theme
Central CSS variables and global styling for consistent theming
"""

def get_tracemind_css():
    """
    Return the complete CSS for TraceMind with CSS variables

    Features:
    - Dark theme optimized
    - CSS variables for easy theming
    - Responsive design support
    - Smooth transitions
    """
    return """
    <style>
    /* Import fonts */
    @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap');

    /* TraceMind CSS Variables */
    :root {
        /* Primary Brand Colors */
        --tm-primary: #4F46E5;      /* Indigo 600 - Main brand */
        --tm-secondary: #06B6D4;    /* Cyan 500 - Accents */

        /* Semantic Colors */
        --tm-success: #10B981;      /* Green 500 - High scores, success */
        --tm-warning: #F59E0B;      /* Amber 500 - Medium scores, warnings */
        --tm-danger: #EF4444;       /* Red 500 - Low scores, errors */
        --tm-info: #3B82F6;         /* Blue 500 - Info, API badge */

        /* Background Colors (Dark Theme) */
        --tm-bg-dark: #0F172A;      /* Slate 900 - App background */
        --tm-bg-card: #1E293B;      /* Slate 800 - Card background */
        --tm-bg-secondary: #334155; /* Slate 700 - Secondary elements */
        --tm-bg-hover: rgba(79, 70, 229, 0.15);  /* Hover overlay */
        --tm-bg-stripe: rgba(30, 41, 59, 0.5);   /* Table row stripe */

        /* Text Colors */
        --tm-text-primary: #F1F5F9;    /* Slate 100 - Primary text */
        --tm-text-secondary: #94A3B8;  /* Slate 400 - Secondary text */
        --tm-text-muted: #64748B;      /* Slate 500 - Muted text */

        /* Border Colors */
        --tm-border-subtle: rgba(148, 163, 184, 0.1);
        --tm-border-default: rgba(148, 163, 184, 0.2);
        --tm-border-strong: rgba(148, 163, 184, 0.4);

        /* Badge Colors */
        --tm-badge-tool: #8B5CF6;  /* Purple 500 - Tool agent */
        --tm-badge-code: #F59E0B;  /* Amber 500 - Code agent */
        --tm-badge-both: #06B6D4;  /* Cyan 500 - Both agent */
        --tm-badge-api: #3B82F6;   /* Blue 500 - API provider */
        --tm-badge-gpu: #10B981;   /* Green 500 - GPU provider */

        /* Gradient Definitions */
        --tm-gradient-success: linear-gradient(90deg, #10B981, #06B6D4);
        --tm-gradient-warning: linear-gradient(90deg, #F59E0B, #FBBF24);
        --tm-gradient-danger: linear-gradient(90deg, #EF4444, #F59E0B);
        --tm-gradient-gold: linear-gradient(145deg, #ffd700, #ffc400);
        --tm-gradient-silver: linear-gradient(145deg, #9ca3af, #787C7E);
        --tm-gradient-bronze: linear-gradient(145deg, #CD7F32, #b36a1d);

        /* Shadows */
        --tm-shadow-sm: 0 1px 2px rgba(0, 0, 0, 0.05);
        --tm-shadow-md: 0 4px 6px rgba(0, 0, 0, 0.1);
        --tm-shadow-lg: 0 10px 15px rgba(0, 0, 0, 0.1);
        --tm-shadow-glow: 0 0 20px rgba(79, 70, 229, 0.3);
    }

    /* Global Styles */
    .gradio-container {
        font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif !important;
        background: var(--tm-bg-dark) !important;
        color: var(--tm-text-primary) !important;
    }

    /* Headers */
    h1, h2, h3, h4, h5, h6 {
        color: var(--tm-text-primary) !important;
        font-weight: 600 !important;
        font-family: 'Inter', sans-serif !important;
    }

    /* Links */
    a {
        color: var(--tm-secondary) !important;
        text-decoration: none !important;
        transition: color 0.2s ease;
    }

    a:hover {
        color: var(--tm-primary) !important;
    }

    /* Buttons */
    button {
        transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1) !important;
    }

    button:hover {
        transform: translateY(-2px) !important;
        box-shadow: var(--tm-shadow-lg) !important;
    }

    /* Smooth transitions for all interactive elements */
    * {
        transition: background-color 0.2s ease, color 0.2s ease, border-color 0.2s ease;
    }

    /* Custom scrollbar */
    ::-webkit-scrollbar {
        width: 8px;
        height: 8px;
    }

    ::-webkit-scrollbar-track {
        background: var(--tm-bg-secondary);
        border-radius: 4px;
    }

    ::-webkit-scrollbar-thumb {
        background: var(--tm-secondary);
        border-radius: 4px;
    }

    ::-webkit-scrollbar-thumb:hover {
        background: var(--tm-primary);
    }

    /* Responsive breakpoints */
    @media (max-width: 768px) {
        .gradio-container {
            padding: 8px !important;
        }

        h1 {
            font-size: 1.5rem !important;
        }

        h2 {
            font-size: 1.25rem !important;
        }

        h3 {
            font-size: 1.1rem !important;
        }
    }

    @media (max-width: 480px) {
        .gradio-container {
            padding: 4px !important;
        }
    }

    /* Modal Dialog Styles */
    .modal-dialog {
        position: fixed !important;
        top: 50% !important;
        left: 50% !important;
        transform: translate(-50%, -50%) !important;
        z-index: 9999 !important;
        background: var(--tm-bg-card) !important;
        border: 2px solid var(--tm-border-strong) !important;
        border-radius: 12px !important;
        padding: 24px !important;
        box-shadow: 0 25px 50px rgba(0, 0, 0, 0.5) !important;
        max-width: 800px !important;
        width: 90% !important;
        max-height: 90vh !important;
        overflow-y: auto !important;
    }

    /* Modal backdrop */
    .modal-dialog::before {
        content: '' !important;
        position: fixed !important;
        top: 0 !important;
        left: 0 !important;
        right: 0 !important;
        bottom: 0 !important;
        background: rgba(0, 0, 0, 0.7) !important;
        z-index: -1 !important;
    }

    /* Specific dialog IDs for additional customization */
    #new-eval-dialog,
    #export-dialog {
        animation: modalFadeIn 0.3s ease-out !important;
    }

    @keyframes modalFadeIn {
        from {
            opacity: 0;
            transform: translate(-50%, -55%);
        }
        to {
            opacity: 1;
            transform: translate(-50%, -50%);
        }
    }
    </style>
    """
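For orientation, here is a minimal sketch of how this theme would be wired into the app. Note that `get_tracemind_css()` returns a full `<style>...</style>` string, so it is injected as raw HTML rather than passed to `gr.Blocks(css=...)`, which expects bare CSS without the wrapper. The app structure below is an assumption; app.py is not shown in this excerpt.

```python
import gradio as gr
from styles import get_tracemind_css

# Sketch only: the exact injection point in app.py is assumed, not shown here.
with gr.Blocks() as demo:
    gr.HTML(get_tracemind_css())  # theme variables now apply to the whole page
    gr.Markdown("# TraceMind AI")

if __name__ == "__main__":
    demo.launch()
```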
utils/__init__.py
ADDED
@@ -0,0 +1 @@
# Utils package for MockTraceMind
utils/auth.py
ADDED
@@ -0,0 +1,193 @@
"""
HuggingFace Authentication for MockTraceMind
Using Gradio's built-in OAuth support (simpler than manual OAuth)
"""

import os
import gradio as gr
from typing import Optional
from dataclasses import dataclass

# Development mode flag - set DISABLE_OAUTH=true to skip OAuth for local dev
DEV_MODE = os.getenv("DISABLE_OAUTH", "false").lower() in ("true", "1", "yes")


@dataclass
class User:
    """Authenticated user information"""
    username: str
    name: str
    avatar_url: str
    token: str

    @classmethod
    def from_oauth(cls, token: gr.OAuthToken, profile: gr.OAuthProfile) -> "User":
        """Create User from Gradio OAuth objects"""
        return cls(
            username=profile.username,
            name=profile.name,
            avatar_url=profile.picture,
            # OAuthToken.token holds the raw access token; str(token) would
            # yield the dataclass repr instead of a usable credential.
            token=token.token
        )

    @classmethod
    def create_dev_user(cls) -> "User":
        """Create a mock user for development mode"""
        return cls(
            username=os.getenv("DEV_USERNAME", "dev_user"),
            name=os.getenv("DEV_NAME", "Development User"),
            avatar_url="https://huggingface.co/avatars/default-avatar.png",
            token="dev_token_12345"
        )


def is_authenticated(token: gr.OAuthToken | None, profile: gr.OAuthProfile | None) -> bool:
    """
    Check if user is authenticated

    Args:
        token: OAuth token from Gradio
        profile: OAuth profile from Gradio

    Returns:
        True if both token and profile are valid, or if in dev mode
    """
    # In dev mode, always consider authenticated
    if DEV_MODE:
        return True

    return token is not None and profile is not None


def get_user_info(token: gr.OAuthToken | None, profile: gr.OAuthProfile | None) -> Optional[User]:
    """
    Get user information from OAuth objects

    Args:
        token: OAuth token from Gradio
        profile: OAuth profile from Gradio

    Returns:
        User object if authenticated, None otherwise
    """
    if not is_authenticated(token, profile):
        return None

    # In dev mode, return mock user
    if DEV_MODE:
        return User.create_dev_user()

    return User.from_oauth(token, profile)


def create_login_handler(on_login_success=None, on_login_failure=None):
    """
    Create a login handler function for Gradio LoginButton

    Args:
        on_login_success: Callback function called when login succeeds
        on_login_failure: Callback function called when login fails

    Returns:
        Handler function compatible with Gradio LoginButton.click()
    """
    def handle_login(token: gr.OAuthToken | None, profile: gr.OAuthProfile | None):
        if is_authenticated(token, profile):
            user = get_user_info(token, profile)
            if on_login_success:
                return on_login_success(user)
            return user
        else:
            if on_login_failure:
                return on_login_failure()
            return None

    return handle_login


def require_auth(func):
    """
    Decorator to require authentication for a function

    Usage:
        @require_auth
        def my_function(user: User, other_args...):
            # user is guaranteed to be a valid User object
            pass
    """
    def wrapper(token: gr.OAuthToken | None, profile: gr.OAuthProfile | None, *args, **kwargs):
        if not is_authenticated(token, profile):
            gr.Warning("Please log in to Hugging Face to access this feature!")
            return None

        user = get_user_info(token, profile)
        return func(user, *args, **kwargs)

    return wrapper


# UI component helpers
def create_login_button(visible: bool = True) -> gr.LoginButton:
    """
    Create a styled HuggingFace login button
    Automatically hidden in dev mode
    """
    # Hide login button in dev mode
    if DEV_MODE:
        visible = False

    return gr.LoginButton(visible=visible)


def create_user_info_display(user: Optional[User]) -> str:
    """
    Create HTML for user info display

    Args:
        user: User object or None

    Returns:
        HTML string for display
    """
    if user is None:
        # In dev mode, don't show login prompt
        if DEV_MODE:
            return """
            <div style="text-align: center; padding: 10px; border: 2px solid #ffa500; border-radius: 10px; background-color: #fff4e6;">
                <strong>🛠️ Development Mode</strong>
                <p style="margin: 5px 0 0 0; font-size: 0.9em;">OAuth disabled for local testing</p>
            </div>
            """

        return """
        <div style="text-align: center; padding: 20px; border: 2px dashed #ccc; border-radius: 10px;">
            <h3>🔒 Login Required</h3>
            <p>Please log in with your Hugging Face account to access TraceMind</p>
        </div>
        """

    # Add dev mode badge if in dev mode
    dev_badge = ""
    if DEV_MODE:
        dev_badge = '<span style="background: #ffa500; color: white; padding: 2px 8px; border-radius: 4px; font-size: 0.8em; margin-left: 10px;">DEV</span>'

    return f"""
    <div style="display: flex; align-items: center; padding: 10px; border: 1px solid #e0e0e0; border-radius: 8px;">
        <img src="{user.avatar_url}" alt="{user.name}"
             style="width: 48px; height: 48px; border-radius: 50%; margin-right: 15px;">
        <div>
            <strong>{user.name}</strong>{dev_badge}<br>
            <small style="color: #666;">@{user.username}</small>
        </div>
    </div>
    """


def create_auth_warning(message: str = "Please login first") -> str:
    """Create a warning message for unauthenticated users"""
    return f"""
    <div style="text-align: center; padding: 20px; border: 2px solid #ff6b6b; border-radius: 10px; background-color: #ffe0e0;">
        <h3>⚠️ Authentication Required</h3>
        <p>{message}</p>
    </div>
    """
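A minimal sketch of how these helpers fit together in a Gradio app. Gradio injects `gr.OAuthToken` / `gr.OAuthProfile` into any event handler whose signature annotates them, which is exactly what `require_auth`'s wrapper does, so the decorated function can be wired to an event with no explicit inputs. The component layout below is assumed; the real app.py wiring is not part of this commit excerpt.

```python
import gradio as gr
from utils.auth import User, require_auth, create_login_button, create_user_info_display

# Hypothetical handler: Gradio fills in the OAuth token/profile arguments of
# the require_auth wrapper automatically, based on its type annotations.
@require_auth
def show_profile(user: User) -> str:
    return create_user_info_display(user)

with gr.Blocks() as demo:
    create_login_button()  # rendered normally; hidden when DISABLE_OAUTH=true
    profile_html = gr.HTML()
    gr.Button("Show my profile").click(show_profile, inputs=None, outputs=profile_html)

demo.launch()
```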
utils/navigation.py
ADDED
@@ -0,0 +1,158 @@
"""
Navigation utilities for MockTraceMind screen flow
"""

import gradio as gr
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class Screen(Enum):
    """Available screens in MockTraceMind"""
    LEADERBOARD = "leaderboard"
    COMPARE = "compare"
    RUN_DETAIL = "run_detail"
    TRACE_DETAIL = "trace_detail"


class Navigator:
    """
    Manages screen navigation and state

    Screen Flow:
    - Leaderboard (Screen 1)
        - Click row → Run Detail (Screen 3)
        - Select 2+ rows + Compare → Compare View (Screen 2)
            - Click either run → Run Detail (Screen 3)
    - Run Detail (Screen 3)
        - Click test case row → Trace Detail (Screen 4)
    - Trace Detail (Screen 4)
        - Back → Run Detail (Screen 3)
    """

    def __init__(self):
        self.current_screen = Screen.LEADERBOARD
        self.navigation_stack = [Screen.LEADERBOARD]
        self.screen_context: Dict[str, Any] = {}

    def navigate_to(
        self,
        screen: Screen,
        context: Optional[Dict[str, Any]] = None,
        add_to_stack: bool = True
    ) -> Tuple[Screen, Dict[str, Any]]:
        """
        Navigate to a screen with optional context

        Args:
            screen: Target screen
            context: Data to pass to the screen
            add_to_stack: Whether to add to navigation stack

        Returns:
            Tuple of (screen, context)
        """
        self.current_screen = screen

        if context:
            self.screen_context.update(context)

        if add_to_stack:
            self.navigation_stack.append(screen)

        return screen, self.screen_context

    def back(self) -> Tuple[Screen, Dict[str, Any]]:
        """
        Navigate back in the navigation stack

        Returns:
            Tuple of (previous_screen, context)
        """
        if len(self.navigation_stack) > 1:
            self.navigation_stack.pop()  # Remove current
            previous = self.navigation_stack[-1]
            self.current_screen = previous
            return previous, self.screen_context

        # Already at root
        return self.current_screen, self.screen_context

    def get_current_screen(self) -> Screen:
        """Get the current active screen"""
        return self.current_screen

    def get_context(self, key: str, default: Any = None) -> Any:
        """Get value from screen context"""
        return self.screen_context.get(key, default)

    def set_context(self, key: str, value: Any) -> None:
        """Set value in screen context"""
        self.screen_context[key] = value

    def clear_context(self) -> None:
        """Clear all screen context"""
        self.screen_context.clear()

    def reset(self) -> None:
        """Reset navigation to initial state"""
        self.current_screen = Screen.LEADERBOARD
        self.navigation_stack = [Screen.LEADERBOARD]
        self.screen_context.clear()


# Gradio visibility update helpers
def show_screen(screen: Screen) -> Dict[str, Any]:
    """
    Generate Gradio updates to show a specific screen

    Returns:
        Dictionary mapping container names to gr.update(visible=...) results
    """
    return {
        "leaderboard_container": gr.update(visible=(screen == Screen.LEADERBOARD)),
        "compare_container": gr.update(visible=(screen == Screen.COMPARE)),
        "run_detail_container": gr.update(visible=(screen == Screen.RUN_DETAIL)),
        "trace_detail_container": gr.update(visible=(screen == Screen.TRACE_DETAIL)),
    }


def create_back_button(visible: bool = True) -> gr.Button:
    """Create a consistent back button"""
    return gr.Button("⬅️ Back", visible=visible, variant="secondary", size="sm")


def create_breadcrumb(navigation_stack: list) -> str:
    """
    Create breadcrumb navigation HTML

    Args:
        navigation_stack: List of Screen enums

    Returns:
        HTML string for breadcrumb
    """
    breadcrumb_names = {
        Screen.LEADERBOARD: "Leaderboard",
        Screen.COMPARE: "Compare",
        Screen.RUN_DETAIL: "Run Detail",
        Screen.TRACE_DETAIL: "Trace Detail"
    }

    breadcrumb_items = []
    for i, screen in enumerate(navigation_stack):
        name = breadcrumb_names.get(screen, screen.value)
        if i < len(navigation_stack) - 1:
            # Earlier items in the trail - render as muted text
            breadcrumb_items.append(f'<span style="color: #666;">{name}</span>')
        else:
            # Last item - current screen
            breadcrumb_items.append(f'<strong>{name}</strong>')

    breadcrumb_html = " > ".join(breadcrumb_items)

    return f"""
    <div style="padding: 10px; background-color: #f5f5f5; border-radius: 5px; margin-bottom: 10px;">
        {breadcrumb_html}
    </div>
    """
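To make the screen-flow docstring concrete, here is a minimal sketch of how `Navigator` and `show_screen` could drive container visibility in a two-screen app. The container names and button wiring are illustrative assumptions; the real screens are slated for migration in a later commit.

```python
import gradio as gr
from utils.navigation import Navigator, Screen, show_screen, create_back_button

# Hypothetical wiring: only two of the four screens are stubbed out here.
nav = Navigator()

with gr.Blocks() as demo:
    back = create_back_button()
    with gr.Column(visible=True) as leaderboard_container:
        gr.Markdown("## Leaderboard")
    with gr.Column(visible=False) as run_detail_container:
        gr.Markdown("## Run Detail")

    containers = [leaderboard_container, run_detail_container]

    def open_run_detail():
        # navigate_to pushes the screen and merges the context dict
        screen, _ = nav.navigate_to(Screen.RUN_DETAIL, context={"run_id": "demo"})
        updates = show_screen(screen)
        return [updates["leaderboard_container"], updates["run_detail_container"]]

    def go_back():
        # back() pops the stack and returns the previous screen
        screen, _ = nav.back()
        updates = show_screen(screen)
        return [updates["leaderboard_container"], updates["run_detail_container"]]

    gr.Button("Open run").click(open_run_detail, outputs=containers)
    back.click(go_back, outputs=containers)

demo.launch()
```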