Spaces:

ianshank
/

phi35-moe-demo

Sleeping

App Files Files Community

phi35-moe-demo / README.md

ianshank

🚀 Final fix v20250913_220639: Comprehensive solution for dependency and configuration issues

3eeba36 verified 2 months ago

preview code

raw

history blame contribute delete

2.65 kB

A newer version of the Gradio SDK is available: 6.0.0

Upgrade

metadata

title: Phi-3.5-MoE Expert Assistant
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
entrypoint: start.sh
startup_duration_timeout: 600
pinned: false
license: mit
short_description: AI assistant with expert routing and CPU/GPU support
models:
  - microsoft/Phi-3.5-MoE-instruct

🤖 Phi-3.5-MoE Expert Assistant

A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model with intelligent expert routing and comprehensive CPU/GPU environment support.

🚀 Key Features

🧠 Expert Routing: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
🔧 Environment Adaptive: Works seamlessly on both CPU and GPU environments
🛡️ Robust Dependency Management: Conditional installation of dependencies based on environment
📦 Fault Tolerance: Handles missing dependencies with fallback mechanisms
⚡ Performance Optimized: Environment-specific optimizations for best performance

🔧 Recent Fixes

✅ Missing Dependencies: Added einops to requirements, conditional flash_attn installation
✅ Deprecated Parameters: Fixed all torch_dtype → dtype usage
✅ CPU Compatibility: Automatic CPU-safe model revision selection
✅ Error Handling: Comprehensive fallback mechanisms
✅ Security: Updated to Gradio 4.44.0+ for security fixes

🏗️ Architecture

app.py              # Main application entry point
preinstall.py       # Pre-installation script for dependencies
model_patch.py      # Patch for handling missing dependencies
start.sh            # Startup script
requirements.txt    # Core dependencies

🎯 How It Works

Environment Detection: Automatically detects CPU vs GPU environment
Dependency Management: Installs required dependencies based on environment
Model Configuration: Uses optimal settings for each environment
Expert Routing: Classifies queries and routes to appropriate expert
Graceful Fallbacks: Works even when dependencies are missing

📊 Performance

Environment	Startup	Memory	Tokens/sec
CPU	3-5 min	8-12 GB	2-5
GPU	2-3 min	16-20 GB	15-30

🔍 Troubleshooting

If you encounter issues:

Check the logs for dependency installation
Verify the pre-installation script executed successfully
Ensure all required packages are installed
Try the fallback mode if model loading fails

Built with ❤️ for reliable, production-ready AI applications