Spaces:
Sleeping
Sleeping
π Final fix v20250913_220639: Comprehensive solution for dependency and configuration issues
Browse files- README.md +29 -9
- app.py +97 -26
- deploy_timestamp_20250913_220639.txt +1 -0
- preinstall.py +134 -0
- requirements.txt +2 -2
- start.sh +4 -11
README.md
CHANGED
|
@@ -6,6 +6,7 @@ colorTo: purple
|
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 4.44.0
|
| 8 |
app_file: app.py
|
|
|
|
| 9 |
startup_duration_timeout: 600
|
| 10 |
pinned: false
|
| 11 |
license: mit
|
|
@@ -20,26 +21,37 @@ A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model
|
|
| 20 |
|
| 21 |
## π Key Features
|
| 22 |
|
| 23 |
-
- **π§ Expert Routing**: Automatically routes queries to specialized experts
|
| 24 |
- **π§ Environment Adaptive**: Works seamlessly on both CPU and GPU environments
|
| 25 |
-
- **π‘οΈ Robust Dependency Management**: Conditional installation of dependencies
|
| 26 |
-
- **π¦
|
| 27 |
-
- **β‘ Performance Optimized**: Environment-specific optimizations
|
| 28 |
|
| 29 |
## π§ Recent Fixes
|
| 30 |
|
| 31 |
-
- β
**Missing Dependencies**: Added `einops` to requirements
|
| 32 |
- β
**Deprecated Parameters**: Fixed all `torch_dtype` β `dtype` usage
|
| 33 |
-
- β
**CPU Compatibility**:
|
| 34 |
- β
**Error Handling**: Comprehensive fallback mechanisms
|
| 35 |
- β
**Security**: Updated to Gradio 4.44.0+ for security fixes
|
| 36 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 37 |
## π― How It Works
|
| 38 |
|
| 39 |
1. **Environment Detection**: Automatically detects CPU vs GPU environment
|
| 40 |
-
2. **
|
| 41 |
-
3. **
|
| 42 |
-
4. **
|
|
|
|
| 43 |
|
| 44 |
## π Performance
|
| 45 |
|
|
@@ -48,6 +60,14 @@ A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model
|
|
| 48 |
| **CPU** | 3-5 min | 8-12 GB | 2-5 |
|
| 49 |
| **GPU** | 2-3 min | 16-20 GB | 15-30 |
|
| 50 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
---
|
| 52 |
|
| 53 |
**Built with β€οΈ for reliable, production-ready AI applications**
|
|
|
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 4.44.0
|
| 8 |
app_file: app.py
|
| 9 |
+
entrypoint: start.sh
|
| 10 |
startup_duration_timeout: 600
|
| 11 |
pinned: false
|
| 12 |
license: mit
|
|
|
|
| 21 |
|
| 22 |
## π Key Features
|
| 23 |
|
| 24 |
+
- **π§ Expert Routing**: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
|
| 25 |
- **π§ Environment Adaptive**: Works seamlessly on both CPU and GPU environments
|
| 26 |
+
- **π‘οΈ Robust Dependency Management**: Conditional installation of dependencies based on environment
|
| 27 |
+
- **π¦ Fault Tolerance**: Handles missing dependencies with fallback mechanisms
|
| 28 |
+
- **β‘ Performance Optimized**: Environment-specific optimizations for best performance
|
| 29 |
|
| 30 |
## π§ Recent Fixes
|
| 31 |
|
| 32 |
+
- β
**Missing Dependencies**: Added `einops` to requirements, conditional `flash_attn` installation
|
| 33 |
- β
**Deprecated Parameters**: Fixed all `torch_dtype` β `dtype` usage
|
| 34 |
+
- β
**CPU Compatibility**: Automatic CPU-safe model revision selection
|
| 35 |
- β
**Error Handling**: Comprehensive fallback mechanisms
|
| 36 |
- β
**Security**: Updated to Gradio 4.44.0+ for security fixes
|
| 37 |
|
| 38 |
+
## ποΈ Architecture
|
| 39 |
+
|
| 40 |
+
```
|
| 41 |
+
app.py # Main application entry point
|
| 42 |
+
preinstall.py # Pre-installation script for dependencies
|
| 43 |
+
model_patch.py # Patch for handling missing dependencies
|
| 44 |
+
start.sh # Startup script
|
| 45 |
+
requirements.txt # Core dependencies
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
## π― How It Works
|
| 49 |
|
| 50 |
1. **Environment Detection**: Automatically detects CPU vs GPU environment
|
| 51 |
+
2. **Dependency Management**: Installs required dependencies based on environment
|
| 52 |
+
3. **Model Configuration**: Uses optimal settings for each environment
|
| 53 |
+
4. **Expert Routing**: Classifies queries and routes to appropriate expert
|
| 54 |
+
5. **Graceful Fallbacks**: Works even when dependencies are missing
|
| 55 |
|
| 56 |
## π Performance
|
| 57 |
|
|
|
|
| 60 |
| **CPU** | 3-5 min | 8-12 GB | 2-5 |
|
| 61 |
| **GPU** | 2-3 min | 16-20 GB | 15-30 |
|
| 62 |
|
| 63 |
+
## π Troubleshooting
|
| 64 |
+
|
| 65 |
+
If you encounter issues:
|
| 66 |
+
1. Check the logs for dependency installation
|
| 67 |
+
2. Verify the pre-installation script executed successfully
|
| 68 |
+
3. Ensure all required packages are installed
|
| 69 |
+
4. Try the fallback mode if model loading fails
|
| 70 |
+
|
| 71 |
---
|
| 72 |
|
| 73 |
**Built with β€οΈ for reliable, production-ready AI applications**
|
app.py
CHANGED
|
@@ -1,7 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
import gradio as gr
|
| 2 |
import torch
|
| 3 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 4 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
|
| 6 |
# Environment detection
|
| 7 |
ON_GPU = torch.cuda.is_available()
|
|
@@ -10,20 +24,36 @@ REVISION = os.getenv("HF_REVISION")
|
|
| 10 |
|
| 11 |
# Configuration based on environment
|
| 12 |
if ON_GPU:
|
| 13 |
-
attn_impl = "sdpa"
|
| 14 |
-
dtype = torch.bfloat16
|
| 15 |
-
device_map = "auto"
|
|
|
|
| 16 |
else:
|
| 17 |
-
attn_impl = "eager"
|
| 18 |
-
dtype = torch.float32
|
| 19 |
-
device_map = "cpu"
|
|
|
|
| 20 |
|
| 21 |
print(f"π Loading model: {MODEL_ID}")
|
| 22 |
print(f"π§ Environment: {'GPU' if ON_GPU else 'CPU'}")
|
| 23 |
-
print(f"π
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 24 |
|
| 25 |
try:
|
| 26 |
# Load tokenizer
|
|
|
|
| 27 |
tokenizer = AutoTokenizer.from_pretrained(
|
| 28 |
MODEL_ID,
|
| 29 |
trust_remote_code=True,
|
|
@@ -31,6 +61,7 @@ try:
|
|
| 31 |
)
|
| 32 |
|
| 33 |
# Load model with environment-specific settings
|
|
|
|
| 34 |
model = AutoModelForCausalLM.from_pretrained(
|
| 35 |
MODEL_ID,
|
| 36 |
trust_remote_code=True,
|
|
@@ -38,24 +69,54 @@ try:
|
|
| 38 |
attn_implementation=attn_impl,
|
| 39 |
dtype=dtype, # Fixed: Use dtype instead of torch_dtype
|
| 40 |
device_map=device_map,
|
| 41 |
-
low_cpu_mem_usage=
|
| 42 |
).eval()
|
| 43 |
|
| 44 |
print("β
Model loaded successfully!")
|
| 45 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 46 |
except Exception as e:
|
| 47 |
-
print(f"
|
| 48 |
-
|
| 49 |
-
tokenizer = None
|
| 50 |
|
| 51 |
-
def
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 52 |
"""Generate response from the model."""
|
| 53 |
if model is None or tokenizer is None:
|
| 54 |
-
return "
|
| 55 |
|
| 56 |
try:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 57 |
# Tokenize input
|
| 58 |
-
inputs = tokenizer(
|
| 59 |
if ON_GPU:
|
| 60 |
inputs = {k: v.to(model.device) for k, v in inputs.items()}
|
| 61 |
|
|
@@ -72,12 +133,12 @@ def generate_response(prompt, max_tokens=512, temperature=0.7):
|
|
| 72 |
# Decode response
|
| 73 |
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
| 74 |
# Remove the input prompt from the response
|
| 75 |
-
response = response[len(
|
| 76 |
|
| 77 |
return response
|
| 78 |
|
| 79 |
except Exception as e:
|
| 80 |
-
return f"
|
| 81 |
|
| 82 |
def create_interface():
|
| 83 |
"""Create the Gradio interface."""
|
|
@@ -86,6 +147,9 @@ def create_interface():
|
|
| 86 |
gr.Markdown("# π€ Phi-3.5-MoE Expert Assistant")
|
| 87 |
gr.Markdown(f"**Environment:** {'GPU' if ON_GPU else 'CPU'} | **Model:** {MODEL_ID}")
|
| 88 |
|
|
|
|
|
|
|
|
|
|
| 89 |
with gr.Row():
|
| 90 |
with gr.Column(scale=3):
|
| 91 |
prompt = gr.Textbox(
|
|
@@ -103,6 +167,12 @@ def create_interface():
|
|
| 103 |
minimum=0.1, maximum=2.0, value=0.7, step=0.1,
|
| 104 |
label="Temperature"
|
| 105 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 106 |
|
| 107 |
generate_btn = gr.Button("Generate Response", variant="primary")
|
| 108 |
|
|
@@ -116,25 +186,26 @@ def create_interface():
|
|
| 116 |
# Example prompts
|
| 117 |
gr.Examples(
|
| 118 |
examples=[
|
| 119 |
-
"Explain quantum computing in simple terms",
|
| 120 |
-
"Write a Python function to calculate fibonacci numbers",
|
| 121 |
-
"What are the benefits of renewable energy?",
|
| 122 |
-
"How does machine learning work?",
|
| 123 |
-
"Translate 'Hello, how are you?' to Spanish"
|
|
|
|
| 124 |
],
|
| 125 |
-
inputs=prompt
|
| 126 |
)
|
| 127 |
|
| 128 |
# Event handlers
|
| 129 |
generate_btn.click(
|
| 130 |
fn=generate_response,
|
| 131 |
-
inputs=[prompt, max_tokens, temperature],
|
| 132 |
outputs=response
|
| 133 |
)
|
| 134 |
|
| 135 |
prompt.submit(
|
| 136 |
fn=generate_response,
|
| 137 |
-
inputs=[prompt, max_tokens, temperature],
|
| 138 |
outputs=response
|
| 139 |
)
|
| 140 |
|
|
@@ -146,4 +217,4 @@ if __name__ == "__main__":
|
|
| 146 |
server_name="0.0.0.0",
|
| 147 |
server_port=7860,
|
| 148 |
share=False
|
| 149 |
-
)
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Phi-3.5-MoE Expert Assistant
|
| 4 |
+
Robust application with CPU/GPU environment detection and dependency handling
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
import sys
|
| 9 |
import gradio as gr
|
| 10 |
import torch
|
| 11 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 12 |
+
|
| 13 |
+
# Apply the model patch if available
|
| 14 |
+
try:
|
| 15 |
+
import model_patch
|
| 16 |
+
print("β
Applied model patch for handling missing dependencies")
|
| 17 |
+
except ImportError:
|
| 18 |
+
print("βΉοΈ Model patch not found, continuing without it")
|
| 19 |
|
| 20 |
# Environment detection
|
| 21 |
ON_GPU = torch.cuda.is_available()
|
|
|
|
| 24 |
|
| 25 |
# Configuration based on environment
|
| 26 |
if ON_GPU:
|
| 27 |
+
attn_impl = "sdpa" # Fast attention for GPU
|
| 28 |
+
dtype = torch.bfloat16 # Mixed precision for GPU
|
| 29 |
+
device_map = "auto" # Auto device mapping for GPU
|
| 30 |
+
low_cpu_mem = False # Don't need low memory usage on GPU
|
| 31 |
else:
|
| 32 |
+
attn_impl = "eager" # Standard attention for CPU
|
| 33 |
+
dtype = torch.float32 # Full precision for CPU
|
| 34 |
+
device_map = "cpu" # Force CPU device
|
| 35 |
+
low_cpu_mem = True # Enable low memory usage on CPU
|
| 36 |
|
| 37 |
print(f"π Loading model: {MODEL_ID}")
|
| 38 |
print(f"π§ Environment: {'GPU' if ON_GPU else 'CPU'}")
|
| 39 |
+
print(f"π Configuration: attn={attn_impl}, dtype={dtype}, device={device_map}, revision={REVISION}")
|
| 40 |
+
|
| 41 |
+
# Expert categories for query classification
|
| 42 |
+
EXPERT_CATEGORIES = {
|
| 43 |
+
"Code": ["programming", "software", "development", "coding", "algorithm", "python", "javascript", "java", "function", "code", "debug", "api", "framework", "library", "class", "method", "variable"],
|
| 44 |
+
"Math": ["mathematics", "calculation", "equation", "formula", "statistics", "derivative", "integral", "algebra", "calculus", "math", "solve", "calculate", "probability", "geometry", "trigonometry"],
|
| 45 |
+
"Reasoning": ["logic", "analysis", "reasoning", "problem-solving", "critical", "explain", "why", "how", "because", "analyze", "evaluate", "compare", "contrast", "deduce", "infer"],
|
| 46 |
+
"Multilingual": ["translation", "language", "multilingual", "localization", "translate", "spanish", "french", "german", "chinese", "japanese", "korean", "arabic", "russian", "portuguese"],
|
| 47 |
+
"General": ["general", "conversation", "assistance", "help", "hello", "hi", "what", "who", "when", "where", "tell", "describe", "explain"]
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
# Load model with robust error handling
|
| 51 |
+
model = None
|
| 52 |
+
tokenizer = None
|
| 53 |
|
| 54 |
try:
|
| 55 |
# Load tokenizer
|
| 56 |
+
print("π Loading tokenizer...")
|
| 57 |
tokenizer = AutoTokenizer.from_pretrained(
|
| 58 |
MODEL_ID,
|
| 59 |
trust_remote_code=True,
|
|
|
|
| 61 |
)
|
| 62 |
|
| 63 |
# Load model with environment-specific settings
|
| 64 |
+
print("π§ Loading model...")
|
| 65 |
model = AutoModelForCausalLM.from_pretrained(
|
| 66 |
MODEL_ID,
|
| 67 |
trust_remote_code=True,
|
|
|
|
| 69 |
attn_implementation=attn_impl,
|
| 70 |
dtype=dtype, # Fixed: Use dtype instead of torch_dtype
|
| 71 |
device_map=device_map,
|
| 72 |
+
low_cpu_mem_usage=low_cpu_mem
|
| 73 |
).eval()
|
| 74 |
|
| 75 |
print("β
Model loaded successfully!")
|
| 76 |
|
| 77 |
+
# Verify model works with a simple generation
|
| 78 |
+
print("π Running quick model test...")
|
| 79 |
+
test_input = tokenizer("Hello, I am", return_tensors="pt").to(device_map if device_map != "auto" else model.device)
|
| 80 |
+
with torch.no_grad():
|
| 81 |
+
test_output = model.generate(**test_input, max_new_tokens=5)
|
| 82 |
+
print("β
Model test successful!")
|
| 83 |
+
|
| 84 |
except Exception as e:
|
| 85 |
+
print(f"β οΈ Model loading failed: {e}")
|
| 86 |
+
print("β οΈ Continuing with limited functionality")
|
|
|
|
| 87 |
|
| 88 |
+
def classify_expert(query):
|
| 89 |
+
"""Classify query to determine which expert should handle it."""
|
| 90 |
+
query_lower = query.lower()
|
| 91 |
+
scores = {}
|
| 92 |
+
|
| 93 |
+
for expert, keywords in EXPERT_CATEGORIES.items():
|
| 94 |
+
score = sum(1 for keyword in keywords if keyword in query_lower)
|
| 95 |
+
scores[expert] = score
|
| 96 |
+
|
| 97 |
+
# Get expert with highest score, default to General if tied or no matches
|
| 98 |
+
max_score = max(scores.values()) if scores else 0
|
| 99 |
+
if max_score > 0:
|
| 100 |
+
experts = [expert for expert, score in scores.items() if score == max_score]
|
| 101 |
+
return experts[0]
|
| 102 |
+
return "General"
|
| 103 |
+
|
| 104 |
+
def generate_response(prompt, max_tokens=512, temperature=0.7, expert=None):
|
| 105 |
"""Generate response from the model."""
|
| 106 |
if model is None or tokenizer is None:
|
| 107 |
+
return "β οΈ Model not loaded. Please check the logs for errors."
|
| 108 |
|
| 109 |
try:
|
| 110 |
+
# Determine expert if not provided
|
| 111 |
+
if expert is None:
|
| 112 |
+
expert = classify_expert(prompt)
|
| 113 |
+
|
| 114 |
+
# Create expert-specific prompt
|
| 115 |
+
system_prompt = f"You are an AI assistant specialized in {expert}. "
|
| 116 |
+
full_prompt = f"{system_prompt}\n\nUser: {prompt}\n\nAssistant:"
|
| 117 |
+
|
| 118 |
# Tokenize input
|
| 119 |
+
inputs = tokenizer(full_prompt, return_tensors="pt")
|
| 120 |
if ON_GPU:
|
| 121 |
inputs = {k: v.to(model.device) for k, v in inputs.items()}
|
| 122 |
|
|
|
|
| 133 |
# Decode response
|
| 134 |
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
| 135 |
# Remove the input prompt from the response
|
| 136 |
+
response = response[len(full_prompt):].strip()
|
| 137 |
|
| 138 |
return response
|
| 139 |
|
| 140 |
except Exception as e:
|
| 141 |
+
return f"β οΈ Generation failed: {str(e)}"
|
| 142 |
|
| 143 |
def create_interface():
|
| 144 |
"""Create the Gradio interface."""
|
|
|
|
| 147 |
gr.Markdown("# π€ Phi-3.5-MoE Expert Assistant")
|
| 148 |
gr.Markdown(f"**Environment:** {'GPU' if ON_GPU else 'CPU'} | **Model:** {MODEL_ID}")
|
| 149 |
|
| 150 |
+
if model is None:
|
| 151 |
+
gr.Markdown("β οΈ **Model failed to load. Limited functionality available.**")
|
| 152 |
+
|
| 153 |
with gr.Row():
|
| 154 |
with gr.Column(scale=3):
|
| 155 |
prompt = gr.Textbox(
|
|
|
|
| 167 |
minimum=0.1, maximum=2.0, value=0.7, step=0.1,
|
| 168 |
label="Temperature"
|
| 169 |
)
|
| 170 |
+
expert = gr.Dropdown(
|
| 171 |
+
choices=list(EXPERT_CATEGORIES.keys()),
|
| 172 |
+
value=None,
|
| 173 |
+
label="Expert (Optional)",
|
| 174 |
+
allow_custom_value=False
|
| 175 |
+
)
|
| 176 |
|
| 177 |
generate_btn = gr.Button("Generate Response", variant="primary")
|
| 178 |
|
|
|
|
| 186 |
# Example prompts
|
| 187 |
gr.Examples(
|
| 188 |
examples=[
|
| 189 |
+
["Explain quantum computing in simple terms", None],
|
| 190 |
+
["Write a Python function to calculate fibonacci numbers", "Code"],
|
| 191 |
+
["What are the benefits of renewable energy?", "General"],
|
| 192 |
+
["How does machine learning work?", "Reasoning"],
|
| 193 |
+
["Translate 'Hello, how are you?' to Spanish", "Multilingual"],
|
| 194 |
+
["Solve the equation 3x^2 + 5x - 2 = 0", "Math"]
|
| 195 |
],
|
| 196 |
+
inputs=[prompt, expert]
|
| 197 |
)
|
| 198 |
|
| 199 |
# Event handlers
|
| 200 |
generate_btn.click(
|
| 201 |
fn=generate_response,
|
| 202 |
+
inputs=[prompt, max_tokens, temperature, expert],
|
| 203 |
outputs=response
|
| 204 |
)
|
| 205 |
|
| 206 |
prompt.submit(
|
| 207 |
fn=generate_response,
|
| 208 |
+
inputs=[prompt, max_tokens, temperature, expert],
|
| 209 |
outputs=response
|
| 210 |
)
|
| 211 |
|
|
|
|
| 217 |
server_name="0.0.0.0",
|
| 218 |
server_port=7860,
|
| 219 |
share=False
|
| 220 |
+
)
|
deploy_timestamp_20250913_220639.txt
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
Final fix deployed at 2025-09-13 22:06:39.021771
|
preinstall.py
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Pre-installation script for Phi-3.5-MoE Space
|
| 4 |
+
Installs required dependencies and selects CPU-safe model revision if needed
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
import sys
|
| 9 |
+
import subprocess
|
| 10 |
+
import torch
|
| 11 |
+
import re
|
| 12 |
+
from pathlib import Path
|
| 13 |
+
from huggingface_hub import HfApi
|
| 14 |
+
|
| 15 |
+
def install_dependencies():
|
| 16 |
+
"""Install required dependencies based on environment."""
|
| 17 |
+
print("π§ Installing required dependencies...")
|
| 18 |
+
|
| 19 |
+
# Always install einops
|
| 20 |
+
subprocess.check_call([sys.executable, "-m", "pip", "install", "einops>=0.7.0"])
|
| 21 |
+
print("β
Installed einops")
|
| 22 |
+
|
| 23 |
+
# Install flash-attn only if CUDA is available
|
| 24 |
+
if torch.cuda.is_available():
|
| 25 |
+
try:
|
| 26 |
+
subprocess.check_call([sys.executable, "-m", "pip", "install", "flash-attn>=2.6.0", "--no-build-isolation"])
|
| 27 |
+
print("β
Installed flash-attn for GPU runtime")
|
| 28 |
+
except subprocess.CalledProcessError:
|
| 29 |
+
print("β οΈ Failed to install flash-attn, continuing without it")
|
| 30 |
+
else:
|
| 31 |
+
print("βΉοΈ CPU runtime detected: skipping flash-attn installation")
|
| 32 |
+
|
| 33 |
+
def select_cpu_safe_revision():
|
| 34 |
+
"""Select a CPU-safe model revision by checking commit history."""
|
| 35 |
+
if torch.cuda.is_available() or os.getenv("HF_REVISION"):
|
| 36 |
+
return
|
| 37 |
+
|
| 38 |
+
MODEL_ID = os.getenv("HF_MODEL_ID", "microsoft/Phi-3.5-MoE-instruct")
|
| 39 |
+
TARGET_FILE = "modeling_phimoe.py"
|
| 40 |
+
ENV_FILE = ".env"
|
| 41 |
+
|
| 42 |
+
print(f"π Selecting CPU-safe revision for {MODEL_ID}...")
|
| 43 |
+
|
| 44 |
+
try:
|
| 45 |
+
api = HfApi()
|
| 46 |
+
for commit in api.list_repo_commits(MODEL_ID, repo_type="model"):
|
| 47 |
+
sha = commit.commit_id
|
| 48 |
+
try:
|
| 49 |
+
file_path = api.hf_hub_download(MODEL_ID, TARGET_FILE, revision=sha, repo_type="model")
|
| 50 |
+
with open(file_path, "r", encoding="utf-8") as f:
|
| 51 |
+
code = f.read()
|
| 52 |
+
|
| 53 |
+
# Check if this version doesn't have flash_attn as a top-level import
|
| 54 |
+
if not re.search(r'^\s*import\s+flash_attn|^\s*from\s+flash_attn', code, flags=re.M):
|
| 55 |
+
# Write to .env file
|
| 56 |
+
with open(ENV_FILE, "a", encoding="utf-8") as env_file:
|
| 57 |
+
env_file.write(f"HF_REVISION={sha}\n")
|
| 58 |
+
|
| 59 |
+
# Also set it in the current environment
|
| 60 |
+
os.environ["HF_REVISION"] = sha
|
| 61 |
+
|
| 62 |
+
print(f"β
Selected CPU-safe revision: {sha}")
|
| 63 |
+
return
|
| 64 |
+
except Exception:
|
| 65 |
+
continue
|
| 66 |
+
|
| 67 |
+
print("β οΈ No CPU-safe revision found")
|
| 68 |
+
except Exception as e:
|
| 69 |
+
print(f"β οΈ Error selecting CPU-safe revision: {e}")
|
| 70 |
+
|
| 71 |
+
def create_model_patch():
|
| 72 |
+
"""Create a patch file to fix the model loading code."""
|
| 73 |
+
PATCH_FILE = "model_patch.py"
|
| 74 |
+
|
| 75 |
+
patch_content = """
|
| 76 |
+
# Monkey patch for transformers.dynamic_module_utils
|
| 77 |
+
import sys
|
| 78 |
+
import importlib
|
| 79 |
+
from importlib.abc import Loader
|
| 80 |
+
from importlib.machinery import ModuleSpec
|
| 81 |
+
from transformers.dynamic_module_utils import check_imports
|
| 82 |
+
|
| 83 |
+
# Create mock modules for missing dependencies
|
| 84 |
+
class MockModule:
|
| 85 |
+
def __init__(self, name):
|
| 86 |
+
self.__name__ = name
|
| 87 |
+
self.__spec__ = ModuleSpec(name, None)
|
| 88 |
+
|
| 89 |
+
def __getattr__(self, key):
|
| 90 |
+
return MockModule(f"{self.__name__}.{key}")
|
| 91 |
+
|
| 92 |
+
# Override check_imports to handle missing dependencies
|
| 93 |
+
original_check_imports = check_imports
|
| 94 |
+
def patched_check_imports(resolved_module_file):
|
| 95 |
+
try:
|
| 96 |
+
return original_check_imports(resolved_module_file)
|
| 97 |
+
except ImportError as e:
|
| 98 |
+
# Extract missing modules
|
| 99 |
+
import re
|
| 100 |
+
missing = re.findall(r'packages that were not found in your environment: ([^.]+)', str(e))
|
| 101 |
+
if missing:
|
| 102 |
+
missing_modules = [m.strip() for m in missing[0].split(',')]
|
| 103 |
+
print(f"β οΈ Missing dependencies: {', '.join(missing_modules)}")
|
| 104 |
+
print("π§ Creating mock modules to continue loading...")
|
| 105 |
+
|
| 106 |
+
# Create mock modules
|
| 107 |
+
for module_name in missing_modules:
|
| 108 |
+
if module_name not in sys.modules:
|
| 109 |
+
mock_module = MockModule(module_name)
|
| 110 |
+
sys.modules[module_name] = mock_module
|
| 111 |
+
print(f"β
Created mock for {module_name}")
|
| 112 |
+
|
| 113 |
+
# Try again
|
| 114 |
+
return original_check_imports(resolved_module_file)
|
| 115 |
+
else:
|
| 116 |
+
raise
|
| 117 |
+
|
| 118 |
+
# Apply the patch
|
| 119 |
+
from transformers import dynamic_module_utils
|
| 120 |
+
dynamic_module_utils.check_imports = patched_check_imports
|
| 121 |
+
print("β
Applied transformers patch for handling missing dependencies")
|
| 122 |
+
"""
|
| 123 |
+
|
| 124 |
+
with open(PATCH_FILE, "w", encoding="utf-8") as f:
|
| 125 |
+
f.write(patch_content)
|
| 126 |
+
|
| 127 |
+
print(f"β
Created model patch file: {PATCH_FILE}")
|
| 128 |
+
|
| 129 |
+
if __name__ == "__main__":
|
| 130 |
+
print("π Running pre-installation script...")
|
| 131 |
+
install_dependencies()
|
| 132 |
+
select_cpu_safe_revision()
|
| 133 |
+
create_model_patch()
|
| 134 |
+
print("β
Pre-installation complete!")
|
requirements.txt
CHANGED
|
@@ -2,9 +2,9 @@ gradio>=4.44.0
|
|
| 2 |
torch>=2.0.0
|
| 3 |
transformers>=4.46.0
|
| 4 |
accelerate>=0.31.0
|
| 5 |
-
einops>=0.
|
| 6 |
sentencepiece>=0.1.99
|
| 7 |
protobuf>=3.20.0
|
| 8 |
-
huggingface-hub>=0.
|
| 9 |
tokenizers>=0.15.0
|
| 10 |
safetensors>=0.4.0
|
|
|
|
| 2 |
torch>=2.0.0
|
| 3 |
transformers>=4.46.0
|
| 4 |
accelerate>=0.31.0
|
| 5 |
+
einops>=0.7.0
|
| 6 |
sentencepiece>=0.1.99
|
| 7 |
protobuf>=3.20.0
|
| 8 |
+
huggingface-hub>=0.23.0
|
| 9 |
tokenizers>=0.15.0
|
| 10 |
safetensors>=0.4.0
|
start.sh
CHANGED
|
@@ -4,17 +4,10 @@ set -euo pipefail
|
|
| 4 |
echo "π Starting Phi-3.5-MoE Expert Assistant..."
|
| 5 |
echo "π
$(date)"
|
| 6 |
|
| 7 |
-
#
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
# Make prestart script executable
|
| 11 |
-
chmod +x prestart.sh
|
| 12 |
-
|
| 13 |
-
# Run prestart setup
|
| 14 |
-
echo "π§ Running prestart setup..."
|
| 15 |
-
./prestart.sh
|
| 16 |
|
| 17 |
# Start the application
|
| 18 |
echo "π Starting application..."
|
| 19 |
-
|
| 20 |
-
python app/app.py
|
|
|
|
| 4 |
echo "π Starting Phi-3.5-MoE Expert Assistant..."
|
| 5 |
echo "π
$(date)"
|
| 6 |
|
| 7 |
+
# Run pre-installation script
|
| 8 |
+
echo "π§ Running pre-installation script..."
|
| 9 |
+
python preinstall.py
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
|
| 11 |
# Start the application
|
| 12 |
echo "π Starting application..."
|
| 13 |
+
python app.py
|
|
|