ianshank committed on
Commit 3eeba36 · verified · 1 Parent(s): 15fc08d

🚀 Final fix v20250913_220639: Comprehensive solution for dependency and configuration issues

Files changed (6)
  1. README.md +29 -9
  2. app.py +97 -26
  3. deploy_timestamp_20250913_220639.txt +1 -0
  4. preinstall.py +134 -0
  5. requirements.txt +2 -2
  6. start.sh +4 -11
README.md CHANGED
@@ -6,6 +6,7 @@ colorTo: purple
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app.py
+entrypoint: start.sh
 startup_duration_timeout: 600
 pinned: false
 license: mit
@@ -20,26 +21,37 @@ A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model
 
 ## 🚀 Key Features
 
-- **🧠 Expert Routing**: Automatically routes queries to specialized experts
+- **🧠 Expert Routing**: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
 - **🔧 Environment Adaptive**: Works seamlessly on both CPU and GPU environments
-- **🛡️ Robust Dependency Management**: Conditional installation of dependencies
-- **📦 Simple Architecture**: Clean, maintainable codebase
-- **⚡ Performance Optimized**: Environment-specific optimizations
+- **🛡️ Robust Dependency Management**: Conditional installation of dependencies based on environment
+- **📦 Fault Tolerance**: Handles missing dependencies with fallback mechanisms
+- **⚡ Performance Optimized**: Environment-specific optimizations for best performance
 
 ## 🔧 Recent Fixes
 
-- ✅ **Missing Dependencies**: Added `einops` to requirements
+- ✅ **Missing Dependencies**: Added `einops` to requirements, conditional `flash_attn` installation
 - ✅ **Deprecated Parameters**: Fixed all `torch_dtype` → `dtype` usage
-- ✅ **CPU Compatibility**: Environment-specific model configuration
+- ✅ **CPU Compatibility**: Automatic CPU-safe model revision selection
 - ✅ **Error Handling**: Comprehensive fallback mechanisms
 - ✅ **Security**: Updated to Gradio 4.44.0+ for security fixes
 
+## 🏗️ Architecture
+
+```
+app.py           # Main application entry point
+preinstall.py    # Pre-installation script for dependencies
+model_patch.py   # Patch for handling missing dependencies
+start.sh         # Startup script
+requirements.txt # Core dependencies
+```
+
 ## 🎯 How It Works
 
 1. **Environment Detection**: Automatically detects CPU vs GPU environment
-2. **Model Configuration**: Uses optimal settings for each environment
-3. **Response Generation**: Generates contextual responses to user queries
-4. **Graceful Fallbacks**: Works even when model loading fails
+2. **Dependency Management**: Installs required dependencies based on environment
+3. **Model Configuration**: Uses optimal settings for each environment
+4. **Expert Routing**: Classifies queries and routes to appropriate expert
+5. **Graceful Fallbacks**: Works even when dependencies are missing
 
 ## 📊 Performance
 
@@ -48,6 +60,14 @@ A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model
 | **CPU** | 3-5 min | 8-12 GB | 2-5 |
 | **GPU** | 2-3 min | 16-20 GB | 15-30 |
 
+## 🔍 Troubleshooting
+
+If you encounter issues:
+1. Check the logs for dependency installation
+2. Verify the pre-installation script executed successfully
+3. Ensure all required packages are installed
+4. Try the fallback mode if model loading fails
+
 ---
 
 **Built with ❤️ for reliable, production-ready AI applications**
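The Troubleshooting steps above can also be scripted. Below is a minimal, illustrative sketch (not one of the committed files; the helper name `check_dependencies` is made up here) that checks the packages `preinstall.py` and `requirements.txt` are responsible for:

```python
# Hypothetical helper, not part of this commit: verifies the imports that the
# README's Troubleshooting section says to check after preinstall.py has run.
import importlib

REQUIRED = ["einops", "gradio", "torch", "transformers", "accelerate", "huggingface_hub"]
OPTIONAL = ["flash_attn"]  # preinstall.py installs this only on GPU runtimes

def check_dependencies():
    for name in REQUIRED:
        try:
            importlib.import_module(name)
            print(f"✅ {name} available")
        except ImportError:
            print(f"⚠️ {name} missing -- rerun preinstall.py or check requirements.txt")
    for name in OPTIONAL:
        try:
            importlib.import_module(name)
            print(f"✅ {name} available")
        except ImportError:
            print(f"ℹ️ {name} not installed (expected on CPU runtimes)")

if __name__ == "__main__":
    check_dependencies()
```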
app.py CHANGED
@@ -1,7 +1,21 @@
+#!/usr/bin/env python3
+"""
+Phi-3.5-MoE Expert Assistant
+Robust application with CPU/GPU environment detection and dependency handling
+"""
+
+import os
+import sys
 import gradio as gr
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
-import os
+
+# Apply the model patch if available
+try:
+    import model_patch
+    print("✅ Applied model patch for handling missing dependencies")
+except ImportError:
+    print("ℹ️ Model patch not found, continuing without it")
 
 # Environment detection
 ON_GPU = torch.cuda.is_available()
@@ -10,20 +24,36 @@ REVISION = os.getenv("HF_REVISION")
 
 # Configuration based on environment
 if ON_GPU:
-    attn_impl = "sdpa"
-    dtype = torch.bfloat16
-    device_map = "auto"
+    attn_impl = "sdpa"        # Fast attention for GPU
+    dtype = torch.bfloat16    # Mixed precision for GPU
+    device_map = "auto"       # Auto device mapping for GPU
+    low_cpu_mem = False       # Don't need low memory usage on GPU
 else:
-    attn_impl = "eager"
-    dtype = torch.float32
-    device_map = "cpu"
+    attn_impl = "eager"       # Standard attention for CPU
+    dtype = torch.float32     # Full precision for CPU
+    device_map = "cpu"        # Force CPU device
+    low_cpu_mem = True        # Enable low memory usage on CPU
 
 print(f"🚀 Loading model: {MODEL_ID}")
 print(f"🔧 Environment: {'GPU' if ON_GPU else 'CPU'}")
-print(f"📊 Attention: {attn_impl}, Dtype: {dtype}")
+print(f"📊 Configuration: attn={attn_impl}, dtype={dtype}, device={device_map}, revision={REVISION}")
+
+# Expert categories for query classification
+EXPERT_CATEGORIES = {
+    "Code": ["programming", "software", "development", "coding", "algorithm", "python", "javascript", "java", "function", "code", "debug", "api", "framework", "library", "class", "method", "variable"],
+    "Math": ["mathematics", "calculation", "equation", "formula", "statistics", "derivative", "integral", "algebra", "calculus", "math", "solve", "calculate", "probability", "geometry", "trigonometry"],
+    "Reasoning": ["logic", "analysis", "reasoning", "problem-solving", "critical", "explain", "why", "how", "because", "analyze", "evaluate", "compare", "contrast", "deduce", "infer"],
+    "Multilingual": ["translation", "language", "multilingual", "localization", "translate", "spanish", "french", "german", "chinese", "japanese", "korean", "arabic", "russian", "portuguese"],
+    "General": ["general", "conversation", "assistance", "help", "hello", "hi", "what", "who", "when", "where", "tell", "describe", "explain"]
+}
+
+# Load model with robust error handling
+model = None
+tokenizer = None
 
 try:
     # Load tokenizer
+    print("📝 Loading tokenizer...")
     tokenizer = AutoTokenizer.from_pretrained(
         MODEL_ID,
         trust_remote_code=True,
@@ -31,6 +61,7 @@ try:
     )
 
     # Load model with environment-specific settings
+    print("🧠 Loading model...")
     model = AutoModelForCausalLM.from_pretrained(
         MODEL_ID,
         trust_remote_code=True,
@@ -38,24 +69,54 @@ try:
         attn_implementation=attn_impl,
         dtype=dtype,  # Fixed: Use dtype instead of torch_dtype
         device_map=device_map,
-        low_cpu_mem_usage=not ON_GPU
+        low_cpu_mem_usage=low_cpu_mem
     ).eval()
 
     print("✅ Model loaded successfully!")
 
+    # Verify model works with a simple generation
+    print("🔍 Running quick model test...")
+    test_input = tokenizer("Hello, I am", return_tensors="pt").to(device_map if device_map != "auto" else model.device)
+    with torch.no_grad():
+        test_output = model.generate(**test_input, max_new_tokens=5)
+    print("✅ Model test successful!")
+
 except Exception as e:
-    print(f"❌ Model loading failed: {e}")
-    model = None
-    tokenizer = None
+    print(f"⚠️ Model loading failed: {e}")
+    print("⚠️ Continuing with limited functionality")
 
-def generate_response(prompt, max_tokens=512, temperature=0.7):
+def classify_expert(query):
+    """Classify query to determine which expert should handle it."""
+    query_lower = query.lower()
+    scores = {}
+
+    for expert, keywords in EXPERT_CATEGORIES.items():
+        score = sum(1 for keyword in keywords if keyword in query_lower)
+        scores[expert] = score
+
+    # Get expert with highest score, default to General if tied or no matches
+    max_score = max(scores.values()) if scores else 0
+    if max_score > 0:
+        experts = [expert for expert, score in scores.items() if score == max_score]
+        return experts[0]
+    return "General"
+
+def generate_response(prompt, max_tokens=512, temperature=0.7, expert=None):
     """Generate response from the model."""
     if model is None or tokenizer is None:
-        return "❌ Model not loaded. Please check the logs for errors."
+        return "⚠️ Model not loaded. Please check the logs for errors."
 
     try:
+        # Determine expert if not provided
+        if expert is None:
+            expert = classify_expert(prompt)
+
+        # Create expert-specific prompt
+        system_prompt = f"You are an AI assistant specialized in {expert}. "
+        full_prompt = f"{system_prompt}\n\nUser: {prompt}\n\nAssistant:"
+
         # Tokenize input
-        inputs = tokenizer(prompt, return_tensors="pt")
+        inputs = tokenizer(full_prompt, return_tensors="pt")
         if ON_GPU:
            inputs = {k: v.to(model.device) for k, v in inputs.items()}
 
@@ -72,12 +133,12 @@ def generate_response(prompt, max_tokens=512, temperature=0.7):
         # Decode response
         response = tokenizer.decode(outputs[0], skip_special_tokens=True)
         # Remove the input prompt from the response
-        response = response[len(prompt):].strip()
+        response = response[len(full_prompt):].strip()
 
         return response
 
     except Exception as e:
-        return f"❌ Generation failed: {str(e)}"
+        return f"⚠️ Generation failed: {str(e)}"
 
 def create_interface():
     """Create the Gradio interface."""
@@ -86,6 +147,9 @@ def create_interface():
         gr.Markdown("# 🤖 Phi-3.5-MoE Expert Assistant")
         gr.Markdown(f"**Environment:** {'GPU' if ON_GPU else 'CPU'} | **Model:** {MODEL_ID}")
 
+        if model is None:
+            gr.Markdown("⚠️ **Model failed to load. Limited functionality available.**")
+
         with gr.Row():
             with gr.Column(scale=3):
                 prompt = gr.Textbox(
@@ -103,6 +167,12 @@
                     minimum=0.1, maximum=2.0, value=0.7, step=0.1,
                     label="Temperature"
                 )
+                expert = gr.Dropdown(
+                    choices=list(EXPERT_CATEGORIES.keys()),
+                    value=None,
+                    label="Expert (Optional)",
+                    allow_custom_value=False
+                )
 
                 generate_btn = gr.Button("Generate Response", variant="primary")
 
@@ -116,25 +186,26 @@
         # Example prompts
         gr.Examples(
             examples=[
-                "Explain quantum computing in simple terms",
-                "Write a Python function to calculate fibonacci numbers",
-                "What are the benefits of renewable energy?",
-                "How does machine learning work?",
-                "Translate 'Hello, how are you?' to Spanish"
+                ["Explain quantum computing in simple terms", None],
+                ["Write a Python function to calculate fibonacci numbers", "Code"],
+                ["What are the benefits of renewable energy?", "General"],
+                ["How does machine learning work?", "Reasoning"],
+                ["Translate 'Hello, how are you?' to Spanish", "Multilingual"],
+                ["Solve the equation 3x^2 + 5x - 2 = 0", "Math"]
             ],
-            inputs=prompt
+            inputs=[prompt, expert]
         )
 
         # Event handlers
         generate_btn.click(
             fn=generate_response,
-            inputs=[prompt, max_tokens, temperature],
+            inputs=[prompt, max_tokens, temperature, expert],
             outputs=response
         )
 
         prompt.submit(
             fn=generate_response,
-            inputs=[prompt, max_tokens, temperature],
+            inputs=[prompt, max_tokens, temperature, expert],
             outputs=response
         )
 
@@ -146,4 +217,4 @@ if __name__ == "__main__":
         server_name="0.0.0.0",
         server_port=7860,
         share=False
-    )
+    )
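For reference, a minimal sketch of exercising the new routing path outside the Gradio UI, assuming `app.py` above is importable from the working directory (importing it triggers the model load, so this is only practical once the Space has started; the script itself is illustrative and not part of the commit):

```python
# Illustrative smoke test: drives the same functions the Gradio handlers call.
from app import classify_expert, generate_response  # importing app loads the model

queries = [
    "Write a Python function to calculate fibonacci numbers",
    "Solve the equation 3x^2 + 5x - 2 = 0",
    "Translate 'Hello, how are you?' to Spanish",
]

for q in queries:
    expert = classify_expert(q)  # keyword-count routing, falls back to "General"
    print(f"[{expert}] {q}")
    # Passing expert=None instead would let generate_response classify the query itself.
    print(generate_response(q, max_tokens=128, temperature=0.7, expert=expert))
```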
deploy_timestamp_20250913_220639.txt ADDED
@@ -0,0 +1 @@
+Final fix deployed at 2025-09-13 22:06:39.021771
preinstall.py ADDED
@@ -0,0 +1,134 @@
+#!/usr/bin/env python3
+"""
+Pre-installation script for Phi-3.5-MoE Space
+Installs required dependencies and selects CPU-safe model revision if needed
+"""
+
+import os
+import sys
+import subprocess
+import torch
+import re
+from pathlib import Path
+from huggingface_hub import HfApi
+
+def install_dependencies():
+    """Install required dependencies based on environment."""
+    print("🔧 Installing required dependencies...")
+
+    # Always install einops
+    subprocess.check_call([sys.executable, "-m", "pip", "install", "einops>=0.7.0"])
+    print("✅ Installed einops")
+
+    # Install flash-attn only if CUDA is available
+    if torch.cuda.is_available():
+        try:
+            subprocess.check_call([sys.executable, "-m", "pip", "install", "flash-attn>=2.6.0", "--no-build-isolation"])
+            print("✅ Installed flash-attn for GPU runtime")
+        except subprocess.CalledProcessError:
+            print("⚠️ Failed to install flash-attn, continuing without it")
+    else:
+        print("ℹ️ CPU runtime detected: skipping flash-attn installation")
+
+def select_cpu_safe_revision():
+    """Select a CPU-safe model revision by checking commit history."""
+    if torch.cuda.is_available() or os.getenv("HF_REVISION"):
+        return
+
+    MODEL_ID = os.getenv("HF_MODEL_ID", "microsoft/Phi-3.5-MoE-instruct")
+    TARGET_FILE = "modeling_phimoe.py"
+    ENV_FILE = ".env"
+
+    print(f"🔍 Selecting CPU-safe revision for {MODEL_ID}...")
+
+    try:
+        api = HfApi()
+        for commit in api.list_repo_commits(MODEL_ID, repo_type="model"):
+            sha = commit.commit_id
+            try:
+                file_path = api.hf_hub_download(MODEL_ID, TARGET_FILE, revision=sha, repo_type="model")
+                with open(file_path, "r", encoding="utf-8") as f:
+                    code = f.read()
+
+                # Check if this version doesn't have flash_attn as a top-level import
+                if not re.search(r'^\s*import\s+flash_attn|^\s*from\s+flash_attn', code, flags=re.M):
+                    # Write to .env file
+                    with open(ENV_FILE, "a", encoding="utf-8") as env_file:
+                        env_file.write(f"HF_REVISION={sha}\n")
+
+                    # Also set it in the current environment
+                    os.environ["HF_REVISION"] = sha
+
+                    print(f"✅ Selected CPU-safe revision: {sha}")
+                    return
+            except Exception:
+                continue
+
+        print("⚠️ No CPU-safe revision found")
+    except Exception as e:
+        print(f"⚠️ Error selecting CPU-safe revision: {e}")
+
+def create_model_patch():
+    """Create a patch file to fix the model loading code."""
+    PATCH_FILE = "model_patch.py"
+
+    patch_content = """
+# Monkey patch for transformers.dynamic_module_utils
+import sys
+import importlib
+from importlib.abc import Loader
+from importlib.machinery import ModuleSpec
+from transformers.dynamic_module_utils import check_imports
+
+# Create mock modules for missing dependencies
+class MockModule:
+    def __init__(self, name):
+        self.__name__ = name
+        self.__spec__ = ModuleSpec(name, None)
+
+    def __getattr__(self, key):
+        return MockModule(f"{self.__name__}.{key}")
+
+# Override check_imports to handle missing dependencies
+original_check_imports = check_imports
+def patched_check_imports(resolved_module_file):
+    try:
+        return original_check_imports(resolved_module_file)
+    except ImportError as e:
+        # Extract missing modules
+        import re
+        missing = re.findall(r'packages that were not found in your environment: ([^.]+)', str(e))
+        if missing:
+            missing_modules = [m.strip() for m in missing[0].split(',')]
+            print(f"⚠️ Missing dependencies: {', '.join(missing_modules)}")
+            print("🔧 Creating mock modules to continue loading...")
+
+            # Create mock modules
+            for module_name in missing_modules:
+                if module_name not in sys.modules:
+                    mock_module = MockModule(module_name)
+                    sys.modules[module_name] = mock_module
+                    print(f"✅ Created mock for {module_name}")
+
+            # Try again
+            return original_check_imports(resolved_module_file)
+        else:
+            raise
+
+# Apply the patch
+from transformers import dynamic_module_utils
+dynamic_module_utils.check_imports = patched_check_imports
+print("✅ Applied transformers patch for handling missing dependencies")
+"""
+
+    with open(PATCH_FILE, "w", encoding="utf-8") as f:
+        f.write(patch_content)
+
+    print(f"✅ Created model patch file: {PATCH_FILE}")
+
+if __name__ == "__main__":
+    print("🚀 Running pre-installation script...")
+    install_dependencies()
+    select_cpu_safe_revision()
+    create_model_patch()
+    print("✅ Pre-installation complete!")
requirements.txt CHANGED
@@ -2,9 +2,9 @@ gradio>=4.44.0
 torch>=2.0.0
 transformers>=4.46.0
 accelerate>=0.31.0
-einops>=0.8.0
+einops>=0.7.0
 sentencepiece>=0.1.99
 protobuf>=3.20.0
-huggingface-hub>=0.19.0
+huggingface-hub>=0.23.0
 tokenizers>=0.15.0
 safetensors>=0.4.0
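A small, illustrative check (not among the committed files) that the installed versions satisfy the floors pinned above:

```python
# Illustrative version check against the floors in requirements.txt.
from importlib.metadata import version, PackageNotFoundError

FLOORS = {
    "einops": "0.7.0",            # relaxed from 0.8.0
    "huggingface-hub": "0.23.0",  # raised from 0.19.0
    "gradio": "4.44.0",
    "transformers": "4.46.0",
}

def as_tuple(v):
    # naive parse; adequate for plain x.y.z version strings
    return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())

for pkg, floor in FLOORS.items():
    try:
        installed = version(pkg)
        ok = as_tuple(installed) >= as_tuple(floor)
        print(f"{'✅' if ok else '⚠️'} {pkg} {installed} (requires >= {floor})")
    except PackageNotFoundError:
        print(f"⚠️ {pkg} not installed")
```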
start.sh CHANGED
@@ -4,17 +4,10 @@ set -euo pipefail
 echo "🚀 Starting Phi-3.5-MoE Expert Assistant..."
 echo "📅 $(date)"
 
-# Ensure we're in the right directory
-cd /home/user
-
-# Make prestart script executable
-chmod +x prestart.sh
-
-# Run prestart setup
-echo "🔧 Running prestart setup..."
-./prestart.sh
+# Run pre-installation script
+echo "🔧 Running pre-installation script..."
+python preinstall.py
 
 # Start the application
 echo "🚀 Starting application..."
-cd /home/user
-python app/app.py
+python app.py