ming Claude commited on
Commit
45b6536
Β·
1 Parent(s): 75fe59b

Add V4 local server setup with MPS optimization for Android testing

Browse files

- Optimize V4 model for Apple Silicon MPS GPU (4x faster than CPU)
- Fix MPS detection and BFloat16 incompatibility
- Add comprehensive local server management guide
- Add Android integration documentation with connection details
- Add startup script for easy server management

Performance improvements:
- CPU (before): 2+ minutes (timeout)
- MPS (after): 32 seconds for complete summary
- Inference speed: 2.7 tokens/second on M4 MacBook Pro

πŸ€– Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

ANDROID_V4_LOCAL_TESTING.md ADDED
@@ -0,0 +1,406 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Android V4 Local Testing Guide
2
+
3
+ ## Quick Start
4
+
5
+ Your V4 API is running on your Mac and accessible to your Android app on the same WiFi network.
6
+
7
+ ### Connection Details
8
+
9
+ - **Base URL**: `http://192.168.88.12:7860`
10
+ - **V4 Endpoint**: `/api/v4/scrape-and-summarize/stream-ndjson` (recommended)
11
+ - **Alternative Endpoint**: `/api/v4/scrape-and-summarize/stream`
12
+ - **Model**: Qwen/Qwen2.5-3B-Instruct (high quality, ~6-7GB RAM)
13
+ - **Network**: Both devices must be on the same WiFi network
14
+
15
+ ---
16
+
17
+ ## Android App Configuration
18
+
19
+ ### Update Your Base URL
20
+
21
+ In your Android app's network configuration, change the base URL to:
22
+
23
+ ```kotlin
24
+ // Development/Local Testing
25
+ const val BASE_URL = "http://192.168.88.12:7860"
26
+
27
+ // Production (HuggingFace Spaces)
28
+ const val BASE_URL_PROD = "https://your-hf-space.hf.space"
29
+ ```
30
+
31
+ ### Network Security Config
32
+
33
+ Add this to `res/xml/network_security_config.xml` to allow HTTP connections to your local server:
34
+
35
+ ```xml
36
+ <?xml version="1.0" encoding="utf-8"?>
37
+ <network-security-config>
38
+ <domain-config cleartextTrafficPermitted="true">
39
+ <domain includeSubdomains="true">192.168.88.12</domain>
40
+ </domain-config>
41
+ </network-security-config>
42
+ ```
43
+
44
+ Update your `AndroidManifest.xml`:
45
+
46
+ ```xml
47
+ <application
48
+ android:networkSecurityConfig="@xml/network_security_config"
49
+ ...>
50
+ ```
51
+
52
+ ---
53
+
54
+ ## API Usage Examples
55
+
56
+ ### Endpoint 1: NDJSON Streaming (Recommended - 43% faster)
57
+
58
+ **URL**: `http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson`
59
+
60
+ **Request Body** (URL mode):
61
+ ```json
62
+ {
63
+ "url": "https://example.com/article",
64
+ "style": "executive",
65
+ "max_tokens": 512
66
+ }
67
+ ```
68
+
69
+ **Request Body** (Text mode):
70
+ ```json
71
+ {
72
+ "text": "Your article text here (minimum 50 characters)...",
73
+ "style": "executive",
74
+ "max_tokens": 512
75
+ }
76
+ ```
77
+
78
+ **Response Format** (NDJSON patches):
79
+ ```
80
+ data: {"op":"replace","path":"/title","value":"Breaking News"}
81
+ data: {"op":"replace","path":"/main_summary","value":"This is the summary..."}
82
+ data: {"op":"add","path":"/key_points/0","value":"First key point"}
83
+ data: {"op":"add","path":"/key_points/1","value":"Second key point"}
84
+ data: {"op":"replace","path":"/category","value":"Technology"}
85
+ data: {"op":"replace","path":"/sentiment","value":"neutral"}
86
+ data: {"op":"replace","path":"/read_time_min","value":3}
87
+ ```
88
+
89
+ **Final JSON Structure**:
90
+ ```json
91
+ {
92
+ "title": "Breaking News",
93
+ "main_summary": "This is the summary...",
94
+ "key_points": [
95
+ "First key point",
96
+ "Second key point",
97
+ "Third key point"
98
+ ],
99
+ "category": "Technology",
100
+ "sentiment": "neutral",
101
+ "read_time_min": 3
102
+ }
103
+ ```
104
+
105
+ ### Endpoint 2: Raw JSON Streaming
106
+
107
+ **URL**: `http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream`
108
+
109
+ **Request/Response**: Same as above, but streams raw JSON tokens instead of NDJSON patches
110
+
111
+ ---
112
+
113
+ ## Summarization Styles
114
+
115
+ Choose the style that best fits your use case:
116
+
117
+ | Style | Description | Use Case |
118
+ |-------|-------------|----------|
119
+ | `executive` | Business-focused with key takeaways (default) | General articles, news |
120
+ | `skimmer` | Quick facts and highlights | Fast reading, headlines |
121
+ | `eli5` | "Explain Like I'm 5" - simple explanations | Complex topics, education |
122
+
123
+ ---
124
+
125
+ ## cURL Testing Commands
126
+
127
+ ### Test with URL (Web Scraping)
128
+
129
+ ```bash
130
+ curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
131
+ -H "Content-Type: application/json" \
132
+ -d '{
133
+ "url": "https://www.bbc.com/news/technology",
134
+ "style": "executive",
135
+ "max_tokens": 512
136
+ }'
137
+ ```
138
+
139
+ ### Test with Direct Text
140
+
141
+ ```bash
142
+ curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
143
+ -H "Content-Type: application/json" \
144
+ -d '{
145
+ "text": "Artificial intelligence is rapidly transforming the technology landscape. Companies are investing billions in AI research and development. Machine learning models are becoming more sophisticated and capable of handling complex tasks. From healthcare to finance, AI applications are revolutionizing industries and creating new opportunities for innovation.",
146
+ "style": "executive",
147
+ "max_tokens": 512
148
+ }'
149
+ ```
150
+
151
+ ### Test from Your Android Device
152
+
153
+ ```bash
154
+ # If you have Termux or similar on Android:
155
+ curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
156
+ -H "Content-Type: application/json" \
157
+ -d '{"text":"Test from Android","style":"executive"}'
158
+ ```
159
+
160
+ ---
161
+
162
+ ## Kotlin/Android Example
163
+
164
+ ### Using OkHttp + SSE
165
+
166
+ ```kotlin
167
+ import okhttp3.*
168
+ import okhttp3.sse.EventSource
169
+ import okhttp3.sse.EventSourceListener
170
+ import okhttp3.sse.EventSources
171
+
172
+ class V4ApiClient {
173
+ private val client = OkHttpClient()
174
+
175
+ fun summarizeUrl(
176
+ url: String,
177
+ style: String = "executive",
178
+ maxTokens: Int = 512,
179
+ onPatch: (String) -> Unit,
180
+ onComplete: () -> Unit,
181
+ onError: (Throwable) -> Unit
182
+ ) {
183
+ val request = Request.Builder()
184
+ .url("http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson")
185
+ .post(
186
+ """
187
+ {
188
+ "url": "$url",
189
+ "style": "$style",
190
+ "max_tokens": $maxTokens
191
+ }
192
+ """.trimIndent().toRequestBody("application/json".toMediaType())
193
+ )
194
+ .build()
195
+
196
+ val eventSourceListener = object : EventSourceListener() {
197
+ override fun onEvent(
198
+ eventSource: EventSource,
199
+ id: String?,
200
+ type: String?,
201
+ data: String
202
+ ) {
203
+ onPatch(data) // NDJSON patch
204
+ }
205
+
206
+ override fun onClosed(eventSource: EventSource) {
207
+ onComplete()
208
+ }
209
+
210
+ override fun onFailure(
211
+ eventSource: EventSource,
212
+ t: Throwable?,
213
+ response: Response?
214
+ ) {
215
+ onError(t ?: Exception("Unknown error"))
216
+ }
217
+ }
218
+
219
+ EventSources.createFactory(client)
220
+ .newEventSource(request, eventSourceListener)
221
+ }
222
+ }
223
+
224
+ // Usage:
225
+ val apiClient = V4ApiClient()
226
+ val summary = mutableMapOf<String, Any>()
227
+
228
+ apiClient.summarizeUrl(
229
+ url = "https://example.com/article",
230
+ style = "executive",
231
+ onPatch = { patch ->
232
+ // Parse NDJSON patch and update summary object
233
+ val jsonPatch = JSONObject(patch)
234
+ val op = jsonPatch.getString("op")
235
+ val path = jsonPatch.getString("path")
236
+ val value = jsonPatch.get("value")
237
+
238
+ // Apply patch to summary map
239
+ applyPatch(summary, op, path, value)
240
+
241
+ // Update UI with partial results
242
+ updateUI(summary)
243
+ },
244
+ onComplete = {
245
+ Log.d("V4", "Summary complete: $summary")
246
+ },
247
+ onError = { error ->
248
+ Log.e("V4", "Error: ${error.message}")
249
+ }
250
+ )
251
+ ```
252
+
253
+ ---
254
+
255
+ ## Performance Expectations
256
+
257
+ ### Qwen/Qwen2.5-3B-Instruct (Current Configuration)
258
+
259
+ - **Memory**: ~6-7GB unified memory on Mac
260
+ - **Inference Time**: 40-60 seconds per request
261
+ - **Quality**: ⭐⭐⭐⭐ (high quality, coherent summaries)
262
+ - **First Token**: ~1-2 seconds (fast UI feedback)
263
+ - **Device**: CPU (MPS not detected in current run)
264
+
265
+ ### Optimization Tips
266
+
267
+ 1. **Use NDJSON endpoint** for 43% faster time-to-first-token
268
+ 2. **Keep max_tokens at 512** for complete summaries
269
+ 3. **Test with WiFi** (Bluetooth/USB tethering may be slower)
270
+ 4. **Monitor battery** on Android during long sessions
271
+
272
+ ---
273
+
274
+ ## Troubleshooting
275
+
276
+ ### Connection Refused
277
+
278
+ **Problem**: `Failed to connect to /192.168.88.12:7860`
279
+
280
+ **Solutions**:
281
+ 1. Check both devices are on same WiFi network
282
+ 2. Verify server is running: `lsof -i :7860`
283
+ 3. Check Mac's firewall settings (System Settings β†’ Network β†’ Firewall)
284
+ 4. Try pinging Mac from Android: `ping 192.168.88.12`
285
+
286
+ ### Empty or Incomplete Summaries
287
+
288
+ **Problem**: Summary JSON is incomplete or empty
289
+
290
+ **Solutions**:
291
+ 1. Increase `max_tokens` to 512 or higher
292
+ 2. Ensure input text is at least 50 characters
293
+ 3. Check server logs: `tail -f server.log`
294
+ 4. Try switching from URL mode to text mode
295
+
296
+ ### Slow Response
297
+
298
+ **Problem**: Takes > 2 minutes to get results
299
+
300
+ **Solutions**:
301
+ 1. V4 with 3B model is computationally intensive (40-60s normal)
302
+ 2. Consider switching to 1.5B model for faster responses (lower quality)
303
+ 3. Update `.env`: `V4_MODEL_ID=Qwen/Qwen2.5-1.5B-Instruct`
304
+ 4. Restart server after model change
305
+
306
+ ### SSRF Protection Blocking URLs
307
+
308
+ **Problem**: "Invalid URL or SSRF protection triggered"
309
+
310
+ **Solutions**:
311
+ 1. Don't use localhost/127.0.0.1 URLs
312
+ 2. Don't use private IP ranges (10.x, 192.168.x, 172.x)
313
+ 3. Use public URLs only
314
+ 4. For testing, use text mode instead of URL mode
315
+
316
+ ---
317
+
318
+ ## Server Management
319
+
320
+ ### Start Server
321
+
322
+ ```bash
323
+ # Option 1: Using conda environment
324
+ conda run -n summarizer python -m uvicorn app.main:app --host 0.0.0.0 --port 7860
325
+
326
+ # Option 2: Using startup script (see below)
327
+ ./start_v4_local.sh
328
+ ```
329
+
330
+ ### Check Server Status
331
+
332
+ ```bash
333
+ # Check if server is running
334
+ lsof -i :7860
335
+
336
+ # View real-time logs
337
+ tail -f server.log
338
+
339
+ # Check health endpoint
340
+ curl http://localhost:7860/health
341
+ ```
342
+
343
+ ### Stop Server
344
+
345
+ ```bash
346
+ # Find and kill the process
347
+ pkill -f "uvicorn app.main:app"
348
+
349
+ # Or kill by PID
350
+ lsof -ti :7860 | xargs kill
351
+ ```
352
+
353
+ ---
354
+
355
+ ## API Documentation
356
+
357
+ ### Health Check
358
+
359
+ ```bash
360
+ GET http://192.168.88.12:7860/health
361
+
362
+ Response:
363
+ {
364
+ "status": "ok",
365
+ "service": "summarizer",
366
+ "version": "4.0.0"
367
+ }
368
+ ```
369
+
370
+ ### Available Endpoints
371
+
372
+ - `GET /` - API documentation (Swagger UI)
373
+ - `GET /health` - Health check
374
+ - `POST /api/v1/*` - Ollama + Transformers (requires Ollama service)
375
+ - `POST /api/v2/*` - HuggingFace streaming (distilbart)
376
+ - `POST /api/v3/*` - Web scraping + V2 summarization
377
+ - `POST /api/v4/*` - Structured JSON summarization (Qwen model)
378
+
379
+ ---
380
+
381
+ ## Security Notes
382
+
383
+ 1. **HTTP Only**: Local testing uses HTTP (not HTTPS)
384
+ 2. **No Authentication**: API is open on local network
385
+ 3. **Rate Limiting**: Not enabled by default for local testing
386
+ 4. **SSRF Protection**: Blocks localhost and private IPs in URL mode
387
+ 5. **Production**: Use HTTPS and authentication for production deployments
388
+
389
+ ---
390
+
391
+ ## Next Steps
392
+
393
+ 1. βœ… Configure your Android app's base URL to `http://192.168.88.12:7860`
394
+ 2. βœ… Add network security config for cleartext HTTP
395
+ 3. βœ… Test connection with cURL before Android testing
396
+ 4. βœ… Implement SSE parsing for NDJSON patches
397
+ 5. βœ… Add error handling for network failures
398
+ 6. βœ… Monitor performance and adjust `max_tokens` as needed
399
+
400
+ ---
401
+
402
+ ## Support
403
+
404
+ - **Server Logs**: `/Users/ming/AndroidStudioProjects/SummerizerApp/server.log`
405
+ - **Configuration**: `/Users/ming/AndroidStudioProjects/SummerizerApp/.env`
406
+ - **Documentation**: See `V4_LOCAL_SETUP.md` and `V4_TESTING_LEARNINGS.md`
README_LOCAL_SETUP.md ADDED
@@ -0,0 +1,605 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Local V4 Server Setup & Management Guide
2
+
3
+ Complete guide for running and managing the V4 summarization server locally for Android app development and testing.
4
+
5
+ ---
6
+
7
+ ## Quick Start
8
+
9
+ ### Prerequisites
10
+ - βœ… Conda environment `summarizer` activated
11
+ - βœ… All dependencies installed (`requirements.txt`)
12
+ - βœ… M4 MacBook Pro with MPS support
13
+ - βœ… Both Mac and Android device on same WiFi network
14
+
15
+ ### Start Server (Fastest Method)
16
+ ```bash
17
+ cd /Users/ming/AndroidStudioProjects/SummerizerApp
18
+ ./start_v4_local.sh
19
+ ```
20
+
21
+ **Your Connection Details:**
22
+ - **Mac IP**: `192.168.88.12`
23
+ - **Base URL**: `http://192.168.88.12:7860`
24
+ - **V4 Endpoint**: `/api/v4/scrape-and-summarize/stream-ndjson`
25
+
26
+ ---
27
+
28
+ ## Server Management Commands
29
+
30
+ ### Starting the Server
31
+
32
+ #### Option 1: Using Startup Script (Recommended)
33
+ ```bash
34
+ ./start_v4_local.sh
35
+ ```
36
+
37
+ **Features:**
38
+ - Automatically detects and stops existing server
39
+ - Shows your local IP address
40
+ - Displays V4 configuration
41
+ - Waits for model to load
42
+ - Shows connection URL
43
+ - Option to view real-time logs
44
+
45
+ #### Option 2: Manual Start
46
+ ```bash
47
+ # Foreground (blocks terminal)
48
+ /opt/anaconda3/envs/summarizer/bin/python -m uvicorn app.main:app --host 0.0.0.0 --port 7860
49
+
50
+ # Background (with logging to file)
51
+ /opt/anaconda3/envs/summarizer/bin/python -m uvicorn app.main:app --host 0.0.0.0 --port 7860 > server.log 2>&1 &
52
+ echo "Server PID: $!"
53
+ ```
54
+
55
+ **Expected Startup Time**: 15-20 seconds
56
+ - Model loading: ~10 seconds
57
+ - V4 warmup: ~2-3 seconds
58
+ - Other services: ~3-5 seconds
59
+
60
+ ---
61
+
62
+ ### Stopping the Server
63
+
64
+ #### Option 1: Kill by Process Name (Recommended)
65
+ ```bash
66
+ pkill -f "uvicorn app.main:app"
67
+ ```
68
+
69
+ #### Option 2: Force Kill by Process Name
70
+ ```bash
71
+ pkill -9 -f "uvicorn app.main:app" && echo "Server stopped"
72
+ ```
73
+
74
+ #### Option 3: Kill by Port
75
+ ```bash
76
+ # Find and kill process using port 7860
77
+ lsof -ti :7860 | xargs kill
78
+
79
+ # Force kill if needed
80
+ lsof -ti :7860 | xargs kill -9
81
+ ```
82
+
83
+ #### Option 4: Kill by PID
84
+ ```bash
85
+ # If you know the PID (shown when server started)
86
+ kill <PID>
87
+
88
+ # Force kill
89
+ kill -9 <PID>
90
+ ```
91
+
92
+ ---
93
+
94
+ ### Restarting the Server
95
+
96
+ #### Quick Restart
97
+ ```bash
98
+ pkill -f "uvicorn app.main:app" && sleep 2 && ./start_v4_local.sh
99
+ ```
100
+
101
+ #### Manual Restart
102
+ ```bash
103
+ # Stop
104
+ pkill -f "uvicorn app.main:app"
105
+ sleep 2
106
+
107
+ # Start
108
+ /opt/anaconda3/envs/summarizer/bin/python -m uvicorn app.main:app --host 0.0.0.0 --port 7860 > server.log 2>&1 &
109
+ ```
110
+
111
+ ---
112
+
113
+ ### Checking Server Status
114
+
115
+ #### Check if Server is Running
116
+ ```bash
117
+ # Check port 7860
118
+ lsof -i :7860
119
+
120
+ # Expected output if running:
121
+ # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
122
+ # Python 12345 ming 7u IPv4 0x1234567890 0t0 TCP *:7860 (LISTEN)
123
+ ```
124
+
125
+ #### Check Server Health
126
+ ```bash
127
+ # Health endpoint
128
+ curl http://localhost:7860/health
129
+
130
+ # Expected response:
131
+ # {"status":"ok","service":"summarizer","version":"4.0.0"}
132
+ ```
133
+
134
+ #### Check Process Details
135
+ ```bash
136
+ # Find Python process running uvicorn
137
+ ps aux | grep "uvicorn app.main:app"
138
+ ```
139
+
140
+ ---
141
+
142
+ ## Viewing Logs
143
+
144
+ ### Real-Time Logs
145
+ ```bash
146
+ # Follow logs as they happen
147
+ tail -f server.log
148
+
149
+ # Stop following: Ctrl+C
150
+ ```
151
+
152
+ ### Recent Logs
153
+ ```bash
154
+ # Last 50 lines
155
+ tail -50 server.log
156
+
157
+ # Last 100 lines
158
+ tail -100 server.log
159
+
160
+ # Search for specific events
161
+ tail -100 server.log | grep "V4"
162
+ tail -100 server.log | grep "ERROR"
163
+ ```
164
+
165
+ ### Log File Location
166
+ ```
167
+ /Users/ming/AndroidStudioProjects/SummerizerApp/server.log
168
+ ```
169
+
170
+ ---
171
+
172
+ ## Configuration Reference
173
+
174
+ ### Current .env Settings
175
+
176
+ ```bash
177
+ # V4 Structured JSON API
178
+ ENABLE_V4_STRUCTURED=true # Enable V4 API
179
+ ENABLE_V4_WARMUP=true # Load model at startup (faster first request)
180
+
181
+ # V4 Model Configuration
182
+ V4_MODEL_ID=Qwen/Qwen2.5-3B-Instruct # High-quality 3B model
183
+ V4_MAX_TOKENS=512 # Max tokens to generate
184
+ V4_TEMPERATURE=0.2 # Low temp for consistent output
185
+
186
+ # V4 Performance (M4 MacBook Pro)
187
+ V4_USE_FP16_FOR_SPEED=true # Enable FP16 for MPS GPU (2-3x faster)
188
+ V4_ENABLE_QUANTIZATION=false # Quantization not needed with FP16
189
+
190
+ # Server Configuration
191
+ SERVER_HOST=0.0.0.0 # Listen on all interfaces
192
+ SERVER_PORT=7860 # Standard port (required for HF Spaces)
193
+ LOG_LEVEL=INFO # Logging verbosity
194
+
195
+ # V3 Web Scraping (also enabled)
196
+ ENABLE_V3_SCRAPING=true # Enable URL scraping
197
+ SCRAPING_TIMEOUT=10 # HTTP timeout (seconds)
198
+ SCRAPING_CACHE_ENABLED=true # Cache scraped content
199
+ SCRAPING_CACHE_TTL=3600 # Cache for 1 hour
200
+ ```
201
+
202
+ ### Configuration Presets
203
+
204
+ **Fast Inference (Current)**
205
+ ```bash
206
+ V4_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
207
+ V4_USE_FP16_FOR_SPEED=true
208
+ V4_MAX_TOKENS=384
209
+ ```
210
+
211
+ **High Quality (Slower)**
212
+ ```bash
213
+ V4_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
214
+ V4_USE_FP16_FOR_SPEED=true
215
+ V4_MAX_TOKENS=512
216
+ ```
217
+
218
+ **Fastest (Lower Quality)**
219
+ ```bash
220
+ V4_MODEL_ID=Qwen/Qwen2.5-1.5B-Instruct
221
+ V4_USE_FP16_FOR_SPEED=true
222
+ V4_MAX_TOKENS=256
223
+ ```
224
+
225
+ ---
226
+
227
+ ## Testing Commands
228
+
229
+ ### Health Check
230
+ ```bash
231
+ curl http://localhost:7860/health
232
+ ```
233
+
234
+ **Expected Response:**
235
+ ```json
236
+ {"status":"ok","service":"summarizer","version":"4.0.0"}
237
+ ```
238
+
239
+ ---
240
+
241
+ ### V4 Direct Text Test
242
+ ```bash
243
+ curl -X POST http://localhost:7860/api/v4/scrape-and-summarize/stream-ndjson \
244
+ -H "Content-Type: application/json" \
245
+ -d '{
246
+ "text": "Artificial intelligence continues to reshape industries worldwide. Tech giants are investing billions in AI development.",
247
+ "style": "executive",
248
+ "max_tokens": 256
249
+ }'
250
+ ```
251
+
252
+ **Expected Time**: ~30-40 seconds
253
+ **Expected Output**: NDJSON streaming events with structured summary
254
+
255
+ ---
256
+
257
+ ### V4 URL Scraping Test
258
+ ```bash
259
+ curl -X POST http://localhost:7860/api/v4/scrape-and-summarize/stream-ndjson \
260
+ -H "Content-Type: application/json" \
261
+ -d '{
262
+ "url": "https://en.wikipedia.org/wiki/Machine_learning",
263
+ "style": "executive",
264
+ "max_tokens": 512
265
+ }'
266
+ ```
267
+
268
+ **Expected Time**: ~35-65 seconds (scrape + summarize)
269
+ **Expected Output**: Metadata event + NDJSON streaming summary
270
+
271
+ ---
272
+
273
+ ### Test from Android Device (Same WiFi)
274
+ ```bash
275
+ # Run this from your Android device terminal (Termux, etc.)
276
+ curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
277
+ -H "Content-Type: application/json" \
278
+ -d '{"text":"Test from Android","style":"executive","max_tokens":256}'
279
+ ```
280
+
281
+ ---
282
+
283
+ ## Troubleshooting
284
+
285
+ ### Problem: Port Already in Use
286
+
287
+ **Symptom**: `error while attempting to bind on address ('0.0.0.0', 7860): address already in use`
288
+
289
+ **Solution:**
290
+ ```bash
291
+ # Find what's using port 7860
292
+ lsof -i :7860
293
+
294
+ # Kill it
295
+ lsof -ti :7860 | xargs kill -9
296
+
297
+ # Start server again
298
+ ./start_v4_local.sh
299
+ ```
300
+
301
+ ---
302
+
303
+ ### Problem: Server Won't Start
304
+
305
+ **Symptom**: Server exits immediately or crashes on startup
306
+
307
+ **Check Logs:**
308
+ ```bash
309
+ tail -50 server.log
310
+ ```
311
+
312
+ **Common Causes:**
313
+ 1. **Missing loguru**: `pip install "loguru>=0.7.0"`
314
+ 2. **Wrong conda environment**: `conda activate summarizer`
315
+ 3. **Missing dependencies**: `pip install -r requirements.txt`
316
+ 4. **Port conflict**: See "Port Already in Use" above
317
+
318
+ ---
319
+
320
+ ### Problem: Model Loading Errors
321
+
322
+ **Symptom**: `Failed to initialize V4 model` in logs
323
+
324
+ **Solutions:**
325
+
326
+ 1. **Clear model cache:**
327
+ ```bash
328
+ rm -rf /tmp/huggingface
329
+ ```
330
+
331
+ 2. **Check disk space:**
332
+ ```bash
333
+ df -h /tmp
334
+ # Need at least 10GB free
335
+ ```
336
+
337
+ 3. **Verify internet connection** (for first-time model download)
338
+
339
+ ---
340
+
341
+ ### Problem: Slow Performance
342
+
343
+ **Expected Performance:**
344
+ - Startup: 15-20 seconds
345
+ - Inference: 30-40 seconds (short text)
346
+ - Inference: 60-90 seconds (long text/URL)
347
+
348
+ **If slower than expected:**
349
+
350
+ 1. **Check if MPS is being used:**
351
+ ```bash
352
+ tail -50 server.log | grep "MPS\|Model device"
353
+ # Should see: "Model device: mps:0"
354
+ ```
355
+
356
+ 2. **Check system load:**
357
+ ```bash
358
+ top -l 1 | grep "CPU usage"
359
+ # High CPU usage by other apps?
360
+ ```
361
+
362
+ 3. **Verify FP16 is enabled:**
363
+ ```bash
364
+ grep "V4_USE_FP16_FOR_SPEED" .env
365
+ # Should be: V4_USE_FP16_FOR_SPEED=true
366
+ ```
367
+
368
+ ---
369
+
370
+ ### Problem: Connection Refused from Android
371
+
372
+ **Symptom**: Android app can't connect to `http://192.168.88.12:7860`
373
+
374
+ **Checklist:**
375
+
376
+ 1. **Both devices on same WiFi?**
377
+ ```bash
378
+ # On Mac, check network
379
+ ifconfig | grep "inet " | grep -v "127.0.0.1"
380
+ ```
381
+
382
+ 2. **Mac firewall blocking port 7860?**
383
+ - Go to System Settings β†’ Network β†’ Firewall
384
+ - Allow incoming connections or disable firewall temporarily
385
+
386
+ 3. **Server actually running?**
387
+ ```bash
388
+ lsof -i :7860
389
+ curl http://localhost:7860/health
390
+ ```
391
+
392
+ 4. **Test from Mac first:**
393
+ ```bash
394
+ curl http://192.168.88.12:7860/health
395
+ # Should work from Mac's own IP
396
+ ```
397
+
398
+ 5. **Android network security config?**
399
+ - See `ANDROID_V4_LOCAL_TESTING.md` for cleartext HTTP setup
400
+
401
+ ---
402
+
403
+ ### Problem: Empty or Incomplete Summaries
404
+
405
+ **Symptom**: Summary JSON missing fields or truncated
406
+
407
+ **Solutions:**
408
+
409
+ 1. **Increase max_tokens:**
410
+ ```bash
411
+ # In request, use:
412
+ "max_tokens": 512 # instead of 256
413
+ ```
414
+
415
+ 2. **Check input text length:**
416
+ ```bash
417
+ # Minimum 50 characters required
418
+ # Maximum 50,000 characters for URL scraping
419
+ ```
420
+
421
+ 3. **Try different style:**
422
+ ```bash
423
+ # Styles: "executive", "skimmer", "eli5"
424
+ "style": "executive" # Most reliable
425
+ ```
426
+
427
+ ---
428
+
429
+ ## Performance Guide
430
+
431
+ ### Expected Metrics
432
+
433
+ | Metric | Value |
434
+ |--------|-------|
435
+ | **Startup Time** | 15-20 seconds |
436
+ | **Model Load** | ~10 seconds |
437
+ | **V4 Warmup** | ~2-3 seconds |
438
+ | **Memory Usage** | ~6-7GB unified memory |
439
+ | **Tokens/Second** | 2.7 tok/s (3B model on MPS) |
440
+ | **Short Text** (500 chars) | ~30-40 seconds |
441
+ | **Long Text** (5000 chars) | ~60-90 seconds |
442
+ | **URL Scraping** | +2-5 seconds (first time) |
443
+ | **URL Scraping** (cached) | +<10ms |
444
+
445
+ ### Hardware Requirements
446
+
447
+ **Minimum:**
448
+ - Apple Silicon Mac (M1/M2/M3/M4)
449
+ - 8GB unified memory
450
+ - 10GB free disk space
451
+
452
+ **Recommended (Current Setup):**
453
+ - M4 MacBook Pro
454
+ - 24GB unified memory
455
+ - MPS GPU support
456
+ - Fast internet (for model downloads)
457
+
458
+ ### Network Requirements
459
+
460
+ **For Scraping:**
461
+ - Active internet connection
462
+ - Firewall allows outbound HTTPS (443)
463
+
464
+ **For Android Connection:**
465
+ - Both devices on same WiFi network
466
+ - Mac firewall allows incoming on port 7860
467
+
468
+ ---
469
+
470
+ ## API Endpoints Reference
471
+
472
+ ### Available Endpoints
473
+
474
+ | Endpoint | Method | Purpose |
475
+ |----------|--------|---------|
476
+ | `/health` | GET | Health check |
477
+ | `/docs` | GET | Interactive API documentation |
478
+ | `/api/v1/*` | POST | Ollama + Transformers (requires Ollama) |
479
+ | `/api/v2/*` | POST | HuggingFace streaming (distilbart) |
480
+ | `/api/v3/*` | POST | Web scraping + V2 summarization |
481
+ | `/api/v4/scrape-and-summarize/stream-ndjson` | POST | **Structured JSON summarization (RECOMMENDED)** |
482
+ | `/api/v4/scrape-and-summarize/stream` | POST | Raw JSON streaming |
483
+
484
+ ### V4 Request Format
485
+
486
+ ```json
487
+ {
488
+ "url": "https://example.com/article", // URL mode
489
+ // OR
490
+ "text": "Your article text here...", // Text mode
491
+
492
+ "style": "executive", // "executive", "skimmer", "eli5"
493
+ "max_tokens": 512 // 128-2048 range
494
+ }
495
+ ```
496
+
497
+ ### V4 Response Format (NDJSON)
498
+
499
+ ```
500
+ data: {"type":"metadata","data":{...}}
501
+ data: {"delta":{"op":"set","field":"title","value":"..."},...}
502
+ data: {"delta":{"op":"set","field":"main_summary","value":"..."},...}
503
+ data: {"delta":{"op":"append","field":"key_points","value":"..."},...}
504
+ data: {"delta":{"op":"done"},"done":true,"latency_ms":38891.94}
505
+ ```
506
+
507
+ ---
508
+
509
+ ## Android Integration
510
+
511
+ For complete Android integration guide, see:
512
+ πŸ“± **[ANDROID_V4_LOCAL_TESTING.md](./ANDROID_V4_LOCAL_TESTING.md)**
513
+
514
+ **Quick Reference:**
515
+ - Base URL: `http://192.168.88.12:7860`
516
+ - Endpoint: `/api/v4/scrape-and-summarize/stream-ndjson`
517
+ - Network security: Allow cleartext HTTP for `192.168.88.12`
518
+ - Expected latency: 35-65 seconds per request
519
+
520
+ ---
521
+
522
+ ## Development Workflow
523
+
524
+ ### Typical Session
525
+
526
+ 1. **Start server**
527
+ ```bash
528
+ ./start_v4_local.sh
529
+ ```
530
+
531
+ 2. **Test locally**
532
+ ```bash
533
+ curl http://localhost:7860/health
534
+ ```
535
+
536
+ 3. **Test from Android**
537
+ - Open your Android app
538
+ - Configure base URL: `http://192.168.88.12:7860`
539
+ - Test summarization
540
+
541
+ 4. **Monitor logs**
542
+ ```bash
543
+ tail -f server.log
544
+ ```
545
+
546
+ 5. **Stop server when done**
547
+ ```bash
548
+ pkill -f "uvicorn app.main:app"
549
+ ```
550
+
551
+ ---
552
+
553
+ ## Quick Command Reference
554
+
555
+ ```bash
556
+ # START
557
+ ./start_v4_local.sh
558
+
559
+ # STOP
560
+ pkill -f "uvicorn app.main:app"
561
+
562
+ # RESTART
563
+ pkill -f "uvicorn app.main:app" && sleep 2 && ./start_v4_local.sh
564
+
565
+ # STATUS
566
+ lsof -i :7860
567
+ curl http://localhost:7860/health
568
+
569
+ # LOGS
570
+ tail -f server.log
571
+ tail -50 server.log | grep "ERROR"
572
+
573
+ # TEST
574
+ curl -X POST http://localhost:7860/api/v4/scrape-and-summarize/stream-ndjson \
575
+ -H "Content-Type: application/json" \
576
+ -d '{"text":"Test","style":"executive","max_tokens":256}'
577
+ ```
578
+
579
+ ---
580
+
581
+ ## Support & Documentation
582
+
583
+ - **Android Integration**: [ANDROID_V4_LOCAL_TESTING.md](./ANDROID_V4_LOCAL_TESTING.md)
584
+ - **V4 Testing Learnings**: [V4_TESTING_LEARNINGS.md](./V4_TESTING_LEARNINGS.md)
585
+ - **V4 Local Setup**: [V4_LOCAL_SETUP.md](./V4_LOCAL_SETUP.md)
586
+ - **Server Logs**: `server.log`
587
+ - **Configuration**: `.env`
588
+
589
+ ---
590
+
591
+ ## Notes
592
+
593
+ - Server must be running for Android app to connect
594
+ - Both devices must be on same WiFi network
595
+ - Mac IP address may change if you reconnect to WiFi
596
+ - Model is cached in `/tmp/huggingface` (survives restarts)
597
+ - Logs are appended to `server.log` (not rotated automatically)
598
+ - V4 warmup happens on every server start (~2-3 seconds)
599
+
600
+ ---
601
+
602
+ **Last Updated**: 2025-12-12
603
+ **Server Version**: 4.0.0
604
+ **Model**: Qwen/Qwen2.5-3B-Instruct
605
+ **Device**: M4 MacBook Pro with MPS
app/services/structured_summarizer.py CHANGED
@@ -90,16 +90,21 @@ class StructuredSummarizer:
90
 
91
  # Decide device / quantization strategy
92
  use_cuda = torch.cuda.is_available()
 
 
93
  quantization_desc = "None"
94
 
95
  if use_cuda:
96
- logger.info("CUDA is available. Using GPU for V4 model.")
 
 
97
  else:
98
- logger.info("CUDA is NOT available. V4 model will run on CPU.")
99
 
100
  # ------------------------------------------------------------------
101
- # Preferred path: 4-bit NF4 on GPU via bitsandbytes (memory efficient)
102
  # OR FP16 for speed (2-3x faster, uses more memory)
 
103
  # ------------------------------------------------------------------
104
  use_fp16_for_speed = getattr(settings, "v4_use_fp16_for_speed", False)
105
 
@@ -128,42 +133,69 @@ class StructuredSummarizer:
128
  )
129
  quantization_desc = "4-bit NF4 (bitsandbytes, GPU)"
130
 
131
- elif use_cuda and use_fp16_for_speed:
132
- # Use FP16 for 2-3x faster inference (uses ~2-3GB GPU memory)
 
133
  logger.info(
134
- "Loading V4 model in FP16 for maximum speed (2-3x faster than 4-bit)..."
135
- )
136
- self.model = AutoModelForCausalLM.from_pretrained(
137
- settings.v4_model_id,
138
- dtype=torch.float16,
139
- device_map="auto",
140
- cache_dir=settings.hf_cache_dir,
141
- trust_remote_code=True,
142
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
143
  quantization_desc = "FP16 (GPU, fast)"
144
 
145
  else:
146
  # ------------------------------------------------------------------
147
  # Fallback path:
148
- # - GPU without bitsandbytes -> FP16
149
- # - CPU -> FP32 + optional dynamic INT8
150
  # ------------------------------------------------------------------
151
- base_dtype = torch.float16 if use_cuda else torch.float32
152
- logger.info(
153
- "Loading V4 model without 4-bit bitsandbytes. "
154
- f"Base dtype: {base_dtype}"
155
- )
156
 
157
- self.model = AutoModelForCausalLM.from_pretrained(
158
- settings.v4_model_id,
159
- dtype=base_dtype,
160
- device_map="auto" if use_cuda else None,
161
- cache_dir=settings.hf_cache_dir,
162
- trust_remote_code=True,
163
- )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
164
 
165
- # Optional dynamic INT8 quantization on CPU
166
- if getattr(settings, "v4_enable_quantization", True) and not use_cuda:
167
  try:
168
  logger.info(
169
  "Applying dynamic INT8 quantization to V4 model on CPU..."
 
90
 
91
  # Decide device / quantization strategy
92
  use_cuda = torch.cuda.is_available()
93
+ use_mps = torch.backends.mps.is_available() if hasattr(torch.backends, 'mps') else False
94
+ use_gpu = use_cuda or use_mps
95
  quantization_desc = "None"
96
 
97
  if use_cuda:
98
+ logger.info("CUDA is available. Using NVIDIA GPU for V4 model.")
99
+ elif use_mps:
100
+ logger.info("MPS (Metal Performance Shaders) is available. Using Apple Silicon GPU for V4 model.")
101
  else:
102
+ logger.info("No GPU available. V4 model will run on CPU.")
103
 
104
  # ------------------------------------------------------------------
105
+ # Preferred path: 4-bit NF4 on CUDA GPU via bitsandbytes (memory efficient)
106
  # OR FP16 for speed (2-3x faster, uses more memory)
107
+ # Note: bitsandbytes only works on CUDA, not MPS
108
  # ------------------------------------------------------------------
109
  use_fp16_for_speed = getattr(settings, "v4_use_fp16_for_speed", False)
110
 
 
133
  )
134
  quantization_desc = "4-bit NF4 (bitsandbytes, GPU)"
135
 
136
+ elif use_gpu and use_fp16_for_speed:
137
+ # Use FP16 for 2-3x faster inference
138
+ # Note: MPS doesn't support BFloat16, so we avoid device_map="auto" for MPS
139
  logger.info(
140
+ "Loading V4 model in FP16 for maximum speed (2-3x faster than FP32)..."
 
 
 
 
 
 
 
141
  )
142
+
143
+ if use_mps:
144
+ # MPS: Load without device_map, then manually move to MPS
145
+ self.model = AutoModelForCausalLM.from_pretrained(
146
+ settings.v4_model_id,
147
+ torch_dtype=torch.float16,
148
+ cache_dir=settings.hf_cache_dir,
149
+ trust_remote_code=True,
150
+ )
151
+ self.model = self.model.to("mps")
152
+ else:
153
+ # CUDA: Use device_map="auto" for multi-GPU support
154
+ self.model = AutoModelForCausalLM.from_pretrained(
155
+ settings.v4_model_id,
156
+ torch_dtype=torch.float16,
157
+ device_map="auto",
158
+ cache_dir=settings.hf_cache_dir,
159
+ trust_remote_code=True,
160
+ )
161
  quantization_desc = "FP16 (GPU, fast)"
162
 
163
  else:
164
  # ------------------------------------------------------------------
165
  # Fallback path:
166
+ # - GPU (CUDA/MPS) without quantization/FP16 -> FP16
167
+ # - CPU -> FP32 + optional dynamic INT8
168
  # ------------------------------------------------------------------
169
+ base_dtype = torch.float16 if use_gpu else torch.float32
 
 
 
 
170
 
171
+ if use_mps:
172
+ # MPS fallback: Load without device_map, manually move to MPS
173
+ logger.info(
174
+ f"Loading V4 model for MPS with dtype={base_dtype}"
175
+ )
176
+ self.model = AutoModelForCausalLM.from_pretrained(
177
+ settings.v4_model_id,
178
+ torch_dtype=base_dtype,
179
+ cache_dir=settings.hf_cache_dir,
180
+ trust_remote_code=True,
181
+ )
182
+ self.model = self.model.to("mps")
183
+ else:
184
+ # CUDA or CPU
185
+ device_strategy = "auto" if use_cuda else None
186
+ logger.info(
187
+ f"Loading V4 model with device_map='{device_strategy}', dtype={base_dtype}"
188
+ )
189
+ self.model = AutoModelForCausalLM.from_pretrained(
190
+ settings.v4_model_id,
191
+ torch_dtype=base_dtype,
192
+ device_map=device_strategy,
193
+ cache_dir=settings.hf_cache_dir,
194
+ trust_remote_code=True,
195
+ )
196
 
197
+ # Optional dynamic INT8 quantization on CPU only (not supported on GPU)
198
+ if getattr(settings, "v4_enable_quantization", True) and not use_gpu:
199
  try:
200
  logger.info(
201
  "Applying dynamic INT8 quantization to V4 model on CPU..."
start_v4_local.sh ADDED
@@ -0,0 +1,138 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+
3
+ # V4 Local Testing Server Startup Script
4
+ # This script starts the FastAPI server with V4 enabled for Android app testing
5
+
6
+ set -e
7
+
8
+ # Colors for output
9
+ GREEN='\033[0;32m'
10
+ BLUE='\033[0;34m'
11
+ YELLOW='\033[1;33m'
12
+ RED='\033[0;31m'
13
+ NC='\033[0m' # No Color
14
+
15
+ echo -e "${BLUE}╔══════════════════════════════════════════════════════════╗${NC}"
16
+ echo -e "${BLUE}β•‘ V4 Local Testing Server β•‘${NC}"
17
+ echo -e "${BLUE}β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•${NC}"
18
+ echo ""
19
+
20
+ # Check if server is already running
21
+ if lsof -Pi :7860 -sTCP:LISTEN -t >/dev/null 2>&1; then
22
+ echo -e "${YELLOW}⚠️ Server already running on port 7860${NC}"
23
+ echo -e "${YELLOW} Stopping existing server...${NC}"
24
+ pkill -f "uvicorn app.main:app" || true
25
+ sleep 2
26
+ fi
27
+
28
+ # Get local IP address
29
+ LOCAL_IP=$(ifconfig | grep "inet " | grep -v "127.0.0.1" | awk '{print $2}' | head -1)
30
+ if [ -z "$LOCAL_IP" ]; then
31
+ LOCAL_IP="Unable to detect"
32
+ echo -e "${RED}⚠️ Could not detect local IP address${NC}"
33
+ else
34
+ echo -e "${GREEN}βœ… Local IP Address: ${LOCAL_IP}${NC}"
35
+ fi
36
+
37
+ # Check .env configuration
38
+ if [ -f ".env" ]; then
39
+ echo -e "${GREEN}βœ… Found .env configuration${NC}"
40
+
41
+ # Show V4 config
42
+ echo ""
43
+ echo -e "${BLUE}V4 Configuration:${NC}"
44
+ grep "^ENABLE_V4" .env || echo " No V4 settings found"
45
+ grep "^V4_MODEL_ID" .env || echo " No model configured"
46
+ grep "^V4_MAX_TOKENS" .env || echo " Using default tokens"
47
+ else
48
+ echo -e "${RED}❌ No .env file found!${NC}"
49
+ echo -e "${YELLOW} Please create .env with V4 configuration${NC}"
50
+ exit 1
51
+ fi
52
+
53
+ echo ""
54
+ echo -e "${BLUE}Starting server...${NC}"
55
+ echo -e "${BLUE}This may take 30-90 seconds for V4 model warmup${NC}"
56
+ echo ""
57
+
58
+ # Start server in background and log to file
59
+ /opt/anaconda3/envs/summarizer/bin/python -m uvicorn app.main:app \
60
+ --host 0.0.0.0 \
61
+ --port 7860 \
62
+ > server.log 2>&1 &
63
+
64
+ SERVER_PID=$!
65
+ echo -e "${GREEN}βœ… Server started (PID: ${SERVER_PID})${NC}"
66
+
67
+ # Wait for server to be ready
68
+ echo -e "${YELLOW}⏳ Waiting for server to initialize...${NC}"
69
+ TIMEOUT=120
70
+ ELAPSED=0
71
+ while [ $ELAPSED -lt $TIMEOUT ]; do
72
+ if lsof -Pi :7860 -sTCP:LISTEN -t >/dev/null 2>&1; then
73
+ echo -e "${GREEN}βœ… Server is listening on port 7860${NC}"
74
+ break
75
+ fi
76
+ sleep 2
77
+ ELAPSED=$((ELAPSED + 2))
78
+
79
+ # Show progress every 10 seconds
80
+ if [ $((ELAPSED % 10)) -eq 0 ]; then
81
+ echo -e "${YELLOW} Still loading... (${ELAPSED}s / ${TIMEOUT}s)${NC}"
82
+ fi
83
+ done
84
+
85
+ if [ $ELAPSED -ge $TIMEOUT ]; then
86
+ echo -e "${RED}❌ Server failed to start within ${TIMEOUT} seconds${NC}"
87
+ echo -e "${YELLOW} Check server.log for errors${NC}"
88
+ exit 1
89
+ fi
90
+
91
+ # Wait a bit more for V4 warmup
92
+ echo -e "${YELLOW}⏳ Waiting for V4 model warmup (may take 60-90s)...${NC}"
93
+ sleep 15
94
+
95
+ # Test health endpoint
96
+ echo ""
97
+ echo -e "${BLUE}Testing server health...${NC}"
98
+ if curl -s http://localhost:7860/health > /dev/null 2>&1; then
99
+ echo -e "${GREEN}βœ… Server is healthy and responding${NC}"
100
+ else
101
+ echo -e "${YELLOW}⚠️ Health check failed, but server may still be warming up${NC}"
102
+ fi
103
+
104
+ echo ""
105
+ echo -e "${GREEN}╔══════════════════════════════════════════════════════════╗${NC}"
106
+ echo -e "${GREEN}β•‘ Server Started Successfully! β•‘${NC}"
107
+ echo -e "${GREEN}β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•${NC}"
108
+ echo ""
109
+ echo -e "${BLUE}Local Access:${NC}"
110
+ echo -e " http://localhost:7860"
111
+ echo ""
112
+ echo -e "${BLUE}Android App URL:${NC}"
113
+ echo -e " http://${LOCAL_IP}:7860"
114
+ echo ""
115
+ echo -e "${BLUE}V4 NDJSON Endpoint:${NC}"
116
+ echo -e " POST http://${LOCAL_IP}:7860/api/v4/scrape-and-summarize/stream-ndjson"
117
+ echo ""
118
+ echo -e "${BLUE}API Documentation:${NC}"
119
+ echo -e " http://localhost:7860/docs"
120
+ echo ""
121
+ echo -e "${BLUE}Server Logs:${NC}"
122
+ echo -e " tail -f server.log"
123
+ echo ""
124
+ echo -e "${BLUE}Stop Server:${NC}"
125
+ echo -e " pkill -f 'uvicorn app.main:app'"
126
+ echo -e " or: kill ${SERVER_PID}"
127
+ echo ""
128
+ echo -e "${YELLOW}πŸ“± Update your Android app base URL to: http://${LOCAL_IP}:7860${NC}"
129
+ echo -e "${YELLOW}πŸ“– See ANDROID_V4_LOCAL_TESTING.md for complete setup guide${NC}"
130
+ echo ""
131
+
132
+ # Optionally tail logs
133
+ read -p "Show real-time logs? (y/N): " -n 1 -r
134
+ echo
135
+ if [[ $REPLY =~ ^[Yy]$ ]]; then
136
+ echo -e "${BLUE}Showing server logs (Ctrl+C to stop)...${NC}"
137
+ tail -f server.log
138
+ fi