Distopia22 committed on
Commit d03f587 · 0 parent(s)

Production-ready Medical Coding API with Phi-3 support

Files changed (7)
  1. Dockerfile +58 -0
  2. README.md +114 -0
  3. app/__init__.py +3 -0
  4. app/api.py +282 -0
  5. app/model_loader.py +133 -0
  6. app/prompt_template.py +28 -0
  7. requirements.txt +21 -0
Dockerfile ADDED
@@ -0,0 +1,58 @@
+FROM python:3.10-slim
+
+# Set working directory
+WORKDIR /app
+
+# Set environment variables
+ENV PYTHONUNBUFFERED=1 \
+    PYTHONDONTWRITEBYTECODE=1 \
+    PIP_NO_CACHE_DIR=1 \
+    PIP_DISABLE_PIP_VERSION_CHECK=1 \
+    TRANSFORMERS_CACHE=/app/.cache/transformers \
+    HF_HOME=/app/.cache/huggingface \
+    DEBIAN_FRONTEND=noninteractive
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    git-lfs \
+    build-essential \
+    curl \
+    ca-certificates \
+    && git lfs install \
+    && rm -rf /var/lib/apt/lists/* \
+    && apt-get clean
+
+# Upgrade pip and install build tools
+RUN pip install --no-cache-dir --upgrade \
+    pip==24.0 \
+    setuptools==69.5.1 \
+    wheel==0.43.0
+
+# Copy requirements first for better Docker caching
+COPY requirements.txt .
+
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code
+COPY app/ ./app/
+
+# Create necessary directories with proper permissions
+RUN mkdir -p /app/offload /app/.cache/transformers /app/.cache/huggingface && \
+    chmod -R 777 /app/offload /app/.cache
+
+# Expose port 7860 (HuggingFace Spaces standard)
+EXPOSE 7860
+
+# Health check - more lenient for model loading
+HEALTHCHECK --interval=30s --timeout=20s --start-period=300s --retries=5 \
+    CMD curl -f http://localhost:7860/health || exit 1
+
+# Run the application with increased timeouts
+CMD ["uvicorn", "app.api:app", \
+     "--host", "0.0.0.0", \
+     "--port", "7860", \
+     "--timeout-keep-alive", "300", \
+     "--workers", "1", \
+     "--log-level", "info"]
README.md ADDED
@@ -0,0 +1,114 @@
+---
+title: Medical Coding API
+emoji: 🏥
+colorFrom: blue
+colorTo: green
+sdk: docker
+pinned: false
+license: mit
+app_port: 7860
+tags:
+  - medical
+  - healthcare
+  - icd-10
+  - cpt
+  - phi-3
+  - fastapi
+---
+
+# 🏥 Medical Coding API
+
+AI-powered API for extracting **ICD-10** and **CPT codes** from clinical provider notes using Microsoft Phi-3.
+
+## 🚀 Features
+
+- ✅ Extract ICD-10 diagnosis codes
+- ✅ Extract CPT procedure codes
+- ✅ Accepts notes up to 50,000 characters
+- ✅ JSON output format
+- ✅ GPU-accelerated inference (when available)
+- ✅ Automatic truncation to the first 10,000 characters (~2,500 words) before inference
+- ✅ Production-ready with error handling
+
+## 📡 API Endpoints
+
+### POST `/predict`
+
+Extract medical codes from a clinical note.
+
+**Request:**
+
+```json
+{
+  "note": "Your clinical note here..."
+}
+```
+
+**Response:**
+
+```json
+{
+  "result": {
+    "icd10_codes": ["I10", "E11.9"],
+    "cpt_codes": ["99213"]
+  },
+  "raw_output": "...",
+  "note_length": 250,
+  "truncated": false,
+  "processing_time": 3.45
+}
+```
+
+### GET `/health`
+
+Check API health status and whether the model has been loaded.
+
+### GET `/docs`
+
+Interactive API documentation (Swagger UI).
+
+## 🧪 Usage Examples
+
+### cURL
+
+```bash
+curl -X POST "https://YOUR-SPACE.hf.space/predict" \
+  -H "Content-Type: application/json" \
+  -d '{"note": "Patient with HTN, BP 160/95. Prescribed lisinopril."}'
+```
+
+### Python
+
+```python
+import requests
+
+response = requests.post(
+    "https://YOUR-SPACE.hf.space/predict",
+    json={"note": "Patient with diabetes, HbA1c 8.2. Started metformin."}
+)
+print(response.json())
+```
+
+### JavaScript
+
+```javascript
+fetch("https://YOUR-SPACE.hf.space/predict", {
+  method: "POST",
+  headers: { "Content-Type": "application/json" },
+  body: JSON.stringify({ note: "Clinical note here..." }),
+})
+  .then((res) => res.json())
+  .then((data) => console.log(data));
+```
+
+## ⚙️ Technical Details
+
+- **Model:** RayyanAhmed9477/med-coding (Phi-3 based)
+- **Framework:** FastAPI + Transformers
+- **Deployment:** HuggingFace Spaces (Docker)
+- **First Request:** 30-60 seconds or longer while the model loads (the loader warns that a first load can take 2-5 minutes)
+- **Subsequent Requests:** 2-10 seconds
+
+## 📝 License
+
+MIT License
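The `/predict` response documented in the README nests the codes under `result`, and the model's JSON is free-form. A client should therefore check the shape before trusting it. The sketch below assumes the response shape shown in the README. The helper name `parse_coding_response` is hypothetical and not part of this repository. It pulls out both code lists and rejects malformed payloads:

```python
from typing import Any


def parse_coding_response(payload: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (icd10_codes, cpt_codes) from a /predict response body.

    Hypothetical helper: assumes the README's response shape. A missing
    code list is treated as empty, since the model may omit a key.
    """
    result = payload.get("result")
    if not isinstance(result, dict):
        raise ValueError("response has no 'result' object")

    codes: dict[str, list[str]] = {}
    for key in ("icd10_codes", "cpt_codes"):
        value = result.get(key, [])
        # The model output is free-form JSON, so check the types explicitly
        if not isinstance(value, list) or not all(isinstance(c, str) for c in value):
            raise ValueError(f"'{key}' must be a list of strings")
        codes[key] = value

    return codes["icd10_codes"], codes["cpt_codes"]


sample = {
    "result": {"icd10_codes": ["I10", "E11.9"], "cpt_codes": ["99213"]},
    "raw_output": "...",
    "note_length": 250,
    "truncated": False,
    "processing_time": 3.45,
}
print(parse_coding_response(sample))  # (['I10', 'E11.9'], ['99213'])
```

With `requests`, this would be called as `parse_coding_response(response.json())` after checking `response.ok`.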
app/__init__.py ADDED
@@ -0,0 +1,3 @@
+"""Medical Coding API - Extract ICD-10 and CPT codes from clinical notes."""
+
+__version__ = "1.0.0"
app/api.py ADDED
@@ -0,0 +1,282 @@
+# api.py
+import json
+import gc
+import logging
+import threading
+import time
+from typing import Optional
+from fastapi import FastAPI, HTTPException, Request
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel, ConfigDict, Field
+from .model_loader import load_model_and_tokenizer
+from .prompt_template import PROMPT_TEMPLATE
+
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# Notes longer than this are truncated before they are sent to the model
+MAX_INFERENCE_CHARS = 10000
+
+app = FastAPI(
+    title="Medical Coding API",
+    description="Extract ICD-10 and CPT codes from clinical notes using AI",
+    version="1.0.0",
+    docs_url="/docs",
+    redoc_url="/redoc"
+)
+
+class NoteRequest(BaseModel):
+    note: str = Field(
+        ...,
+        min_length=10,
+        max_length=50000,
+        description="Clinical provider note (10-50,000 characters)"
+    )
+
+    # Pydantic v2 style (replaces the deprecated inner `class Config`)
+    model_config = ConfigDict(
+        json_schema_extra={
+            "example": {
+                "note": "Patient presents with essential hypertension. BP 160/95. Prescribed lisinopril 10mg daily. Office visit for established patient."
+            }
+        }
+    )
+
+class CodingResponse(BaseModel):
+    result: dict = Field(..., description="Extracted ICD-10 and CPT codes")
+    raw_output: str = Field(..., description="Raw model output")
+    note_length: int = Field(..., description="Length of input note in characters")
+    truncated: bool = Field(..., description="Whether note was truncated")
+    processing_time: float = Field(..., description="Time taken to process in seconds")
+
+# Global variables for lazy loading
+_gen_pipeline = None
+_tokenizer = None
+_model_load_time = None
+_model_lock = threading.Lock()      # Stops concurrent first requests from loading the model twice
+_inference_lock = threading.Lock()  # One model instance: run generations one at a time
+
+def get_model():
+    """Lazy-load the model on the first request (thread-safe), with error handling."""
+    global _gen_pipeline, _tokenizer, _model_load_time
+
+    if _gen_pipeline is None:
+        with _model_lock:
+            # Re-check under the lock: another request may have finished loading meanwhile
+            if _gen_pipeline is None:
+                logger.info("🔄 Loading model for the first time...")
+                start_time = time.time()
+
+                try:
+                    gen_pipeline, tokenizer = load_model_and_tokenizer()
+                except Exception as e:
+                    logger.error(f"❌ Failed to load model: {str(e)}")
+                    raise HTTPException(
+                        status_code=503,
+                        detail=f"Model loading failed: {str(e)}. Please try again in a few moments."
+                    ) from e
+
+                # Publish the tokenizer before the pipeline: other threads only
+                # check `_gen_pipeline` before using both
+                _tokenizer = tokenizer
+                _model_load_time = time.time() - start_time
+                _gen_pipeline = gen_pipeline
+                logger.info(f"✅ Model loaded in {_model_load_time:.2f} seconds")
+
+    return _gen_pipeline, _tokenizer
+
+def extract_json_from_text(text: str) -> Optional[str]:
+    """Extract the first JSON object from text using brace counting.
+
+    Limitation: braces inside JSON string values are counted too.
+    """
+    start_idx = text.find('{')
+    if start_idx == -1:
+        return None
+
+    brace_count = 0
+    for i in range(start_idx, len(text)):
+        if text[i] == '{':
+            brace_count += 1
+        elif text[i] == '}':
+            brace_count -= 1
+            if brace_count == 0:
+                return text[start_idx:i + 1]
+    return None
+
+def truncate_note(note: str, max_chars: int = MAX_INFERENCE_CHARS) -> str:
+    """Truncate note to prevent token limit issues."""
+    if len(note) <= max_chars:
+        return note
+
+    logger.warning(f"Note truncated from {len(note)} to {max_chars} characters")
+    return note[:max_chars]
+
+# ===== ENDPOINTS =====
+
+@app.get("/")
+async def root():
+    """Root endpoint with API information."""
+    return {
+        "name": "Medical Coding API",
+        "version": "1.0.0",
+        "description": "Extract ICD-10 and CPT codes from clinical notes",
+        "model": "RayyanAhmed9477/med-coding (Phi-3 based)",
+        "endpoints": {
+            "/predict": "POST - Extract medical codes from clinical note",
+            "/health": "GET - Check API health status",
+            "/docs": "GET - Interactive API documentation",
+            "/metrics": "GET - API usage metrics"
+        },
+        "usage": {
+            "endpoint": "/predict",
+            "method": "POST",
+            "body": {"note": "Your clinical note here (10-50,000 chars)"},
+            "max_note_length": "50,000 characters (~10,000 words)"
+        }
+    }
+
+@app.get("/health")
+async def health_check():
+    """Health check endpoint."""
+    return {
+        "status": "healthy",
+        "model": "RayyanAhmed9477/med-coding",
+        "model_loaded": _gen_pipeline is not None,
+        "model_load_time": f"{_model_load_time:.2f}s" if _model_load_time else "not loaded yet"
+    }
+
+@app.get("/metrics")
+async def metrics():
+    """Get API usage metrics."""
+    return {
+        "model_loaded": _gen_pipeline is not None,
+        "model_load_time_seconds": _model_load_time,
+        "status": "operational"
+    }
+
+# Plain `def` (not `async def`): FastAPI runs it in a worker thread, so the
+# blocking model load and generation do not stall the event loop (and /health).
+@app.post("/predict", response_model=CodingResponse)
+def predict(request: NoteRequest):
+    """
+    Extract ICD-10 and CPT codes from clinical notes.
+
+    **Input:** Clinical note (10-50,000 characters; only the first 10,000 are sent to the model)
+
+    **Output:** JSON with extracted codes:
+    - icd10_codes: List of ICD-10 diagnosis codes
+    - cpt_codes: List of CPT procedure codes
+
+    **Note:** First request may take 30-60 seconds as model loads into memory.
+    Subsequent requests will be faster (2-10 seconds).
+    """
+    start_time = time.time()
+
+    try:
+        # Validate input
+        note = request.note.strip()
+        if not note:
+            raise HTTPException(status_code=400, detail="Empty note provided")
+
+        # Load model (lazy loading)
+        logger.info(f"📝 Processing note ({len(note)} characters)")
+        gen_pipeline, tokenizer = get_model()
+
+        # Truncate if needed
+        original_length = len(note)
+        note_truncated = truncate_note(note, max_chars=MAX_INFERENCE_CHARS)
+
+        # Build prompt
+        prompt = PROMPT_TEMPLATE.format(note=note_truncated)
+        logger.info(f"🔮 Generating prediction (prompt length: {len(prompt)} chars)")
+
+        # Generate prediction with greedy decoding; temperature/top_p are
+        # omitted because they only apply when do_sample=True
+        with _inference_lock:
+            outputs = gen_pipeline(
+                prompt,
+                max_new_tokens=600,
+                do_sample=False,
+                num_return_sequences=1,
+                pad_token_id=tokenizer.eos_token_id,
+                eos_token_id=tokenizer.eos_token_id,
+                return_full_text=False
+            )
+
+        # Extract generated text
+        if isinstance(outputs, list) and len(outputs) > 0:
+            text = outputs[0].get("generated_text", "")
+        elif isinstance(outputs, dict):
+            text = outputs.get("generated_text", "")
+        else:
+            text = str(outputs)
+
+        logger.info(f"📤 Model output length: {len(text)} characters")
+
+        # Remove prompt if present (defensive; return_full_text=False should already exclude it)
+        if prompt in text:
+            text = text.replace(prompt, "").strip()
+
+        # Extract JSON
+        json_str = extract_json_from_text(text)
+
+        if json_str is None:
+            logger.error(f"No JSON found in output: {text[:500]}")
+            raise HTTPException(
+                status_code=500,
+                detail={
+                    "error": "No valid JSON found in model output",
+                    "raw_output_preview": text[:300],
+                    "suggestion": "Model may need fine-tuning or prompt adjustment"
+                }
+            )
+
+        # Parse JSON
+        try:
+            parsed = json.loads(json_str)
+        except json.JSONDecodeError as e:
+            logger.error(f"JSON parse error: {str(e)}")
+            raise HTTPException(
+                status_code=500,
+                detail={
+                    "error": f"Invalid JSON format: {str(e)}",
+                    "json_preview": json_str[:300]
+                }
+            )
+
+        # Validate response structure
+        if not isinstance(parsed, dict):
+            raise HTTPException(
+                status_code=500,
+                detail="Model output is not a valid JSON object"
+            )
+
+        # Clean up memory
+        gc.collect()
+
+        processing_time = time.time() - start_time
+        logger.info(f"✅ Prediction completed in {processing_time:.2f} seconds")
+
+        return CodingResponse(
+            result=parsed,
+            raw_output=text,
+            note_length=original_length,
+            truncated=original_length > MAX_INFERENCE_CHARS,
+            processing_time=round(processing_time, 2)
+        )
+
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"❌ Prediction failed: {str(e)}", exc_info=True)
+        gc.collect()
+        raise HTTPException(
+            status_code=500,
+            detail=f"Prediction failed: {str(e)}"
+        )
+
+@app.exception_handler(Exception)
+async def global_exception_handler(request: Request, exc: Exception):
+    """Global exception handler for unhandled errors."""
+    logger.error(f"Unhandled exception: {str(exc)}", exc_info=True)
+    return JSONResponse(
+        status_code=500,
+        content={
+            "detail": "Internal server error",
+            "error": str(exc),
+            "path": str(request.url)
+        }
+    )
+
+# Startup event
+@app.on_event("startup")
+async def startup_event():
+    """Log startup information."""
+    logger.info("=" * 60)
+    logger.info("🚀 Medical Coding API Starting...")
+    logger.info("=" * 60)
+    logger.info("⏳ Model will be loaded on first /predict request")
+    logger.info("📚 API Documentation: /docs")
+    logger.info("=" * 60)
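`extract_json_from_text` counts braces without tracking string literals. A `}` inside a quoted value therefore makes it return a truncated fragment. A stray `{` in prose the model writes before its JSON causes the same problem. In both cases `json.loads` then rejects the fragment. The sketch below offers one possible alternative using the standard library's `json.JSONDecoder.raw_decode`. The name `extract_first_json_object` is hypothetical. It attempts a real JSON parse at each opening brace and returns the parsed object rather than a string:

```python
import json
from typing import Any, Optional

_DECODER = json.JSONDecoder()


def extract_first_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in `text`, or None if there is none."""
    idx = text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at `idx` and ignores
            # whatever follows it, so trailing prose is fine.
            obj, _end = _DECODER.raw_decode(text, idx)
            return obj  # parsing starts at '{', so a success is always a dict
        except json.JSONDecodeError:
            # Not valid JSON from this brace; move on to the next one.
            idx = text.find("{", idx + 1)
    return None


model_text = 'Codes: {"icd10_codes": ["I10"], "cpt_codes": [], "note": "a } b"} Done.'
print(extract_first_json_object(model_text))
# {'icd10_codes': ['I10'], 'cpt_codes': [], 'note': 'a } b'}
```

Used in `predict`, this would make the separate `json.loads(json_str)` step unnecessary. A `None` return maps onto the existing "No valid JSON found" error.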
app/model_loader.py ADDED
@@ -0,0 +1,133 @@
+# model_loader.py
+import os
+import sys
+from transformers import (
+    AutoTokenizer,
+    AutoModelForCausalLM,
+    AutoConfig,
+    pipeline
+)
+import torch
+import warnings
+
+warnings.filterwarnings("ignore")
+
+MODEL_NAME = "RayyanAhmed9477/med-coding"
+
+def load_model_and_tokenizer():
+    """
+    Load the Phi-3 based model and its tokenizer, with diagnostic error handling.
+    Selects GPU (bfloat16) or CPU (float32) automatically.
+    """
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    print(f"🔧 Using device: {device}")
+    print(f"🔧 PyTorch version: {torch.__version__}")
+    print(f"🔧 Transformers version: {sys.modules['transformers'].__version__}")
+
+    # Get HuggingFace token from environment
+    hf_token = os.getenv("HF_TOKEN")
+
+    try:
+        # ===== STEP 1: Load Tokenizer =====
+        print(f"📥 Loading tokenizer: {MODEL_NAME}")
+        tokenizer = AutoTokenizer.from_pretrained(
+            MODEL_NAME,
+            trust_remote_code=True,  # Critical for Phi-3
+            token=hf_token,
+            use_fast=True,
+            legacy=False
+        )
+
+        # Configure tokenizer
+        if tokenizer.pad_token is None:
+            tokenizer.pad_token = tokenizer.eos_token
+        if not hasattr(tokenizer, 'padding_side') or tokenizer.padding_side is None:
+            tokenizer.padding_side = "left"
+
+        print("✅ Tokenizer loaded successfully")
+
+        # ===== STEP 2: Load Configuration with trust_remote_code =====
+        print(f"📥 Loading model configuration: {MODEL_NAME}")
+        config = AutoConfig.from_pretrained(
+            MODEL_NAME,
+            trust_remote_code=True,  # Critical for Phi-3
+            token=hf_token
+        )
+        print(f"✅ Config loaded: {config.model_type}")
+
+        # ===== STEP 3: Load Model =====
+        print(f"📥 Loading model: {MODEL_NAME}")
+        print("⏳ This may take 2-5 minutes on first load...")
+
+        if device == "cuda":
+            # GPU Configuration
+            print("🎮 Using GPU with bfloat16 precision")
+            model = AutoModelForCausalLM.from_pretrained(
+                MODEL_NAME,
+                config=config,
+                trust_remote_code=True,
+                torch_dtype=torch.bfloat16,
+                device_map="auto",
+                token=hf_token,
+                low_cpu_mem_usage=True,
+                attn_implementation="eager"  # More stable than flash attention
+            )
+        else:
+            # CPU Configuration - optimized for stability
+            print("💻 Using CPU with float32 precision")
+            model = AutoModelForCausalLM.from_pretrained(
+                MODEL_NAME,
+                config=config,
+                trust_remote_code=True,
+                torch_dtype=torch.float32,
+                device_map={"": "cpu"},
+                token=hf_token,
+                low_cpu_mem_usage=True,
+                offload_folder="offload",
+                attn_implementation="eager"
+            )
+
+        # Set model to evaluation mode
+        model.eval()
+
+        # Disable gradients to save memory
+        for param in model.parameters():
+            param.requires_grad = False
+
+        print("✅ Model loaded successfully!")
+
+        # ===== STEP 4: Create Pipeline =====
+        print("🔧 Creating text generation pipeline...")
+        # No `device`/`torch_dtype` here: from_pretrained(device_map=...) already
+        # placed and cast the model, and transformers rejects an explicit `device`
+        # for a model loaded with an accelerate device map.
+        gen_pipeline = pipeline(
+            "text-generation",
+            model=model,
+            tokenizer=tokenizer,
+            framework="pt"
+        )
+
+        print("✅ Pipeline created successfully!")
+        print("=" * 60)
+        print("🎉 MODEL READY FOR INFERENCE")
+        print("=" * 60)
+
+        return gen_pipeline, tokenizer
+
+    except Exception as e:
+        print(f"❌ Error during model loading: {str(e)}")
+        print("\n🔍 Diagnostic Information:")
+        print(f"  - Model: {MODEL_NAME}")
+        print(f"  - Device: {device}")
+        print(f"  - Token available: {hf_token is not None}")
+
+        import traceback
+        traceback.print_exc()
+
+        raise RuntimeError(
+            f"Failed to load model {MODEL_NAME}. "
+            "Please check: "
+            "1) Internet connection, "
+            "2) HuggingFace token (if model is private), "
+            "3) Transformers version (requires >=4.36.0 for Phi-3)"
+        ) from e
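`load_model_and_tokenizer` chooses its `from_pretrained` arguments from `torch.cuda.is_available()`. That makes the CPU/GPU branching hard to unit-test on a machine without a GPU. One way to factor it out is a pure function that returns the keyword arguments, sketched here on the assumption that the arguments stay exactly as in the two loader branches. The helper name `build_load_kwargs` is hypothetical. It returns the dtype as a name, to be resolved with `getattr(torch, ...)`:

```python
from typing import Any, Optional


def build_load_kwargs(use_cuda: bool, hf_token: Optional[str] = None) -> dict[str, Any]:
    """Return from_pretrained() kwargs mirroring the loader's GPU/CPU branches.

    Hypothetical helper. The dtype is returned as a torch attribute name so
    this can be tested without importing torch; resolve it with
    getattr(torch, name) before calling from_pretrained.
    """
    kwargs: dict[str, Any] = {
        "trust_remote_code": True,
        "token": hf_token,
        "low_cpu_mem_usage": True,
        "attn_implementation": "eager",
    }
    if use_cuda:
        # GPU: bfloat16 weights, let accelerate place the layers
        kwargs.update(torch_dtype="bfloat16", device_map="auto")
    else:
        # CPU: full precision, everything on the CPU
        kwargs.update(torch_dtype="float32", device_map={"": "cpu"}, offload_folder="offload")
    return kwargs


print(build_load_kwargs(use_cuda=False)["device_map"])  # {'': 'cpu'}
```

At load time this would look like `kwargs = build_load_kwargs(torch.cuda.is_available(), os.getenv("HF_TOKEN"))`, then `kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])` before `AutoModelForCausalLM.from_pretrained(MODEL_NAME, config=config, **kwargs)`.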
app/prompt_template.py ADDED
@@ -0,0 +1,28 @@
+# prompt_template.py
+PROMPT_TEMPLATE = """<|system|>
+You are an expert medical coding assistant specialized in extracting ICD-10 diagnosis codes and CPT procedure codes from clinical notes.
+
+Your task:
+1. Analyze the clinical note carefully
+2. Extract all relevant ICD-10 codes (diagnosis codes)
+3. Extract all relevant CPT codes (procedure/service codes)
+4. Return ONLY valid medical codes found in the note
+5. Format your response as JSON with this exact structure:
+
+{{
+  "icd10_codes": ["code1", "code2"],
+  "cpt_codes": ["code1", "code2"]
+}}
+
+Rules:
+- Only include codes explicitly mentioned or clearly implied in the note
+- Use standard ICD-10 and CPT code formats
+- If no codes found, return empty arrays: {{"icd10_codes": [], "cpt_codes": []}}
+- Do not include explanations, only the JSON object
+<|end|>
+<|user|>
+Clinical Note:
+{note}
+<|end|>
+<|assistant|>
+"""
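`PROMPT_TEMPLATE` is filled with `str.format`, which is why the literal JSON braces in the instructions are doubled (`{{`, `}}`). `format` collapses them to single braces and substitutes only `{note}`. Braces inside the note itself are safe, because substituted values are not re-parsed. The condensed template below is a hypothetical stand-in, not the full prompt, and demonstrates both points:

```python
# Shortened, hypothetical stand-in for PROMPT_TEMPLATE, keeping the Phi-3 chat markers
DEMO_TEMPLATE = """<|system|>
Return JSON: {{"icd10_codes": [], "cpt_codes": []}}
<|end|>
<|user|>
Clinical Note:
{note}
<|end|>
<|assistant|>
"""


def build_prompt(note: str) -> str:
    # "{{" and "}}" become literal braces; only {note} is substituted,
    # and braces inside `note` are inserted verbatim
    return DEMO_TEMPLATE.format(note=note)


prompt = build_prompt("HTN, BP {160/95}")
print(prompt)
```

A single (undoubled) brace in the template's instructions would instead make `format` raise a `KeyError` or `ValueError`.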
requirements.txt ADDED
@@ -0,0 +1,21 @@
+# Web Framework
+fastapi==0.109.2
+uvicorn[standard]==0.27.1
+python-multipart==0.0.9
+
+# Machine Learning - CRITICAL VERSIONS FOR PHI-3
+transformers==4.41.2
+torch==2.2.2
+accelerate==0.30.1
+safetensors==0.4.3
+sentencepiece==0.2.0
+
+# Utilities
+pydantic==2.7.1
+pydantic-settings==2.2.1
+python-dotenv==1.0.1
+protobuf==4.25.3
+einops==0.8.0
+
+# Monitoring
+psutil==5.9.8