Abdelrahman Almatrooshi committed on
Commit 2eba0cc · 1 Parent(s): fad97ce

Integrate L2CS-Net gaze estimation

- Add L2CS-Net in-tree (models/L2CS-Net/) with Gaze360 weights via Git LFS
- L2CSPipeline: ResNet50 gaze + MediaPipe head pose, roll de-rotation, cosine scoring
- 9-point polynomial gaze calibration with bias correction and IQR outlier filtering
- Gaze-eye fusion: calibrated screen coords + EAR for focus detection
- L2CS Boost mode: runs gaze alongside any base model (35/65 weight, veto at 0.38)
- Calibration UI: fullscreen overlay, auto-advance, progress ring
- Frontend: GAZE toggle, Calibrate button, gaze pointer dot on canvas
- Bumped capture resolution to 640x480 @ JPEG 0.75
- Dockerfile: added git, CPU-only torch for HF Space deployment

Dockerfile CHANGED
@@ -7,7 +7,14 @@ ENV PYTHONUNBUFFERED=1
 
 WORKDIR /app
 
-RUN apt-get update && apt-get install -y --no-install-recommends libglib2.0-0 libsm6 libxrender1 libxext6 libxcb1 libgl1 libgomp1 ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libavdevice-dev libopus-dev libvpx-dev libsrtp2-dev build-essential nodejs npm && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libglib2.0-0 libsm6 libxrender1 libxext6 libxcb1 libgl1 libgomp1 \
+    ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswscale-dev \
+    libavdevice-dev libopus-dev libvpx-dev libsrtp2-dev \
+    build-essential nodejs npm git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu
 
 COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt
README.md CHANGED
@@ -1,10 +1,94 @@
 ---
-title: IntegrationTest
-emoji: 📚
+title: FocusGuard
 colorFrom: indigo
 colorTo: purple
 sdk: docker
 pinned: false
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# FocusGuard - Real-Time Focus Detection
+
+A web app that monitors whether you're focused on your screen using your webcam. Combines head pose estimation, eye behaviour analysis, and deep learning gaze tracking to detect attention in real time.
+
+## How It Works
+
+1. **Open the app** and click **Start** - your webcam feed appears with a face mesh overlay.
+2. **Pick a model** from the selector bar (Geometric, XGBoost, L2CS, etc.).
+3. The system analyses each frame and shows **FOCUSED** or **NOT FOCUSED** with a confidence score.
+4. A timeline tracks your focus over time. Session history is saved for review.
+
+## Models
+
+| Model | What it uses | Best for |
+|-------|-------------|----------|
+| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
+| **XGBoost** | Trained classifier on head/eye features | Balanced accuracy/speed |
+| **MLP** | Neural network on same features | Higher accuracy |
+| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
+| **L2CS** | Deep gaze estimation (ResNet50) | Detects eye-only gaze shifts |
+
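The eye aspect ratio (EAR) used by the Geometric model can be sketched with the standard formula: the ratio of the eye's two vertical landmark distances to its horizontal distance, which drops toward zero as the eye closes. This is a minimal illustration assuming six eye-contour points in the usual p1..p6 order; the repo's own scoring lives in `models/eye_scorer.py` and may differ in detail.

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) for a 6x2 array of eye landmarks."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])  # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal (corner-to-corner) distance
    return (v1 + v2) / (2.0 * h)

# A wide-open eye is tall relative to its width, giving a higher EAR.
open_eye = np.array([[0, 0], [2, 2], [4, 2], [6, 0], [4, -2], [2, -2]], float)
print(round(eye_aspect_ratio(open_eye), 3))
```

A fixed threshold on EAR (or a drop relative to a per-user baseline) is the usual way to flag closed or drooping eyes.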
+## L2CS Gaze Tracking
+
+L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.
+
+### Standalone mode
+Select **L2CS** as the model - it handles everything.
+
+### Boost mode
+Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:
+- Base model handles head pose and eye openness (35% weight)
+- L2CS handles gaze direction (65% weight)
+- If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score
+
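The boost-mode arithmetic can be sketched as a small pure function. The 35/65 weights and 0.38 veto come from this commit; the 0.52 decision threshold and the 0.8 veto scaling mirror the constants in `_process_frame_with_l2cs_boost` in `main.py`.

```python
def fuse_scores(base_score: float, l2cs_score: float,
                base_w: float = 0.35, l2cs_w: float = 0.65,
                veto: float = 0.38) -> tuple[float, bool]:
    """Weighted score fusion with an off-screen veto; returns (fused_score, is_focused)."""
    if l2cs_score < veto:
        # Gaze clearly off-screen: the base model is overridden regardless of its score.
        return l2cs_score * 0.8, False
    fused = base_w * base_score + l2cs_w * l2cs_score
    return fused, fused >= 0.52

print(fuse_scores(0.4, 0.7))  # gaze on-screen: 0.35*0.4 + 0.65*0.7 = 0.595 -> focused
print(fuse_scores(0.9, 0.2))  # veto fires: focused head pose cannot rescue wandering eyes
```

The veto is the key asymmetry: a confident base model can never outvote a clearly off-screen gaze.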
+### Calibration
+After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:
+1. A fullscreen overlay shows 9 target dots (3x3 grid)
+2. Look at each dot as the progress ring fills
+3. The first dot (centre) sets your baseline gaze offset
+4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
+5. A cyan tracking dot appears on the video showing where you're looking
+
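The angle-to-screen mapping in step 4 can be sketched as a least-squares fit over quadratic features, with the 1.5×IQR gate mentioned in the commit message applied per point before fitting. This is a hypothetical illustration of the technique only; the actual `models/gaze_calibration.py` is not shown in this diff and its basis, filtering, and bias correction may differ.

```python
import numpy as np

def iqr_filter(samples):
    """Keep rows within 1.5*IQR of the per-axis quartiles (a simple outlier gate)."""
    s = np.asarray(samples, float)
    q1, q3 = np.percentile(s, [25, 75], axis=0)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return s[np.all((s >= lo) & (s <= hi), axis=1)]

def fit_gaze_polynomial(yaws, pitches, targets):
    """Least-squares quadratic map from (yaw, pitch) angles to screen (x, y)."""
    yaws, pitches = np.asarray(yaws, float), np.asarray(pitches, float)
    # Design matrix: [1, yaw, pitch, yaw^2, pitch^2, yaw*pitch] per calibration sample.
    A = np.column_stack([np.ones_like(yaws), yaws, pitches,
                         yaws ** 2, pitches ** 2, yaws * pitches])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(targets, float), rcond=None)
    return coeffs  # shape (6, 2): one coefficient column per screen axis

def predict_screen(coeffs, yaw, pitch):
    feats = np.array([1.0, yaw, pitch, yaw ** 2, pitch ** 2, yaw * pitch])
    return feats @ coeffs
```

With 9 grid points and 6 coefficients per axis the fit is overdetermined, so a few noisy samples (after IQR filtering) still produce a stable mapping.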
+## Tech Stack
+
+- **Backend**: FastAPI + WebSocket, Python 3.10
+- **Frontend**: React + Vite
+- **Face detection**: MediaPipe Face Landmarker (478 landmarks)
+- **Gaze estimation**: L2CS-Net (ResNet50, Gaze360 weights)
+- **ML models**: XGBoost, PyTorch MLP
+- **Deployment**: Docker on Hugging Face Spaces
+
+## Running Locally
+
+```bash
+# install Python deps
+pip install -r requirements.txt
+
+# install frontend deps and build
+npm install && npm run build
+
+# start the server
+uvicorn main:app --port 8000
+```
+
+Open `http://localhost:8000` in your browser.
+
+## Project Structure
+
+```
+main.py                      # FastAPI app, WebSocket handler, API endpoints
+ui/pipeline.py               # All focus detection pipelines (Geometric, MLP, XGBoost, Hybrid, L2CS)
+models/
+  face_mesh.py               # MediaPipe face landmark detector
+  head_pose.py               # Head pose estimation from landmarks
+  eye_scorer.py              # EAR/eye behaviour scoring
+  gaze_calibration.py        # 9-point polynomial gaze calibration
+  gaze_eye_fusion.py         # Fuses calibrated gaze with eye openness
+  L2CS-Net/                  # In-tree L2CS-Net repo with Gaze360 weights
+src/
+  components/
+    FocusPageLocal.jsx       # Main focus page (camera, controls, model selector)
+    CalibrationOverlay.jsx   # Fullscreen calibration UI
+  utils/
+    VideoManagerLocal.js     # WebSocket client, frame capture, canvas rendering
+Dockerfile                   # Docker build for HF Spaces
+```
download_l2cs_weights.py ADDED
@@ -0,0 +1,37 @@
+#!/usr/bin/env python3
+# Downloads L2CS-Net Gaze360 weights into checkpoints/
+
+import os
+import sys
+
+CHECKPOINTS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "checkpoints")
+DEST = os.path.join(CHECKPOINTS_DIR, "L2CSNet_gaze360.pkl")
+GDRIVE_ID = "1dL2Jokb19_SBSHAhKHOxJsmYs5-GoyLo"
+
+
+def main():
+    if os.path.isfile(DEST):
+        print(f"[OK] Weights already at {DEST}")
+        return
+
+    try:
+        import gdown
+    except ImportError:
+        print("gdown not installed. Run: pip install gdown")
+        sys.exit(1)
+
+    os.makedirs(CHECKPOINTS_DIR, exist_ok=True)
+    print(f"Downloading L2CS-Net weights to {DEST} ...")
+    gdown.download(f"https://drive.google.com/uc?id={GDRIVE_ID}", DEST, quiet=False)
+
+    if os.path.isfile(DEST):
+        print(f"[OK] Downloaded ({os.path.getsize(DEST) / 1024 / 1024:.1f} MB)")
+    else:
+        print("[ERR] Download failed. Manual download:")
+        print("  https://drive.google.com/drive/folders/17p6ORr-JQJcw-eYtG2WGNiuS_qVKwdWd")
+        print(f"  Place L2CSNet_gaze360.pkl in {CHECKPOINTS_DIR}/")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
main.py CHANGED
@@ -22,7 +22,10 @@ from aiortc import RTCPeerConnection, RTCSessionDescription, VideoStreamTrack
 from av import VideoFrame
 
 from mediapipe.tasks.python.vision import FaceLandmarksConnections
-from ui.pipeline import FaceMeshPipeline, MLPPipeline, HybridFocusPipeline, XGBoostPipeline
+from ui.pipeline import (
+    FaceMeshPipeline, MLPPipeline, HybridFocusPipeline, XGBoostPipeline,
+    L2CSPipeline, is_l2cs_weights_available,
+)
 from models.face_mesh import FaceMeshDetector
 
 # ================ FACE MESH DRAWING (server-side, for WebRTC) ================
@@ -164,6 +167,7 @@ app.add_middleware(
 db_path = "focus_guard.db"
 pcs = set()
 _cached_model_name = "mlp"  # in-memory cache, updated via /api/settings
+_l2cs_boost_enabled = False  # when True, L2CS runs alongside the base model
 
 async def _wait_for_ice_gathering(pc: RTCPeerConnection):
     if pc.iceGatheringState == "complete":
@@ -243,6 +247,7 @@ class SettingsUpdate(BaseModel):
     notification_threshold: Optional[int] = None
     frame_rate: Optional[int] = None
     model_name: Optional[str] = None
+    l2cs_boost: Optional[bool] = None
 
 class VideoTransformTrack(VideoStreamTrack):
     def __init__(self, track, session_id: int, get_channel: Callable[[], Any]):
@@ -270,6 +275,8 @@ class VideoTransformTrack(VideoStreamTrack):
         self.last_inference_time = now
 
         model_name = _cached_model_name
+        if model_name == "l2cs" and pipelines.get("l2cs") is None:
+            _ensure_l2cs()
        if model_name not in pipelines or pipelines.get(model_name) is None:
            model_name = 'mlp'
        active_pipeline = pipelines.get(model_name)
@@ -455,6 +462,7 @@ pipelines = {
     "mlp": None,
     "hybrid": None,
     "xgboost": None,
+    "l2cs": None,
 }
 
 # Thread pool for CPU-bound inference so the event loop stays responsive.
@@ -464,14 +472,81 @@ _inference_executor = concurrent.futures.ThreadPoolExecutor(
 )
 # One lock per pipeline so shared state (TemporalTracker, etc.) is not corrupted when
 # multiple frames are processed in parallel by the thread pool.
-_pipeline_locks = {name: threading.Lock() for name in ("geometric", "mlp", "hybrid", "xgboost")}
+_pipeline_locks = {name: threading.Lock() for name in ("geometric", "mlp", "hybrid", "xgboost", "l2cs")}
+
+_l2cs_load_lock = threading.Lock()
+_l2cs_error: str | None = None
+
+
+def _ensure_l2cs():
+    # lazy-load L2CS on first use, double-checked locking
+    global _l2cs_error
+    if pipelines["l2cs"] is not None:
+        return True
+    with _l2cs_load_lock:
+        if pipelines["l2cs"] is not None:
+            return True
+        if not is_l2cs_weights_available():
+            _l2cs_error = "Weights not found"
+            return False
+        try:
+            pipelines["l2cs"] = L2CSPipeline()
+            _l2cs_error = None
+            print("[OK] L2CSPipeline lazy-loaded")
+            return True
+        except Exception as e:
+            _l2cs_error = str(e)
+            print(f"[ERR] L2CS lazy-load failed: {e}")
+            return False
 
 
-def _process_frame_safe(pipeline, frame, model_name: str):
-    """Run process_frame in executor with per-pipeline lock."""
+def _process_frame_safe(pipeline, frame, model_name):
     with _pipeline_locks[model_name]:
         return pipeline.process_frame(frame)
 
+
+_BOOST_BASE_W = 0.35
+_BOOST_L2CS_W = 0.65
+_BOOST_VETO = 0.38  # L2CS below this -> forced not-focused
+
+
+def _process_frame_with_l2cs_boost(base_pipeline, frame, base_model_name):
+    # run base model
+    with _pipeline_locks[base_model_name]:
+        base_out = base_pipeline.process_frame(frame)
+
+    l2cs_pipe = pipelines.get("l2cs")
+    if l2cs_pipe is None:
+        base_out["boost_active"] = False
+        return base_out
+
+    # run L2CS
+    with _pipeline_locks["l2cs"]:
+        l2cs_out = l2cs_pipe.process_frame(frame)
+
+    base_score = base_out.get("mlp_prob", base_out.get("raw_score", 0.0))
+    l2cs_score = l2cs_out.get("raw_score", 0.0)
+
+    # veto: gaze clearly off-screen overrides base model
+    if l2cs_score < _BOOST_VETO:
+        fused_score = l2cs_score * 0.8
+        is_focused = False
+    else:
+        fused_score = _BOOST_BASE_W * base_score + _BOOST_L2CS_W * l2cs_score
+        is_focused = fused_score >= 0.52
+
+    base_out["raw_score"] = fused_score
+    base_out["is_focused"] = is_focused
+    base_out["boost_active"] = True
+    base_out["base_score"] = round(base_score, 3)
+    base_out["l2cs_score"] = round(l2cs_score, 3)
+
+    if l2cs_out.get("gaze_yaw") is not None:
+        base_out["gaze_yaw"] = l2cs_out["gaze_yaw"]
+        base_out["gaze_pitch"] = l2cs_out["gaze_pitch"]
+
+    return base_out
+
 @app.on_event("startup")
 async def startup_event():
     global pipelines, _cached_model_name
@@ -509,6 +584,11 @@ async def startup_event():
     except Exception as e:
         print(f"[ERR] Failed to load XGBoostPipeline: {e}")
 
+    if is_l2cs_weights_available():
+        print("[OK] L2CS weights found — pipeline will be lazy-loaded on first use")
+    else:
+        print("[WARN] L2CS weights not found — l2cs model unavailable")
+
 @app.on_event("shutdown")
 async def shutdown_event():
     _inference_executor.shutdown(wait=False)
@@ -579,14 +659,19 @@ async def webrtc_offer(offer: dict):
 
 @app.websocket("/ws/video")
 async def websocket_endpoint(websocket: WebSocket):
+    from models.gaze_calibration import GazeCalibration
+    from models.gaze_eye_fusion import GazeEyeFusion
+
     await websocket.accept()
     session_id = None
     frame_count = 0
     running = True
    event_buffer = _EventBuffer(flush_interval=2.0)
 
+    # Calibration state (per-connection)
+    _cal: dict = {"cal": None, "collecting": False, "fusion": None}
+
     # Latest frame slot — only the most recent frame is kept, older ones are dropped.
-    # Using a dict so nested functions can mutate without nonlocal issues.
     _slot = {"frame": None}
     _frame_ready = asyncio.Event()
 
@@ -617,7 +702,6 @@ async def websocket_endpoint(websocket: WebSocket):
            data = json.loads(text)
 
            if data["type"] == "frame":
-                # Legacy base64 path (fallback)
                _slot["frame"] = base64.b64decode(data["image"])
                _frame_ready.set()
 
@@ -636,6 +720,47 @@ async def websocket_endpoint(websocket: WebSocket):
                if summary:
                    await websocket.send_json({"type": "session_ended", "summary": summary})
                session_id = None
+
+            # ---- Calibration commands ----
+            elif data["type"] == "calibration_start":
+                loop = asyncio.get_event_loop()
+                await loop.run_in_executor(_inference_executor, _ensure_l2cs)
+                _cal["cal"] = GazeCalibration()
+                _cal["collecting"] = True
+                _cal["fusion"] = None
+                cal = _cal["cal"]
+                await websocket.send_json({
+                    "type": "calibration_started",
+                    "num_points": cal.num_points,
+                    "target": list(cal.current_target),
+                    "index": cal.current_index,
+                })
+
+            elif data["type"] == "calibration_next":
+                cal = _cal.get("cal")
+                if cal is not None:
+                    more = cal.advance()
+                    if more:
+                        await websocket.send_json({
+                            "type": "calibration_point",
+                            "target": list(cal.current_target),
+                            "index": cal.current_index,
+                        })
+                    else:
+                        _cal["collecting"] = False
+                        ok = cal.fit()
+                        if ok:
+                            _cal["fusion"] = GazeEyeFusion(cal)
+                            await websocket.send_json({"type": "calibration_done", "success": True})
+                        else:
+                            await websocket.send_json({"type": "calibration_done", "success": False, "error": "Not enough samples"})
+
+            elif data["type"] == "calibration_cancel":
+                _cal["cal"] = None
+                _cal["collecting"] = False
+                _cal["fusion"] = None
+                await websocket.send_json({"type": "calibration_cancelled"})
+
    except WebSocketDisconnect:
        running = False
        _frame_ready.set()
@@ -654,7 +779,6 @@ async def websocket_endpoint(websocket: WebSocket):
            if not running:
                return
 
-            # Grab latest frame and clear slot
            raw = _slot["frame"]
            _slot["frame"] = None
            if raw is None:
@@ -667,38 +791,87 @@ async def websocket_endpoint(websocket: WebSocket):
                continue
            frame = cv2.resize(frame, (640, 480))
 
-            model_name = _cached_model_name
+            # During calibration collection, always use L2CS
+            collecting = _cal.get("collecting", False)
+            if collecting:
+                if pipelines.get("l2cs") is None:
+                    await loop.run_in_executor(_inference_executor, _ensure_l2cs)
+                use_model = "l2cs" if pipelines.get("l2cs") is not None else _cached_model_name
+            else:
+                use_model = _cached_model_name
+
+            model_name = use_model
+            if model_name == "l2cs" and pipelines.get("l2cs") is None:
+                await loop.run_in_executor(_inference_executor, _ensure_l2cs)
            if model_name not in pipelines or pipelines.get(model_name) is None:
                model_name = "mlp"
            active_pipeline = pipelines.get(model_name)
 
+            # L2CS boost: run L2CS alongside base model
+            use_boost = (
+                _l2cs_boost_enabled
+                and model_name != "l2cs"
+                and pipelines.get("l2cs") is not None
+                and not collecting
+            )
+
            landmarks_list = None
+            out = None
            if active_pipeline is not None:
-                out = await loop.run_in_executor(
-                    _inference_executor,
-                    _process_frame_safe,
-                    active_pipeline,
-                    frame,
-                    model_name,
-                )
+                if use_boost:
+                    out = await loop.run_in_executor(
+                        _inference_executor,
+                        _process_frame_with_l2cs_boost,
+                        active_pipeline,
+                        frame,
+                        model_name,
+                    )
+                else:
+                    out = await loop.run_in_executor(
+                        _inference_executor,
+                        _process_frame_safe,
+                        active_pipeline,
+                        frame,
+                        model_name,
+                    )
                is_focused = out["is_focused"]
                confidence = out.get("mlp_prob", out.get("raw_score", 0.0))
 
                lm = out.get("landmarks")
                if lm is not None:
-                    # Send all 478 landmarks as flat array for tessellation drawing
                    landmarks_list = [
                        [round(float(lm[i, 0]), 3), round(float(lm[i, 1]), 3)]
                        for i in range(lm.shape[0])
                    ]
 
+                # Calibration sample collection (L2CS gaze angles)
+                if collecting and _cal.get("cal") is not None:
+                    pipe_yaw = out.get("gaze_yaw")
+                    pipe_pitch = out.get("gaze_pitch")
+                    if pipe_yaw is not None and pipe_pitch is not None:
+                        _cal["cal"].collect_sample(pipe_yaw, pipe_pitch)
+
+                # Gaze fusion (when L2CS active + calibration fitted)
+                fusion = _cal.get("fusion")
+                if (
+                    fusion is not None
+                    and model_name == "l2cs"
+                    and out.get("gaze_yaw") is not None
+                ):
+                    fuse = fusion.update(
+                        out["gaze_yaw"], out["gaze_pitch"], lm
+                    )
+                    is_focused = fuse["focused"]
+                    confidence = fuse["focus_score"]
+
                if session_id:
-                    event_buffer.add(session_id, is_focused, confidence, {
+                    metadata = {
                        "s_face": out.get("s_face", 0.0),
                        "s_eye": out.get("s_eye", 0.0),
                        "mar": out.get("mar", 0.0),
                        "model": model_name,
-                    })
+                    }
+                    event_buffer.add(session_id, is_focused, confidence, metadata)
            else:
                is_focused = False
                confidence = 0.0
@@ -710,8 +883,7 @@ async def websocket_endpoint(websocket: WebSocket):
                "model": model_name,
                "fc": frame_count,
            }
-            if active_pipeline is not None:
-                # Send detailed metrics for HUD
+            if out is not None:
                if out.get("yaw") is not None:
                    resp["yaw"] = round(out["yaw"], 1)
                    resp["pitch"] = round(out["pitch"], 1)
@@ -720,6 +892,24 @@ async def websocket_endpoint(websocket: WebSocket):
                    resp["mar"] = round(out["mar"], 3)
                resp["sf"] = round(out.get("s_face", 0), 3)
                resp["se"] = round(out.get("s_eye", 0), 3)
+
+                # Gaze fusion fields (L2CS standalone or boost mode)
+                fusion = _cal.get("fusion")
+                has_gaze = out.get("gaze_yaw") is not None
+                if fusion is not None and has_gaze and (model_name == "l2cs" or use_boost):
+                    fuse = fusion.update(out["gaze_yaw"], out["gaze_pitch"], out.get("landmarks"))
+                    resp["gaze_x"] = fuse["gaze_x"]
+                    resp["gaze_y"] = fuse["gaze_y"]
+                    resp["on_screen"] = fuse["on_screen"]
+                    if model_name == "l2cs":
+                        resp["focused"] = fuse["focused"]
+                        resp["confidence"] = round(fuse["focus_score"], 3)
+
+                if out.get("boost_active"):
+                    resp["boost"] = True
+                    resp["base_score"] = out.get("base_score", 0)
+                    resp["l2cs_score"] = out.get("l2cs_score", 0)
+
            if landmarks_list is not None:
                resp["lm"] = landmarks_list
            await websocket.send_json(resp)
@@ -852,8 +1042,9 @@ async def get_settings():
        db.row_factory = aiosqlite.Row
        cursor = await db.execute("SELECT * FROM user_settings WHERE id = 1")
        row = await cursor.fetchone()
-        if row: return dict(row)
-        else: return {'sensitivity': 6, 'notification_enabled': True, 'notification_threshold': 30, 'frame_rate': 30, 'model_name': 'mlp'}
+        result = dict(row) if row else {'sensitivity': 6, 'notification_enabled': True, 'notification_threshold': 30, 'frame_rate': 30, 'model_name': 'mlp'}
+        result['l2cs_boost'] = _l2cs_boost_enabled
+        return result
 
 @app.put("/api/settings")
 async def update_settings(settings: SettingsUpdate):
@@ -878,12 +1069,28 @@ async def update_settings(settings: SettingsUpdate):
    if settings.frame_rate is not None:
        updates.append("frame_rate = ?")
        params.append(max(5, min(60, settings.frame_rate)))
-    if settings.model_name is not None and settings.model_name in pipelines and pipelines[settings.model_name] is not None:
+    if settings.model_name is not None and settings.model_name in pipelines:
+        if settings.model_name == "l2cs":
+            loop = asyncio.get_event_loop()
+            loaded = await loop.run_in_executor(_inference_executor, _ensure_l2cs)
+            if not loaded:
+                raise HTTPException(status_code=400, detail=f"L2CS model unavailable: {_l2cs_error}")
+        elif pipelines[settings.model_name] is None:
+            raise HTTPException(status_code=400, detail=f"Model '{settings.model_name}' not loaded")
        updates.append("model_name = ?")
        params.append(settings.model_name)
        global _cached_model_name
        _cached_model_name = settings.model_name
 
+    if settings.l2cs_boost is not None:
+        global _l2cs_boost_enabled
+        if settings.l2cs_boost:
+            loop = asyncio.get_event_loop()
+            loaded = await loop.run_in_executor(_inference_executor, _ensure_l2cs)
+            if not loaded:
+                raise HTTPException(status_code=400, detail=f"L2CS boost unavailable: {_l2cs_error}")
+        _l2cs_boost_enabled = settings.l2cs_boost
+
    if updates:
        query = f"UPDATE user_settings SET {', '.join(updates)} WHERE id = 1"
        await db.execute(query, params)
@@ -919,15 +1126,55 @@ async def get_stats_summary():
 
 @app.get("/api/models")
 async def get_available_models():
-    """Return list of loaded model names and which is currently active."""
-    available = [name for name, p in pipelines.items() if p is not None]
+    """Return model names, statuses, and which is currently active."""
+    statuses = {}
+    errors = {}
+    available = []
+    for name, p in pipelines.items():
+        if name == "l2cs":
+            if p is not None:
+                statuses[name] = "ready"
+                available.append(name)
+            elif is_l2cs_weights_available():
+                statuses[name] = "lazy"
+                available.append(name)
+            elif _l2cs_error:
+                statuses[name] = "error"
+                errors[name] = _l2cs_error
+            else:
+                statuses[name] = "unavailable"
+        elif p is not None:
+            statuses[name] = "ready"
+            available.append(name)
+        else:
+            statuses[name] = "unavailable"
    async with aiosqlite.connect(db_path) as db:
        cursor = await db.execute("SELECT model_name FROM user_settings WHERE id = 1")
        row = await cursor.fetchone()
        current = row[0] if row else "mlp"
    if current not in available and available:
        current = available[0]
-    return {"available": available, "current": current}
+    l2cs_boost_available = (
+        statuses.get("l2cs") in ("ready", "lazy") and current != "l2cs"
+    )
+    return {
+        "available": available,
+        "current": current,
+        "statuses": statuses,
+        "errors": errors,
+        "l2cs_boost": _l2cs_boost_enabled,
+        "l2cs_boost_available": l2cs_boost_available,
+    }
+
+@app.get("/api/l2cs/status")
+async def l2cs_status():
+    """L2CS-specific status: weights available, loaded, and calibration info."""
+    loaded = pipelines.get("l2cs") is not None
+    return {
+        "weights_available": is_l2cs_weights_available(),
+        "loaded": loaded,
+        "error": _l2cs_error,
+    }
 
 @app.get("/api/mesh-topology")
 async def get_mesh_topology():
models/L2CS-Net/.gitignore ADDED
@@ -0,0 +1,140 @@
+# Ignore the test data - sensitive
+datasets/
+evaluation/
+output/
+
+# Ignore debugging configurations
+/.vscode
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# Ignore other files
+my.secrets
models/L2CS-Net/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2022 Ahmed Abdelrahman
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
models/L2CS-Net/README.md ADDED
@@ -0,0 +1,148 @@
+
+ <p align="center">
+ <img src="https://github.com/Ahmednull/Storage/blob/main/gaze.gif" alt="animated" />
+ </p>
+
+ ___
+
+ # L2CS-Net
+
+ The official PyTorch implementation of L2CS-Net for gaze estimation and tracking.
+
+ ## Installation
+ <img src="https://img.shields.io/badge/python%20-%2314354C.svg?&style=for-the-badge&logo=python&logoColor=white"/> <img src="https://img.shields.io/badge/PyTorch%20-%23EE4C2C.svg?&style=for-the-badge&logo=PyTorch&logoColor=white" />
+
+ Install the package with:
+
+ ```
+ pip install git+https://github.com/Ahmednull/L2CS-Net.git@main
+ ```
+
+ Or clone the repository and install it (optionally in editable mode) with:
+
+ ```
+ pip install [-e] .
+ ```
+
+ You should now be able to import the package:
+
+ ```
+ $ python
+ >>> import l2cs
+ ```
+
+ ## Usage
+
+ Detect faces and predict gaze from a webcam:
+
+ ```python
+ import pathlib
+
+ import cv2
+ import torch
+
+ from l2cs import Pipeline, render
+
+ CWD = pathlib.Path.cwd()
+
+ gaze_pipeline = Pipeline(
+     weights=CWD / 'models' / 'L2CSNet_gaze360.pkl',
+     arch='ResNet50',
+     device=torch.device('cpu')  # or torch.device('cuda')
+ )
+
+ cap = cv2.VideoCapture(0)
+ _, frame = cap.read()
+
+ # Process frame and visualize
+ results = gaze_pipeline.step(frame)
+ frame = render(frame, results)
+ ```
+
+ ## Demo
+ * Download the pre-trained models from [here](https://drive.google.com/drive/folders/17p6ORr-JQJcw-eYtG2WGNiuS_qVKwdWd?usp=sharing) and store them in *models/*.
+ * Run:
+ ```
+ python demo.py \
+  --snapshot models/L2CSNet_gaze360.pkl \
+  --gpu 0 \
+  --cam 0
+ ```
+ The demo will run with the *L2CSNet_gaze360.pkl* pretrained model.
+
+ ## Community Contributions
+
+ - [Gaze Detection and Eye Tracking: A How-To Guide](https://blog.roboflow.com/gaze-direction-position/): Use L2CS-Net through an HTTP interface with the open source Roboflow Inference project.
+
+ ## MPIIGaze
+ We provide code for training and testing on the MPIIGaze dataset with leave-one-person-out evaluation.
+
+ ### Prepare datasets
+ * Download the **MPIIFaceGaze dataset** from [here](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/gaze-based-human-computer-interaction/its-written-all-over-your-face-full-face-appearance-based-gaze-estimation).
+ * Apply the data preprocessing from [here](http://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/).
+ * Store the dataset in *datasets/MPIIFaceGaze*.
+
+ ### Train
+ ```
+ python train.py \
+  --dataset mpiigaze \
+  --snapshot output/snapshots \
+  --gpu 0 \
+  --num_epochs 50 \
+  --batch_size 16 \
+  --lr 0.00001 \
+  --alpha 1
+ ```
+ The code performs leave-one-person-out training automatically and stores the models in *output/snapshots*.
+
+ ### Test
+ ```
+ python test.py \
+  --dataset mpiigaze \
+  --snapshot output/snapshots/snapshot_folder \
+  --evalpath evaluation/L2CS-mpiigaze \
+  --gpu 0
+ ```
+ The code performs leave-one-person-out testing automatically and stores the results in *evaluation/L2CS-mpiigaze*.
+
+ To get the average leave-one-person-out accuracy, use:
+ ```
+ python leave_one_out_eval.py \
+  --evalpath evaluation/L2CS-mpiigaze \
+  --respath evaluation/L2CS-mpiigaze
+ ```
+ The script reads the evaluation path and writes the leave-one-out gaze accuracy to *evaluation/L2CS-mpiigaze*.
+
+ ## Gaze360
+ We provide code for training and testing on the Gaze360 dataset with a train-val-test split.
+
+ ### Prepare datasets
+ * Download the **Gaze360 dataset** from [here](http://gaze360.csail.mit.edu/download.php).
+ * Apply the data preprocessing from [here](http://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/).
+ * Store the dataset in *datasets/Gaze360*.
+
+ ### Train
+ ```
+ python train.py \
+  --dataset gaze360 \
+  --snapshot output/snapshots \
+  --gpu 0 \
+  --num_epochs 50 \
+  --batch_size 16 \
+  --lr 0.00001 \
+  --alpha 1
+ ```
+ The code performs training and stores the models in *output/snapshots*.
+
+ ### Test
+ ```
+ python test.py \
+  --dataset gaze360 \
+  --snapshot output/snapshots/snapshot_folder \
+  --evalpath evaluation/L2CS-gaze360 \
+  --gpu 0
+ ```
+ The code performs testing on snapshot_folder and stores the results in *evaluation/L2CS-gaze360*.
models/L2CS-Net/demo.py ADDED
@@ -0,0 +1,87 @@
+ import argparse
+ import pathlib
+ import numpy as np
+ import cv2
+ import time
+
+ import torch
+ import torch.nn as nn
+ from torch.autograd import Variable
+ from torchvision import transforms
+ import torch.backends.cudnn as cudnn
+ import torchvision
+
+ from PIL import Image, ImageOps
+
+ from face_detection import RetinaFace
+
+ from l2cs import select_device, draw_gaze, getArch, Pipeline, render
+
+ CWD = pathlib.Path.cwd()
+
+ def parse_args():
+     """Parse input arguments."""
+     parser = argparse.ArgumentParser(
+         description='Gaze evaluation using a model pretrained with L2CS-Net on Gaze360.')
+     parser.add_argument(
+         '--device', dest='device', help='Device to run model: cpu or gpu:0',
+         default="cpu", type=str)
+     parser.add_argument(
+         '--snapshot', dest='snapshot', help='Path of model snapshot.',
+         default='output/snapshots/L2CS-gaze360-_loader-180-4/_epoch_55.pkl', type=str)
+     parser.add_argument(
+         '--cam', dest='cam_id', help='Camera device id to use [0]',
+         default=0, type=int)
+     parser.add_argument(
+         '--arch', dest='arch', help='Network architecture, can be: ResNet18, ResNet34, ResNet50, ResNet101, ResNet152',
+         default='ResNet50', type=str)
+
+     args = parser.parse_args()
+     return args
+
+ if __name__ == '__main__':
+     args = parse_args()
+
+     cudnn.enabled = True
+     arch = args.arch
+     cam = args.cam_id
+
+     gaze_pipeline = Pipeline(
+         weights=CWD / 'models' / 'L2CSNet_gaze360.pkl',
+         arch='ResNet50',
+         device=select_device(args.device, batch_size=1)
+     )
+
+     cap = cv2.VideoCapture(cam)
+
+     # Check if the webcam is opened correctly
+     if not cap.isOpened():
+         raise IOError("Cannot open webcam")
+
+     with torch.no_grad():
+         while True:
+
+             # Get frame
+             success, frame = cap.read()
+             start_fps = time.time()
+
+             if not success:
+                 print("Failed to obtain frame")
+                 time.sleep(0.1)
+                 continue
+
+             # Process frame
+             results = gaze_pipeline.step(frame)
+
+             # Visualize output
+             frame = render(frame, results)
+
+             myFPS = 1.0 / (time.time() - start_fps)
+             cv2.putText(frame, 'FPS: {:.1f}'.format(myFPS), (10, 20),
+                         cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 255, 0), 1, cv2.LINE_AA)
+
+             cv2.imshow("Demo", frame)
+             if cv2.waitKey(1) & 0xFF == ord('q'):
+                 break
models/L2CS-Net/l2cs/__init__.py ADDED
@@ -0,0 +1,21 @@
+ from .utils import select_device, natural_keys, gazeto3d, angular, getArch
+ from .vis import draw_gaze, render
+ from .model import L2CS
+ from .pipeline import Pipeline
+ from .datasets import Gaze360, Mpiigaze
+
+ __all__ = [
+     # Classes
+     'L2CS',
+     'Pipeline',
+     'Gaze360',
+     'Mpiigaze',
+     # Utils
+     'render',
+     'select_device',
+     'draw_gaze',
+     'natural_keys',
+     'gazeto3d',
+     'angular',
+     'getArch'
+ ]
models/L2CS-Net/l2cs/datasets.py ADDED
@@ -0,0 +1,157 @@
+ import os
+ import numpy as np
+ import cv2
+
+ import torch
+ from torch.utils.data.dataset import Dataset
+ from torchvision import transforms
+ from PIL import Image, ImageFilter
+
+
+ class Gaze360(Dataset):
+     def __init__(self, path, root, transform, angle, binwidth, train=True):
+         self.transform = transform
+         self.root = root
+         self.orig_list_len = 0
+         self.angle = angle
+         if train == False:
+             angle = 90
+         self.binwidth = binwidth
+         self.lines = []
+         if isinstance(path, list):
+             for i in path:
+                 with open(i) as f:
+                     line = f.readlines()
+                     line.pop(0)
+                     self.lines.extend(line)
+         else:
+             with open(path) as f:
+                 lines = f.readlines()
+                 lines.pop(0)
+                 self.orig_list_len = len(lines)
+                 for line in lines:
+                     gaze2d = line.strip().split(" ")[5]
+                     label = np.array(gaze2d.split(",")).astype("float")
+                     if abs((label[0] * 180 / np.pi)) <= angle and abs((label[1] * 180 / np.pi)) <= angle:
+                         self.lines.append(line)
+
+         print("{} items removed from dataset that have an angle > {}".format(self.orig_list_len - len(self.lines), angle))
+
+     def __len__(self):
+         return len(self.lines)
+
+     def __getitem__(self, idx):
+         line = self.lines[idx]
+         line = line.strip().split(" ")
+
+         face = line[0]
+         lefteye = line[1]
+         righteye = line[2]
+         name = line[3]
+         gaze2d = line[5]
+         label = np.array(gaze2d.split(",")).astype("float")
+         label = torch.from_numpy(label).type(torch.FloatTensor)
+
+         pitch = label[0] * 180 / np.pi
+         yaw = label[1] * 180 / np.pi
+
+         img = Image.open(os.path.join(self.root, face))
+
+         if self.transform:
+             img = self.transform(img)
+
+         # Bin values
+         bins = np.array(range(-1 * self.angle, self.angle, self.binwidth))
+         binned_pose = np.digitize([pitch, yaw], bins) - 1
+
+         labels = binned_pose
+         cont_labels = torch.FloatTensor([pitch, yaw])
+
+         return img, labels, cont_labels, name
+
+
+ class Mpiigaze(Dataset):
+     def __init__(self, pathorg, root, transform, train, angle, fold=0):
+         self.transform = transform
+         self.root = root
+         self.orig_list_len = 0
+         self.lines = []
+         path = pathorg.copy()
+         if train == True:
+             path.pop(fold)
+         else:
+             path = path[fold]
+         if isinstance(path, list):
+             for i in path:
+                 with open(i) as f:
+                     lines = f.readlines()
+                     lines.pop(0)
+                     self.orig_list_len += len(lines)
+                     for line in lines:
+                         gaze2d = line.strip().split(" ")[7]
+                         label = np.array(gaze2d.split(",")).astype("float")
+                         if abs((label[0] * 180 / np.pi)) <= angle and abs((label[1] * 180 / np.pi)) <= angle:
+                             self.lines.append(line)
+         else:
+             with open(path) as f:
+                 lines = f.readlines()
+                 lines.pop(0)
+                 self.orig_list_len += len(lines)
+                 for line in lines:
+                     gaze2d = line.strip().split(" ")[7]
+                     label = np.array(gaze2d.split(",")).astype("float")
+                     if abs((label[0] * 180 / np.pi)) <= 42 and abs((label[1] * 180 / np.pi)) <= 42:
+                         self.lines.append(line)
+
+         print("{} items removed from dataset that have an angle > {}".format(self.orig_list_len - len(self.lines), angle))
+
+     def __len__(self):
+         return len(self.lines)
+
+     def __getitem__(self, idx):
+         line = self.lines[idx]
+         line = line.strip().split(" ")
+
+         name = line[3]
+         gaze2d = line[7]
+         head2d = line[8]
+         lefteye = line[1]
+         righteye = line[2]
+         face = line[0]
+
+         label = np.array(gaze2d.split(",")).astype("float")
+         label = torch.from_numpy(label).type(torch.FloatTensor)
+
+         pitch = label[0] * 180 / np.pi
+         yaw = label[1] * 180 / np.pi
+
+         img = Image.open(os.path.join(self.root, face))
+
+         if self.transform:
+             img = self.transform(img)
+
+         # Bin values
+         bins = np.array(range(-42, 42, 3))
+         binned_pose = np.digitize([pitch, yaw], bins) - 1
+
+         labels = binned_pose
+         cont_labels = torch.FloatTensor([pitch, yaw])
+
+         return img, labels, cont_labels, name
models/L2CS-Net/l2cs/model.py ADDED
@@ -0,0 +1,73 @@
+ import torch
+ import torch.nn as nn
+ from torch.autograd import Variable
+ import math
+ import torch.nn.functional as F
+
+
+ class L2CS(nn.Module):
+     def __init__(self, block, layers, num_bins):
+         self.inplanes = 64
+         super(L2CS, self).__init__()
+         self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
+         self.bn1 = nn.BatchNorm2d(64)
+         self.relu = nn.ReLU(inplace=True)
+         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
+         self.layer1 = self._make_layer(block, 64, layers[0])
+         self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
+         self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
+         self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
+         self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
+
+         self.fc_yaw_gaze = nn.Linear(512 * block.expansion, num_bins)
+         self.fc_pitch_gaze = nn.Linear(512 * block.expansion, num_bins)
+
+         # Vestigial layer from previous experiments
+         self.fc_finetune = nn.Linear(512 * block.expansion + 3, 3)
+
+         for m in self.modules():
+             if isinstance(m, nn.Conv2d):
+                 n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                 m.weight.data.normal_(0, math.sqrt(2. / n))
+             elif isinstance(m, nn.BatchNorm2d):
+                 m.weight.data.fill_(1)
+                 m.bias.data.zero_()
+
+     def _make_layer(self, block, planes, blocks, stride=1):
+         downsample = None
+         if stride != 1 or self.inplanes != planes * block.expansion:
+             downsample = nn.Sequential(
+                 nn.Conv2d(self.inplanes, planes * block.expansion,
+                           kernel_size=1, stride=stride, bias=False),
+                 nn.BatchNorm2d(planes * block.expansion),
+             )
+
+         layers = []
+         layers.append(block(self.inplanes, planes, stride, downsample))
+         self.inplanes = planes * block.expansion
+         for i in range(1, blocks):
+             layers.append(block(self.inplanes, planes))
+
+         return nn.Sequential(*layers)
+
+     def forward(self, x):
+         x = self.conv1(x)
+         x = self.bn1(x)
+         x = self.relu(x)
+         x = self.maxpool(x)
+
+         x = self.layer1(x)
+         x = self.layer2(x)
+         x = self.layer3(x)
+         x = self.layer4(x)
+         x = self.avgpool(x)
+         x = x.view(x.size(0), -1)
+
+         # gaze
+         pre_yaw_gaze = self.fc_yaw_gaze(x)
+         pre_pitch_gaze = self.fc_pitch_gaze(x)
+         return pre_yaw_gaze, pre_pitch_gaze
models/L2CS-Net/l2cs/pipeline.py ADDED
@@ -0,0 +1,133 @@
+ import pathlib
+ from typing import Union
+
+ import cv2
+ import numpy as np
+ import torch
+ import torch.nn as nn
+ from face_detection import RetinaFace
+
+ from .utils import prep_input_numpy, getArch
+ from .results import GazeResultContainer
+
+
+ class Pipeline:
+
+     def __init__(
+         self,
+         weights: pathlib.Path,
+         arch: str,
+         device: torch.device = torch.device('cpu'),
+         include_detector: bool = True,
+         confidence_threshold: float = 0.5
+     ):
+
+         # Save input parameters
+         self.weights = weights
+         self.include_detector = include_detector
+         self.device = device
+         self.confidence_threshold = confidence_threshold
+
+         # Create L2CS model
+         self.model = getArch(arch, 90)
+         self.model.load_state_dict(torch.load(self.weights, map_location=device))
+         self.model.to(self.device)
+         self.model.eval()
+
+         # Create RetinaFace if requested
+         if self.include_detector:
+             if device.type == 'cpu':
+                 self.detector = RetinaFace()
+             else:
+                 self.detector = RetinaFace(gpu_id=device.index)
+
+             self.softmax = nn.Softmax(dim=1)
+             self.idx_tensor = [idx for idx in range(90)]
+             self.idx_tensor = torch.FloatTensor(self.idx_tensor).to(self.device)
+
+     def step(self, frame: np.ndarray) -> GazeResultContainer:
+
+         # Creating containers
+         face_imgs = []
+         bboxes = []
+         landmarks = []
+         scores = []
+
+         if self.include_detector:
+             faces = self.detector(frame)
+
+             if faces is not None:
+                 for box, landmark, score in faces:
+
+                     # Apply threshold
+                     if score < self.confidence_threshold:
+                         continue
+
+                     # Extract safe min and max of x,y
+                     x_min = int(box[0])
+                     if x_min < 0:
+                         x_min = 0
+                     y_min = int(box[1])
+                     if y_min < 0:
+                         y_min = 0
+                     x_max = int(box[2])
+                     y_max = int(box[3])
+
+                     # Crop image
+                     img = frame[y_min:y_max, x_min:x_max]
+                     img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+                     img = cv2.resize(img, (224, 224))
+                     face_imgs.append(img)
+
+                     # Save data
+                     bboxes.append(box)
+                     landmarks.append(landmark)
+                     scores.append(score)
+
+                 # Predict gaze
+                 pitch, yaw = self.predict_gaze(np.stack(face_imgs))
+
+             else:
+                 pitch = np.empty((0, 1))
+                 yaw = np.empty((0, 1))
+
+         else:
+             pitch, yaw = self.predict_gaze(frame)
+
+         # Save data
+         results = GazeResultContainer(
+             pitch=pitch,
+             yaw=yaw,
+             bboxes=np.stack(bboxes),
+             landmarks=np.stack(landmarks),
+             scores=np.stack(scores)
+         )
+
+         return results
+
+     def predict_gaze(self, frame: Union[np.ndarray, torch.Tensor]):
+
+         # Prepare input
+         if isinstance(frame, np.ndarray):
+             img = prep_input_numpy(frame, self.device)
+         elif isinstance(frame, torch.Tensor):
+             img = frame
+         else:
+             raise RuntimeError("Invalid dtype for input")
+
+         # Predict
+         gaze_pitch, gaze_yaw = self.model(img)
+         pitch_predicted = self.softmax(gaze_pitch)
+         yaw_predicted = self.softmax(gaze_yaw)
+
+         # Get continuous predictions in degrees.
+         pitch_predicted = torch.sum(pitch_predicted.data * self.idx_tensor, dim=1) * 4 - 180
+         yaw_predicted = torch.sum(yaw_predicted.data * self.idx_tensor, dim=1) * 4 - 180
+
+         pitch_predicted = pitch_predicted.cpu().detach().numpy() * np.pi / 180.0
+         yaw_predicted = yaw_predicted.cpu().detach().numpy() * np.pi / 180.0
+
+         return pitch_predicted, yaw_predicted
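For reference, the bin-to-angle decoding in `predict_gaze` above (softmax over 90 bins of 4° each, then an expectation `sum(p_i * i) * 4 - 180`) can be sketched in plain NumPy. The helper name `decode_bins` and the one-hot test input below are illustrative, not part of the library:

```python
import numpy as np

def decode_bins(logits, num_bins=90, bin_width=4.0, offset=-180.0):
    """Expectation over softmax bin probabilities -> continuous angle in degrees.

    Mirrors the decoding step above: softmax, then sum(p_i * i) * 4 - 180.
    """
    e = np.exp(logits - logits.max())  # numerically stable softmax
    probs = e / e.sum()
    return float(np.dot(probs, np.arange(num_bins)) * bin_width + offset)

# A (near) one-hot logit at bin 45 decodes to 45 * 4 - 180 = 0 degrees (straight ahead).
logits = np.full(90, -1e9)
logits[45] = 0.0
print(decode_bins(logits))  # → 0.0
```

Because the decoding is an expectation rather than an argmax, a distribution spread over neighboring bins yields sub-bin (sub-4°) resolution.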
models/L2CS-Net/l2cs/results.py ADDED
@@ -0,0 +1,11 @@
+ from dataclasses import dataclass
+ import numpy as np
+
+
+ @dataclass
+ class GazeResultContainer:
+
+     pitch: np.ndarray
+     yaw: np.ndarray
+     bboxes: np.ndarray
+     landmarks: np.ndarray
+     scores: np.ndarray
models/L2CS-Net/l2cs/utils.py ADDED
@@ -0,0 +1,145 @@
+ import sys
+ import os
+ import math
+ from math import cos, sin
+ from pathlib import Path
+ import subprocess
+ import re
+
+ import numpy as np
+ import torch
+ import torch.nn as nn
+ import scipy.io as sio
+ import cv2
+ import torchvision
+ from torchvision import transforms
+
+ from .model import L2CS
+
+ transformations = transforms.Compose([
+     transforms.ToPILImage(),
+     transforms.Resize(448),
+     transforms.ToTensor(),
+     transforms.Normalize(
+         mean=[0.485, 0.456, 0.406],
+         std=[0.229, 0.224, 0.225]
+     )
+ ])
+
+ def atoi(text):
+     return int(text) if text.isdigit() else text
+
+ def natural_keys(text):
+     '''
+     alist.sort(key=natural_keys) sorts in human order
+     http://nedbatchelder.com/blog/200712/human_sorting.html
+     (See Toothy's implementation in the comments)
+     '''
+     return [atoi(c) for c in re.split(r'(\d+)', text)]
+
+ def prep_input_numpy(img: np.ndarray, device: str):
+     """Prepare a numpy array as input to L2CS-Net."""
+
+     if len(img.shape) == 4:
+         imgs = []
+         for im in img:
+             imgs.append(transformations(im))
+         img = torch.stack(imgs)
+     else:
+         img = transformations(img)
+
+     img = img.to(device)
+
+     if len(img.shape) == 3:
+         img = img.unsqueeze(0)
+
+     return img
+
+ def gazeto3d(gaze):
+     gaze_gt = np.zeros([3])
+     gaze_gt[0] = -np.cos(gaze[1]) * np.sin(gaze[0])
+     gaze_gt[1] = -np.sin(gaze[1])
+     gaze_gt[2] = -np.cos(gaze[1]) * np.cos(gaze[0])
+     return gaze_gt
+
+ def angular(gaze, label):
+     total = np.sum(gaze * label)
+     return np.arccos(min(total / (np.linalg.norm(gaze) * np.linalg.norm(label)), 0.9999999)) * 180 / np.pi
+
+ def select_device(device='', batch_size=None):
+     # device = 'cpu' or '0' or '0,1,2,3'
+     s = f'YOLOv3 🚀 {git_describe()} torch {torch.__version__} '  # string
+     cpu = device.lower() == 'cpu'
+     if cpu:
+         os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # force torch.cuda.is_available() = False
+     elif device:  # non-cpu device requested
+         os.environ['CUDA_VISIBLE_DEVICES'] = device  # set environment variable
+
+     cuda = not cpu and torch.cuda.is_available()
+     if cuda:
+         devices = device.split(',') if device else range(torch.cuda.device_count())  # i.e. 0,1,6,7
+         n = len(devices)  # device count
+         if n > 1 and batch_size:  # check batch_size is divisible by device_count
+             assert batch_size % n == 0, f'batch-size {batch_size} not multiple of GPU count {n}'
+         space = ' ' * len(s)
+         for i, d in enumerate(devices):
+             p = torch.cuda.get_device_properties(i)
+             s += f"{'' if i == 0 else space}CUDA:{d} ({p.name}, {p.total_memory / 1024 ** 2}MB)\n"  # bytes to MB
+     else:
+         s += 'CPU\n'
+
+     return torch.device('cuda:0' if cuda else 'cpu')
+
+ def spherical2cartesial(x):
+     output = torch.zeros(x.size(0), 3)
+     output[:, 2] = -torch.cos(x[:, 1]) * torch.cos(x[:, 0])
+     output[:, 0] = torch.cos(x[:, 1]) * torch.sin(x[:, 0])
+     output[:, 1] = torch.sin(x[:, 1])
+     return output
+
+ def compute_angular_error(input, target):
+     input = spherical2cartesial(input)
+     target = spherical2cartesial(target)
+
+     input = input.view(-1, 3, 1)
+     target = target.view(-1, 1, 3)
+     output_dot = torch.bmm(target, input)
+     output_dot = output_dot.view(-1)
+     output_dot = torch.acos(output_dot)
+     output_dot = output_dot.data
+     output_dot = 180 * torch.mean(output_dot) / math.pi
+     return output_dot
+
+ def softmax_temperature(tensor, temperature):
+     result = torch.exp(tensor / temperature)
+     result = torch.div(result, torch.sum(result, 1).unsqueeze(1).expand_as(result))
+     return result
+
+ def git_describe(path=Path(__file__).parent):  # path must be a directory
+     # return human-readable git description, i.e. v5.0-5-g3e25f1e https://git-scm.com/docs/git-describe
+     s = f'git -C {path} describe --tags --long --always'
+     try:
+         return subprocess.check_output(s, shell=True, stderr=subprocess.STDOUT).decode()[:-1]
+     except subprocess.CalledProcessError:
+         return ''  # not a git repository
+
+ def getArch(arch, bins):
+     # Base network structure
+     if arch == 'ResNet18':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [2, 2, 2, 2], bins)
+     elif arch == 'ResNet34':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [3, 4, 6, 3], bins)
+     elif arch == 'ResNet101':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 23, 3], bins)
+     elif arch == 'ResNet152':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 8, 36, 3], bins)
+     else:
+         if arch != 'ResNet50':
+             print('Invalid value for architecture is passed! '
+                   'The default value of ResNet50 will be used instead!')
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], bins)
+     return model
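The `gazeto3d`/`angular` pair above defines the evaluation metric: convert a (pitch, yaw) pair to a unit 3D gaze vector, then take the arc-cosine of the dot product of two such vectors. A minimal standalone sketch of the same formulas (NumPy only; the example angles are illustrative):

```python
import numpy as np

def gazeto3d(gaze):
    # (pitch, yaw) in radians -> 3D gaze vector, same convention as l2cs.utils
    return np.array([
        -np.cos(gaze[1]) * np.sin(gaze[0]),
        -np.sin(gaze[1]),
        -np.cos(gaze[1]) * np.cos(gaze[0]),
    ])

def angular(gaze, label):
    # angle between two gaze vectors, in degrees; cosine clamped as upstream does
    cos_sim = np.sum(gaze * label) / (np.linalg.norm(gaze) * np.linalg.norm(label))
    return np.degrees(np.arccos(min(cos_sim, 0.9999999)))

a = gazeto3d((0.0, 0.0))                 # looking straight ahead
b = gazeto3d((np.radians(10.0), 0.0))    # 10 degrees off in the first axis
print(round(angular(a, b), 3))  # → 10.0
```

Note the clamp to 0.9999999 means two identical vectors report a small non-zero error (≈0.026°) instead of exactly 0, a deliberate guard against `arccos` domain errors from floating-point drift.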
models/L2CS-Net/l2cs/vis.py ADDED
@@ -0,0 +1,64 @@
+ import cv2
+ import numpy as np
+ from .results import GazeResultContainer
+
+ def draw_gaze(a, b, c, d, image_in, pitchyaw, thickness=2, color=(255, 255, 0), scale=2.0):
+     """Draw gaze angle on given image with a given eye positions."""
+     image_out = image_in
+     (h, w) = image_in.shape[:2]
+     length = c
+     pos = (int(a + c / 2.0), int(b + d / 2.0))
+     if len(image_out.shape) == 2 or image_out.shape[2] == 1:
+         image_out = cv2.cvtColor(image_out, cv2.COLOR_GRAY2BGR)
+     dx = -length * np.sin(pitchyaw[0]) * np.cos(pitchyaw[1])
+     dy = -length * np.sin(pitchyaw[1])
+     cv2.arrowedLine(image_out, tuple(np.round(pos).astype(np.int32)),
+                     tuple(np.round([pos[0] + dx, pos[1] + dy]).astype(int)), color,
+                     thickness, cv2.LINE_AA, tipLength=0.18)
+     return image_out
+
+ def draw_bbox(frame: np.ndarray, bbox: np.ndarray):
+     x_min = int(bbox[0])
+     if x_min < 0:
+         x_min = 0
+     y_min = int(bbox[1])
+     if y_min < 0:
+         y_min = 0
+     x_max = int(bbox[2])
+     y_max = int(bbox[3])
+
+     cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 1)
+
+     return frame
+
+ def render(frame: np.ndarray, results: GazeResultContainer):
+     # Draw bounding boxes
+     for bbox in results.bboxes:
+         frame = draw_bbox(frame, bbox)
+
+     # Draw gaze
+     for i in range(results.pitch.shape[0]):
+         bbox = results.bboxes[i]
+         pitch = results.pitch[i]
+         yaw = results.yaw[i]
+
+         # Extract safe min and max of x,y
+         x_min = int(bbox[0])
+         if x_min < 0:
+             x_min = 0
+         y_min = int(bbox[1])
+         if y_min < 0:
+             y_min = 0
+         x_max = int(bbox[2])
+         y_max = int(bbox[3])
+
+         # Compute sizes
+         bbox_width = x_max - x_min
+         bbox_height = y_max - y_min
+
+         draw_gaze(x_min, y_min, bbox_width, bbox_height, frame, (pitch, yaw), color=(0, 0, 255))
+
+     return frame
models/L2CS-Net/leave_one_out_eval.py ADDED
@@ -0,0 +1,54 @@
+ import os
+ import argparse
+
+
+ def parse_args():
+     """Parse input arguments."""
+     parser = argparse.ArgumentParser(
+         description='gaze estimation using binned loss function.')
+     parser.add_argument(
+         '--evalpath', dest='evalpath', help='path for evaluating gaze test.',
+         default="evaluation/L2CS-gaze360-_standard-10", type=str)
+     parser.add_argument(
+         '--respath', dest='respath', help='path for saving result.',
+         default="evaluation/L2CS-gaze360-_standard-10", type=str)
+     return parser.parse_args()
+
+ if __name__ == '__main__':
+     args = parse_args()
+     evalpath = args.evalpath
+     respath = args.respath
+     if not os.path.exists(respath):
+         os.makedirs(respath)
+
+     dirlist = os.listdir(evalpath)
+     dirlist.sort()
+
+     with open(os.path.join(respath, "avg.log"), 'w') as outfile:
+         outfile.write("Average equal\n")
+
+         min_err = 10.0
+         best_epoch = 0
+         for j in range(50):
+             j = 20  # note: upstream pins the epoch index here
+             avg = 0.0
+             h = j + 3
+             for i in dirlist:
+                 with open(evalpath + "/" + i + "/mpiigaze_binned.log") as myfile:
+                     x = list(myfile)[h]
+                     # each line ends with "... MAE: <value>"
+                     avg += float(x.split("MAE:", 1)[1])
+
+             avg = avg / 15.0
+             if avg < min_err:
+                 min_err = avg
+                 best_epoch = j + 1
+             outfile.write("epoch" + str(j + 1) + "= " + str(avg) + "\n")
+
+         outfile.write("min angular error equal= " + str(min_err) + " at epoch= " + str(best_epoch) + "\n")
+     print(min_err)
models/L2CS-Net/models/L2CSNet_gaze360.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a7f3480d868dd48261e1d59f915b0ef0bb33ea12ea00938fb2168f212080665
+ size 95849977
models/L2CS-Net/models/README.md ADDED
@@ -0,0 +1 @@
+ # Path to pre-trained models
models/L2CS-Net/pyproject.toml ADDED
@@ -0,0 +1,44 @@
+ [project]
2
+ name = "l2cs"
3
+ version = "0.0.1"
4
+ description = "The official PyTorch implementation of L2CS-Net for gaze estimation and tracking"
5
+ authors = [
6
+ {name = "Ahmed Abderlrahman"},
7
+ {name = "Thorsten Hempel"}
8
+ ]
9
+ license = {file = "LICENSE.txt"}
10
+ readme = "README.md"
11
+ requires-python = ">3.6"
12
+
13
+ keywords = ["gaze", "estimation", "eye-tracking", "deep-learning", "pytorch"]
14
+
15
+ classifiers = [
16
+ "Programming Language :: Python :: 3"
17
+ ]
18
+
19
+ dependencies = [
20
+ 'matplotlib>=3.3.4',
21
+ 'numpy>=1.19.5',
22
+ 'opencv-python>=4.5.5',
23
+ 'pandas>=1.1.5',
24
+ 'Pillow>=8.4.0',
25
+ 'scipy>=1.5.4',
26
+ 'torch>=1.10.1',
27
+ 'torchvision>=0.11.2',
28
+ 'face_detection@git+https://github.com/elliottzheng/face-detection'
29
+ ]
30
+
31
+ [project.urls]
32
+ homepath = "https://github.com/Ahmednull/L2CS-Net"
33
+ repository = "https://github.com/Ahmednull/L2CS-Net"
34
+
35
+ [build-system]
36
+ requires = ["setuptools", "wheel"]
37
+ build-backend = "setuptools.build_meta"
38
+
39
+ # https://setuptools.pypa.io/en/stable/userguide/datafiles.html
40
+ [tool.setuptools]
41
+ include-package-data = true
42
+
43
+ [tool.setuptools.packages.find]
44
+ where = ["."]
models/L2CS-Net/test.py ADDED
@@ -0,0 +1,284 @@
+ import os, argparse
+ import numpy as np
+ import matplotlib.pyplot as plt
+ import torch
+ import torch.nn as nn
+ from torch.autograd import Variable
+ from torch.utils.data import DataLoader
+ from torchvision import transforms
+ import torch.backends.cudnn as cudnn
+ import torchvision
+
+ from l2cs import select_device, natural_keys, gazeto3d, angular, L2CS, Gaze360, Mpiigaze
+
+
+ def parse_args():
+     """Parse input arguments."""
+     parser = argparse.ArgumentParser(
+         description='Gaze estimation using L2CSNet.')
+     # Gaze360
+     parser.add_argument(
+         '--gaze360image_dir', dest='gaze360image_dir', help='Directory path for gaze images.',
+         default='datasets/Gaze360/Image', type=str)
+     parser.add_argument(
+         '--gaze360label_dir', dest='gaze360label_dir', help='Directory path for gaze labels.',
+         default='datasets/Gaze360/Label/test.label', type=str)
+     # MPIIGaze
+     parser.add_argument(
+         '--gazeMpiimage_dir', dest='gazeMpiimage_dir', help='Directory path for gaze images.',
+         default='datasets/MPIIFaceGaze/Image', type=str)
+     parser.add_argument(
+         '--gazeMpiilabel_dir', dest='gazeMpiilabel_dir', help='Directory path for gaze labels.',
+         default='datasets/MPIIFaceGaze/Label', type=str)
+     # Important args -------------------------------------------------------------------
+     parser.add_argument(
+         '--dataset', dest='dataset', help='gaze360, mpiigaze',
+         default="gaze360", type=str)
+     parser.add_argument(
+         '--snapshot', dest='snapshot', help='Path to the folder that contains the models.',
+         default='output/snapshots/L2CS-gaze360-_loader-180-4-lr', type=str)
+     parser.add_argument(
+         '--evalpath', dest='evalpath', help='Output path for the gaze test evaluation.',
+         default="evaluation/L2CS-gaze360-_loader-180-4-lr", type=str)
+     parser.add_argument(
+         '--gpu', dest='gpu_id', help='GPU device id to use [0]',
+         default="0", type=str)
+     parser.add_argument(
+         '--batch_size', dest='batch_size', help='Batch size.',
+         default=100, type=int)
+     parser.add_argument(
+         '--arch', dest='arch', help='Network architecture, can be: ResNet18, ResNet34, [ResNet50], '
+                                     'ResNet101, ResNet152, Squeezenet_1_0, Squeezenet_1_1, MobileNetV2',
+         default='ResNet50', type=str)
+     # These three are read in the main block but were never registered upstream;
+     # the defaults follow the MPIIGaze protocol used below (28 bins of 3 degrees over -42..42).
+     parser.add_argument(
+         '--bins', dest='bins', help='Number of angle bins.',
+         default=28, type=int)
+     parser.add_argument(
+         '--angle', dest='angle', help='Angle range in degrees.',
+         default=42, type=int)
+     parser.add_argument(
+         '--bin_width', dest='bin_width', help='Width of each angle bin in degrees.',
+         default=3, type=int)
+     # Important args -------------------------------------------------------------------
+     args = parser.parse_args()
+     return args
+
+
+ def getArch(arch, bins):
+     # Base network structure
+     if arch == 'ResNet18':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [2, 2, 2, 2], bins)
+     elif arch == 'ResNet34':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [3, 4, 6, 3], bins)
+     elif arch == 'ResNet101':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 23, 3], bins)
+     elif arch == 'ResNet152':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 8, 36, 3], bins)
+     else:
+         if arch != 'ResNet50':
+             print('Invalid value for architecture is passed! '
+                   'The default value of ResNet50 will be used instead!')
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], bins)
+     return model
+
+
+ if __name__ == '__main__':
+     args = parse_args()
+     cudnn.enabled = True
+     gpu = select_device(args.gpu_id, batch_size=args.batch_size)
+     batch_size = args.batch_size
+     arch = args.arch
+     data_set = args.dataset
+     evalpath = args.evalpath
+     snapshot_path = args.snapshot
+     bins = args.bins
+     angle = args.angle
+     bin_width = args.bin_width
+
+     transformations = transforms.Compose([
+         transforms.Resize(448),
+         transforms.ToTensor(),
+         transforms.Normalize(
+             mean=[0.485, 0.456, 0.406],
+             std=[0.229, 0.224, 0.225]
+         )
+     ])
+
+     if data_set == "gaze360":
+         gaze_dataset = Gaze360(args.gaze360label_dir, args.gaze360image_dir, transformations, 180, 4, train=False)
+         test_loader = torch.utils.data.DataLoader(
+             dataset=gaze_dataset,
+             batch_size=batch_size,
+             shuffle=False,
+             num_workers=4,
+             pin_memory=True)
+
+         if not os.path.exists(evalpath):
+             os.makedirs(evalpath)
+
+         # list all epochs for testing
+         folder = os.listdir(snapshot_path)
+         folder.sort(key=natural_keys)
+         softmax = nn.Softmax(dim=1)
+         with open(os.path.join(evalpath, data_set + ".log"), 'w') as outfile:
+             configuration = f"\ntest configuration = gpu_id={gpu}, batch_size={batch_size}, model_arch={arch}\nStart testing dataset={data_set}----------------------------------------\n"
+             print(configuration)
+             outfile.write(configuration)
+             epoch_list = []
+             avg_MAE = []
+             for epochs in folder:
+                 # Base network structure
+                 model = getArch(arch, 90)
+                 saved_state_dict = torch.load(os.path.join(snapshot_path, epochs))
+                 model.load_state_dict(saved_state_dict)
+                 model.cuda(gpu)
+                 model.eval()
+                 total = 0
+                 idx_tensor = torch.FloatTensor(list(range(90))).cuda(gpu)
+                 avg_error = .0
+
+                 with torch.no_grad():
+                     for j, (images, labels, cont_labels, name) in enumerate(test_loader):
+                         images = Variable(images).cuda(gpu)
+                         total += cont_labels.size(0)
+
+                         label_pitch = cont_labels[:, 0].float() * np.pi / 180
+                         label_yaw = cont_labels[:, 1].float() * np.pi / 180
+
+                         gaze_pitch, gaze_yaw = model(images)
+
+                         # Binned predictions
+                         _, pitch_bpred = torch.max(gaze_pitch.data, 1)
+                         _, yaw_bpred = torch.max(gaze_yaw.data, 1)
+
+                         # Continuous predictions
+                         pitch_predicted = softmax(gaze_pitch)
+                         yaw_predicted = softmax(gaze_yaw)
+
+                         # mapping from binned (0 to 89) to angles (-180 to 180)
+                         pitch_predicted = torch.sum(pitch_predicted * idx_tensor, 1).cpu() * 4 - 180
+                         yaw_predicted = torch.sum(yaw_predicted * idx_tensor, 1).cpu() * 4 - 180
+
+                         pitch_predicted = pitch_predicted * np.pi / 180
+                         yaw_predicted = yaw_predicted * np.pi / 180
+
+                         for p, y, pl, yl in zip(pitch_predicted, yaw_predicted, label_pitch, label_yaw):
+                             avg_error += angular(gazeto3d([p, y]), gazeto3d([pl, yl]))
+
+                 x = ''.join(filter(lambda i: i.isdigit(), epochs))
+                 epoch_list.append(x)
+                 avg_MAE.append(avg_error / total)
+                 loger = f"[{epochs}---{args.dataset}] Total Num:{total},MAE:{avg_error/total}\n"
+                 outfile.write(loger)
+                 print(loger)
+
+         fig = plt.figure(figsize=(14, 8))
+         plt.xlabel('epoch')
+         plt.ylabel('avg')
+         plt.title('Gaze angular error')
+         plt.plot(epoch_list, avg_MAE, color='k', label='mae')
+         plt.legend()
+         fig.savefig(os.path.join(evalpath, data_set + ".png"), format='png')
+         plt.show()
+
+     elif data_set == "mpiigaze":
+         for fold in range(15):
+             folder = os.listdir(args.gazeMpiilabel_dir)
+             folder.sort()
+             testlabelpathcombined = [os.path.join(args.gazeMpiilabel_dir, j) for j in folder]
+             gaze_dataset = Mpiigaze(testlabelpathcombined, args.gazeMpiimage_dir, transformations, False, angle, fold)
+
+             test_loader = torch.utils.data.DataLoader(
+                 dataset=gaze_dataset,
+                 batch_size=batch_size,
+                 shuffle=True,
+                 num_workers=4,
+                 pin_memory=True)
+
+             if not os.path.exists(os.path.join(evalpath, "fold" + str(fold))):
+                 os.makedirs(os.path.join(evalpath, "fold" + str(fold)))
+
+             # list all epochs for testing
+             folder = os.listdir(os.path.join(snapshot_path, "fold" + str(fold)))
+             folder.sort(key=natural_keys)
+
+             softmax = nn.Softmax(dim=1)
+             with open(os.path.join(evalpath, "fold" + str(fold), data_set + ".log"), 'w') as outfile:
+                 configuration = f"\ntest configuration = gpu_id={gpu}, batch_size={batch_size}, model_arch={arch}\nStart testing dataset={data_set}, fold={fold}---------------------------------------\n"
+                 print(configuration)
+                 outfile.write(configuration)
+                 epoch_list = []
+                 avg_MAE = []
+                 for epochs in folder:
+                     # Re-create the model per snapshot; the upstream code re-wrapped
+                     # the same instance in nn.DataParallel on every iteration.
+                     model = nn.DataParallel(getArch(arch, bins), device_ids=[0])
+                     saved_state_dict = torch.load(os.path.join(snapshot_path, "fold" + str(fold), epochs))
+                     model.load_state_dict(saved_state_dict)
+                     model.cuda(gpu)
+                     model.eval()
+                     total = 0
+                     idx_tensor = torch.FloatTensor(list(range(28))).cuda(gpu)
+                     avg_error = .0
+                     with torch.no_grad():
+                         for j, (images, labels, cont_labels, name) in enumerate(test_loader):
+                             images = Variable(images).cuda(gpu)
+                             total += cont_labels.size(0)
+
+                             label_pitch = cont_labels[:, 0].float() * np.pi / 180
+                             label_yaw = cont_labels[:, 1].float() * np.pi / 180
+
+                             gaze_pitch, gaze_yaw = model(images)
+
+                             # Binned predictions
+                             _, pitch_bpred = torch.max(gaze_pitch.data, 1)
+                             _, yaw_bpred = torch.max(gaze_yaw.data, 1)
+
+                             # Continuous predictions
+                             pitch_predicted = softmax(gaze_pitch)
+                             yaw_predicted = softmax(gaze_yaw)
+
+                             # mapping from binned (0 to 27) to angles (-42 to 42)
+                             pitch_predicted = torch.sum(pitch_predicted * idx_tensor, 1).cpu() * 3 - 42
+                             yaw_predicted = torch.sum(yaw_predicted * idx_tensor, 1).cpu() * 3 - 42
+
+                             pitch_predicted = pitch_predicted * np.pi / 180
+                             yaw_predicted = yaw_predicted * np.pi / 180
+
+                             for p, y, pl, yl in zip(pitch_predicted, yaw_predicted, label_pitch, label_yaw):
+                                 avg_error += angular(gazeto3d([p, y]), gazeto3d([pl, yl]))
+
+                     x = ''.join(filter(lambda i: i.isdigit(), epochs))
+                     epoch_list.append(x)
+                     avg_MAE.append(avg_error / total)
+                     loger = f"[{epochs}---{args.dataset}] Total Num:{total},MAE:{avg_error/total}\n"
+                     outfile.write(loger)
+                     print(loger)
+
+             fig = plt.figure(figsize=(14, 8))
+             plt.xlabel('epoch')
+             plt.ylabel('avg')
+             plt.title('Gaze angular error')
+             plt.plot(epoch_list, avg_MAE, color='k', label='mae')
+             plt.legend()
+             fig.savefig(os.path.join(evalpath, "fold" + str(fold), data_set + ".png"), format='png')
+             # plt.show()
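The continuous-angle decoding in test.py is a softmax expectation over bin indices, then an affine map from bin index to degrees. A minimal NumPy sketch of that step (the function name `decode_angle` is ours, not part of the l2cs package; parameters match the Gaze360 setting above, 90 bins of 4 degrees covering -180..180):

```python
import numpy as np

def decode_angle(logits, num_bins=90, bin_width=4.0, offset=-180.0):
    # Softmax over the bin logits (shifted for numerical stability),
    # then the expected bin index mapped to degrees: E[i] * width + offset.
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    idx = np.arange(num_bins)
    return float((probs * idx).sum() * bin_width + offset)

# A distribution sharply peaked on bin 50 decodes to about 50*4 - 180 = 20 degrees.
logits = np.full(90, -10.0)
logits[50] = 10.0
print(round(decode_angle(logits), 3))
```

This soft decoding is what lets a classification head produce continuous angles; the hard argmax (`pitch_bpred`/`yaw_bpred` above) is only computed for the binned view.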
models/L2CS-Net/train.py ADDED
@@ -0,0 +1,384 @@
+ import os
+ import argparse
+ import time
+
+ import torch.utils.model_zoo as model_zoo
+ import torch
+ import torch.nn as nn
+ from torch.autograd import Variable
+ from torch.utils.data import DataLoader
+ from torchvision import transforms
+ import torch.backends.cudnn as cudnn
+ import torchvision
+
+ from l2cs import L2CS, select_device, Gaze360, Mpiigaze
+
+
+ def parse_args():
+     """Parse input arguments."""
+     parser = argparse.ArgumentParser(description='Gaze estimation using L2CSNet.')
+     # Gaze360
+     parser.add_argument(
+         '--gaze360image_dir', dest='gaze360image_dir', help='Directory path for gaze images.',
+         default='datasets/Gaze360/Image', type=str)
+     parser.add_argument(
+         '--gaze360label_dir', dest='gaze360label_dir', help='Directory path for gaze labels.',
+         default='datasets/Gaze360/Label/train.label', type=str)
+     # MPIIGaze
+     parser.add_argument(
+         '--gazeMpiimage_dir', dest='gazeMpiimage_dir', help='Directory path for gaze images.',
+         default='datasets/MPIIFaceGaze/Image', type=str)
+     parser.add_argument(
+         '--gazeMpiilabel_dir', dest='gazeMpiilabel_dir', help='Directory path for gaze labels.',
+         default='datasets/MPIIFaceGaze/Label', type=str)
+
+     # Important args -------------------------------------------------------------------
+     parser.add_argument(
+         '--dataset', dest='dataset', help='mpiigaze, rtgene, gaze360, ethgaze',
+         default="gaze360", type=str)
+     parser.add_argument(
+         '--output', dest='output', help='Path of output models.',
+         default='output/snapshots/', type=str)
+     parser.add_argument(
+         '--snapshot', dest='snapshot', help='Path of model snapshot.',
+         default='', type=str)
+     parser.add_argument(
+         '--gpu', dest='gpu_id', help='GPU device id to use [0] or multiple 0,1,2,3',
+         default='0', type=str)
+     parser.add_argument(
+         '--num_epochs', dest='num_epochs', help='Maximum number of training epochs.',
+         default=60, type=int)
+     parser.add_argument(
+         '--batch_size', dest='batch_size', help='Batch size.',
+         default=1, type=int)
+     parser.add_argument(
+         '--arch', dest='arch', help='Network architecture, can be: ResNet18, ResNet34, [ResNet50], '
+                                     'ResNet101, ResNet152, Squeezenet_1_0, Squeezenet_1_1, MobileNetV2',
+         default='ResNet50', type=str)
+     parser.add_argument(
+         '--alpha', dest='alpha', help='Regression loss coefficient.',
+         default=1, type=float)
+     parser.add_argument(
+         '--lr', dest='lr', help='Base learning rate.',
+         default=0.00001, type=float)
+     # Important args -------------------------------------------------------------------
+     args = parser.parse_args()
+     return args
+
+
+ def get_ignored_params(model):
+     # Generator function that yields ignored params.
+     b = [model.conv1, model.bn1, model.fc_finetune]
+     for i in range(len(b)):
+         for module_name, module in b[i].named_modules():
+             if 'bn' in module_name:
+                 module.eval()
+             for name, param in module.named_parameters():
+                 yield param
+
+
+ def get_non_ignored_params(model):
+     # Generator function that yields params that will be optimized.
+     b = [model.layer1, model.layer2, model.layer3, model.layer4]
+     for i in range(len(b)):
+         for module_name, module in b[i].named_modules():
+             if 'bn' in module_name:
+                 module.eval()
+             for name, param in module.named_parameters():
+                 yield param
+
+
+ def get_fc_params(model):
+     # Generator function that yields fc layer params.
+     b = [model.fc_yaw_gaze, model.fc_pitch_gaze]
+     for i in range(len(b)):
+         for module_name, module in b[i].named_modules():
+             for name, param in module.named_parameters():
+                 yield param
+
+
+ def load_filtered_state_dict(model, snapshot):
+     # By user apaszke from discuss.pytorch.org
+     model_dict = model.state_dict()
+     snapshot = {k: v for k, v in snapshot.items() if k in model_dict}
+     model_dict.update(snapshot)
+     model.load_state_dict(model_dict)
+
+
+ def getArch_weights(arch, bins):
+     if arch == 'ResNet18':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [2, 2, 2, 2], bins)
+         pre_url = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
+     elif arch == 'ResNet34':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [3, 4, 6, 3], bins)
+         pre_url = 'https://download.pytorch.org/models/resnet34-333f7ec4.pth'
+     elif arch == 'ResNet101':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 23, 3], bins)
+         pre_url = 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth'
+     elif arch == 'ResNet152':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 8, 36, 3], bins)
+         pre_url = 'https://download.pytorch.org/models/resnet152-b121ed2d.pth'
+     else:
+         if arch != 'ResNet50':
+             print('Invalid value for architecture is passed! '
+                   'The default value of ResNet50 will be used instead!')
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], bins)
+         pre_url = 'https://download.pytorch.org/models/resnet50-19c8e357.pth'
+
+     return model, pre_url
+
+
+ if __name__ == '__main__':
+     args = parse_args()
+     cudnn.enabled = True
+     num_epochs = args.num_epochs
+     batch_size = args.batch_size
+     gpu = select_device(args.gpu_id, batch_size=args.batch_size)
+     data_set = args.dataset
+     alpha = args.alpha
+     output = args.output
+
+     transformations = transforms.Compose([
+         transforms.Resize(448),
+         transforms.ToTensor(),
+         transforms.Normalize(
+             mean=[0.485, 0.456, 0.406],
+             std=[0.229, 0.224, 0.225]
+         )
+     ])
+
+     if data_set == "gaze360":
+         model, pre_url = getArch_weights(args.arch, 90)
+         if args.snapshot == '':
+             load_filtered_state_dict(model, model_zoo.load_url(pre_url))
+         else:
+             saved_state_dict = torch.load(args.snapshot)
+             model.load_state_dict(saved_state_dict)
+
+         model.cuda(gpu)
+         dataset = Gaze360(args.gaze360label_dir, args.gaze360image_dir, transformations, 180, 4)
+         print('Loading data.')
+         train_loader_gaze = DataLoader(
+             dataset=dataset,
+             batch_size=int(batch_size),
+             shuffle=True,
+             num_workers=0,
+             pin_memory=True)
+         torch.backends.cudnn.benchmark = True
+
+         summary_name = '{}_{}'.format('L2CS-gaze360-', int(time.time()))
+         output = os.path.join(output, summary_name)
+         if not os.path.exists(output):
+             os.makedirs(output)
+
+         criterion = nn.CrossEntropyLoss().cuda(gpu)
+         reg_criterion = nn.MSELoss().cuda(gpu)
+         softmax = nn.Softmax(dim=1).cuda(gpu)
+         idx_tensor = Variable(torch.FloatTensor(list(range(90)))).cuda(gpu)
+
+         # Optimizer gaze
+         optimizer_gaze = torch.optim.Adam([
+             {'params': get_ignored_params(model), 'lr': 0},
+             {'params': get_non_ignored_params(model), 'lr': args.lr},
+             {'params': get_fc_params(model), 'lr': args.lr}
+         ], args.lr)
+
+         configuration = f"\ntrain configuration, gpu_id={args.gpu_id}, batch_size={batch_size}, model_arch={args.arch}\nStart training dataset={data_set}, loader={len(train_loader_gaze)}------------------------- \n"
+         print(configuration)
+         for epoch in range(num_epochs):
+             sum_loss_pitch_gaze = sum_loss_yaw_gaze = iter_gaze = 0
+
+             for i, (images_gaze, labels_gaze, cont_labels_gaze, name) in enumerate(train_loader_gaze):
+                 images_gaze = Variable(images_gaze).cuda(gpu)
+
+                 # Binned labels
+                 label_pitch_gaze = Variable(labels_gaze[:, 0]).cuda(gpu)
+                 label_yaw_gaze = Variable(labels_gaze[:, 1]).cuda(gpu)
+
+                 # Continuous labels
+                 label_pitch_cont_gaze = Variable(cont_labels_gaze[:, 0]).cuda(gpu)
+                 label_yaw_cont_gaze = Variable(cont_labels_gaze[:, 1]).cuda(gpu)
+
+                 pitch, yaw = model(images_gaze)
+
+                 # Cross entropy loss
+                 loss_pitch_gaze = criterion(pitch, label_pitch_gaze)
+                 loss_yaw_gaze = criterion(yaw, label_yaw_gaze)
+
+                 # MSE loss
+                 pitch_predicted = softmax(pitch)
+                 yaw_predicted = softmax(yaw)
+
+                 pitch_predicted = torch.sum(pitch_predicted * idx_tensor, 1) * 4 - 180
+                 yaw_predicted = torch.sum(yaw_predicted * idx_tensor, 1) * 4 - 180
+
+                 loss_reg_pitch = reg_criterion(pitch_predicted, label_pitch_cont_gaze)
+                 loss_reg_yaw = reg_criterion(yaw_predicted, label_yaw_cont_gaze)
+
+                 # Total loss
+                 loss_pitch_gaze += alpha * loss_reg_pitch
+                 loss_yaw_gaze += alpha * loss_reg_yaw
+
+                 sum_loss_pitch_gaze += loss_pitch_gaze
+                 sum_loss_yaw_gaze += loss_yaw_gaze
+
+                 loss_seq = [loss_pitch_gaze, loss_yaw_gaze]
+                 grad_seq = [torch.tensor(1.0).cuda(gpu) for _ in range(len(loss_seq))]
+                 optimizer_gaze.zero_grad(set_to_none=True)
+                 torch.autograd.backward(loss_seq, grad_seq)
+                 optimizer_gaze.step()
+                 # scheduler.step()
+
+                 iter_gaze += 1
+
+                 if (i + 1) % 100 == 0:
+                     print('Epoch [%d/%d], Iter [%d/%d] Losses: '
+                           'Gaze Yaw %.4f,Gaze Pitch %.4f' % (
+                               epoch + 1,
+                               num_epochs,
+                               i + 1,
+                               len(dataset) // batch_size,
+                               sum_loss_pitch_gaze / iter_gaze,
+                               sum_loss_yaw_gaze / iter_gaze
+                           ))
+
+             # Save a snapshot after every epoch.
+             if epoch % 1 == 0 and epoch < num_epochs:
+                 print('Taking snapshot...')
+                 torch.save(model.state_dict(),
+                            os.path.join(output, '_epoch_' + str(epoch + 1) + '.pkl'))
+
+     elif data_set == "mpiigaze":
+         folder = os.listdir(args.gazeMpiilabel_dir)
+         folder.sort()
+         testlabelpathcombined = [os.path.join(args.gazeMpiilabel_dir, j) for j in folder]
+         for fold in range(15):
+             model, pre_url = getArch_weights(args.arch, 28)
+             load_filtered_state_dict(model, model_zoo.load_url(pre_url))
+             model = nn.DataParallel(model)
+             model.to(gpu)
+             print('Loading data.')
+             dataset = Mpiigaze(testlabelpathcombined, args.gazeMpiimage_dir, transformations, True, fold)
+             train_loader_gaze = DataLoader(
+                 dataset=dataset,
+                 batch_size=int(batch_size),
+                 shuffle=True,
+                 num_workers=4,
+                 pin_memory=True)
+             torch.backends.cudnn.benchmark = True
+
+             summary_name = '{}_{}'.format('L2CS-mpiigaze', int(time.time()))
+
+             fold_output = os.path.join(output, summary_name, 'fold' + str(fold))
+             if not os.path.exists(fold_output):
+                 os.makedirs(fold_output)
+
+             criterion = nn.CrossEntropyLoss().cuda(gpu)
+             reg_criterion = nn.MSELoss().cuda(gpu)
+             softmax = nn.Softmax(dim=1).cuda(gpu)
+             idx_tensor = Variable(torch.FloatTensor(list(range(28)))).cuda(gpu)
+
+             # Optimizer gaze. The model is wrapped in DataParallel, so the
+             # parameter-group helpers are given the bare module; the upstream code
+             # passed a second args.arch argument the helpers do not accept.
+             optimizer_gaze = torch.optim.Adam([
+                 {'params': get_ignored_params(model.module), 'lr': 0},
+                 {'params': get_non_ignored_params(model.module), 'lr': args.lr},
+                 {'params': get_fc_params(model.module), 'lr': args.lr}
+             ], args.lr)
+
+             configuration = f"\ntrain configuration, gpu_id={args.gpu_id}, batch_size={batch_size}, model_arch={args.arch}\nStart training dataset={data_set}, loader={len(train_loader_gaze)}, fold={fold}--------------\n"
+             print(configuration)
+             for epoch in range(num_epochs):
+                 sum_loss_pitch_gaze = sum_loss_yaw_gaze = iter_gaze = 0
+
+                 for i, (images_gaze, labels_gaze, cont_labels_gaze, name) in enumerate(train_loader_gaze):
+                     images_gaze = Variable(images_gaze).cuda(gpu)
+
+                     # Binned labels
+                     label_pitch_gaze = Variable(labels_gaze[:, 0]).cuda(gpu)
+                     label_yaw_gaze = Variable(labels_gaze[:, 1]).cuda(gpu)
+
+                     # Continuous labels
+                     label_pitch_cont_gaze = Variable(cont_labels_gaze[:, 0]).cuda(gpu)
+                     label_yaw_cont_gaze = Variable(cont_labels_gaze[:, 1]).cuda(gpu)
+
+                     pitch, yaw = model(images_gaze)
+
+                     # Cross entropy loss
+                     loss_pitch_gaze = criterion(pitch, label_pitch_gaze)
+                     loss_yaw_gaze = criterion(yaw, label_yaw_gaze)
+
+                     # MSE loss
+                     pitch_predicted = softmax(pitch)
+                     yaw_predicted = softmax(yaw)
+
+                     pitch_predicted = torch.sum(pitch_predicted * idx_tensor, 1) * 3 - 42
+                     yaw_predicted = torch.sum(yaw_predicted * idx_tensor, 1) * 3 - 42
+
+                     loss_reg_pitch = reg_criterion(pitch_predicted, label_pitch_cont_gaze)
+                     loss_reg_yaw = reg_criterion(yaw_predicted, label_yaw_cont_gaze)
+
+                     # Total loss
+                     loss_pitch_gaze += alpha * loss_reg_pitch
+                     loss_yaw_gaze += alpha * loss_reg_yaw
+
+                     sum_loss_pitch_gaze += loss_pitch_gaze
+                     sum_loss_yaw_gaze += loss_yaw_gaze
+
+                     loss_seq = [loss_pitch_gaze, loss_yaw_gaze]
+                     grad_seq = [torch.tensor(1.0).cuda(gpu) for _ in range(len(loss_seq))]
+
+                     optimizer_gaze.zero_grad(set_to_none=True)
+                     torch.autograd.backward(loss_seq, grad_seq)
+                     optimizer_gaze.step()
+
+                     iter_gaze += 1
+
+                     if (i + 1) % 100 == 0:
+                         print('Epoch [%d/%d], Iter [%d/%d] Losses: '
+                               'Gaze Yaw %.4f,Gaze Pitch %.4f' % (
+                                   epoch + 1,
+                                   num_epochs,
+                                   i + 1,
+                                   len(dataset) // batch_size,
+                                   sum_loss_pitch_gaze / iter_gaze,
+                                   sum_loss_yaw_gaze / iter_gaze
+                               ))
+
+                 # Save models at numbered epochs, under the per-fold directory
+                 # created above (the upstream path skipped the summary folder).
+                 if epoch % 1 == 0 and epoch < num_epochs:
+                     print('Taking snapshot...')
+                     torch.save(model.state_dict(),
+                                os.path.join(fold_output, '_epoch_' + str(epoch + 1) + '.pkl'))
models/gaze_calibration.py ADDED
@@ -0,0 +1,146 @@
1
+ # 9-point gaze calibration for L2CS-Net
2
+ # Maps raw gaze angles -> normalised screen coords via polynomial least-squares.
3
+ # Centre point is the bias reference (subtracted from all readings).
4
+
5
+ import numpy as np
6
+ from dataclasses import dataclass, field
7
+
8
+ # 3x3 grid, centre first (bias ref), then row by row
9
+ DEFAULT_TARGETS = [
10
+ (0.5, 0.5),
11
+ (0.15, 0.15), (0.50, 0.15), (0.85, 0.15),
12
+ (0.15, 0.50), (0.85, 0.50),
13
+ (0.15, 0.85), (0.50, 0.85), (0.85, 0.85),
14
+ ]
15
+
16
+
17
+ @dataclass
18
+ class _PointSamples:
19
+ target_x: float
20
+ target_y: float
21
+ yaws: list = field(default_factory=list)
22
+ pitches: list = field(default_factory=list)
23
+
24
+
25
+ def _iqr_filter(values):
26
+ if len(values) < 4:
27
+ return values
28
+ arr = np.array(values)
29
+ q1, q3 = np.percentile(arr, [25, 75])
30
+ iqr = q3 - q1
31
+ lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
32
+ return arr[(arr >= lo) & (arr <= hi)].tolist()
33
+
34
+
35
+ class GazeCalibration:
36
+
37
+ def __init__(self, targets=None):
38
+ self._targets = targets or list(DEFAULT_TARGETS)
39
+ self._points = [_PointSamples(tx, ty) for tx, ty in self._targets]
40
+ self._current_idx = 0
41
+ self._fitted = False
42
+ self._W = None # (6, 2) polynomial weights
43
+ self._yaw_bias = 0.0
44
+ self._pitch_bias = 0.0
45
+
46
+ @property
47
+ def num_points(self):
48
+ return len(self._targets)
49
+
50
+ @property
51
+ def current_index(self):
52
+ return self._current_idx
53
+
54
+ @property
55
+ def current_target(self):
56
+ if self._current_idx < len(self._targets):
57
+ return self._targets[self._current_idx]
58
+ return self._targets[-1]
59
+
60
+ @property
61
+ def is_complete(self):
62
+ return self._current_idx >= len(self._targets)
63
+
64
+ @property
65
+ def is_fitted(self):
66
+ return self._fitted
67
+
68
+ def collect_sample(self, yaw_rad, pitch_rad):
69
+ if self._current_idx >= len(self._points):
70
+ return
71
+ pt = self._points[self._current_idx]
72
+ pt.yaws.append(float(yaw_rad))
73
+ pt.pitches.append(float(pitch_rad))
74
+
75
+ def advance(self):
76
+ self._current_idx += 1
77
+ return self._current_idx < len(self._targets)
78
+
79
+ @staticmethod
80
+ def _poly_features(yaw, pitch):
81
+ # [yaw^2, pitch^2, yaw*pitch, yaw, pitch, 1]
82
+        return np.array([yaw**2, pitch**2, yaw * pitch, yaw, pitch, 1.0],
+                        dtype=np.float64)
+
+    def fit(self):
+        # bias from centre point (index 0)
+        center = self._points[0]
+        center_yaws = _iqr_filter(center.yaws)
+        center_pitches = _iqr_filter(center.pitches)
+        if len(center_yaws) < 2 or len(center_pitches) < 2:
+            return False
+        self._yaw_bias = float(np.median(center_yaws))
+        self._pitch_bias = float(np.median(center_pitches))
+
+        rows_A, rows_B = [], []
+        for pt in self._points:
+            clean_yaws = _iqr_filter(pt.yaws)
+            clean_pitches = _iqr_filter(pt.pitches)
+            if len(clean_yaws) < 2 or len(clean_pitches) < 2:
+                continue
+            med_yaw = float(np.median(clean_yaws)) - self._yaw_bias
+            med_pitch = float(np.median(clean_pitches)) - self._pitch_bias
+            rows_A.append(self._poly_features(med_yaw, med_pitch))
+            rows_B.append([pt.target_x, pt.target_y])
+
+        if len(rows_A) < 5:
+            return False
+
+        A = np.array(rows_A, dtype=np.float64)
+        B = np.array(rows_B, dtype=np.float64)
+        try:
+            W, _, _, _ = np.linalg.lstsq(A, B, rcond=None)
+            self._W = W
+            self._fitted = True
+            return True
+        except np.linalg.LinAlgError:
+            return False
+
+    def predict(self, yaw_rad, pitch_rad):
+        if not self._fitted or self._W is None:
+            return 0.5, 0.5
+        feat = self._poly_features(yaw_rad - self._yaw_bias, pitch_rad - self._pitch_bias)
+        xy = feat @ self._W
+        return float(np.clip(xy[0], 0, 1)), float(np.clip(xy[1], 0, 1))
+
+    def to_dict(self):
+        return {
+            "targets": self._targets,
+            "fitted": self._fitted,
+            "current_index": self._current_idx,
+            "W": self._W.tolist() if self._W is not None else None,
+            "yaw_bias": self._yaw_bias,
+            "pitch_bias": self._pitch_bias,
+        }
+
+    @classmethod
+    def from_dict(cls, d):
+        cal = cls(targets=d.get("targets", DEFAULT_TARGETS))
+        cal._fitted = d.get("fitted", False)
+        cal._current_idx = d.get("current_index", 0)
+        cal._yaw_bias = d.get("yaw_bias", 0.0)
+        cal._pitch_bias = d.get("pitch_bias", 0.0)
+        w = d.get("W")
+        if w is not None:
+            cal._W = np.array(w, dtype=np.float64)
+        return cal
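The `fit` above leans on an `_iqr_filter` helper, defined earlier in the file, to discard jittery gaze samples before the medians are taken. As a rough illustration of the idea, here is a standalone pure-Python interquartile-range filter — the simple quartile indexing and the `k=1.5` fence are assumptions for this sketch, not the in-tree implementation:

```python
def iqr_filter(samples, k=1.5):
    """Keep samples inside [Q1 - k*IQR, Q3 + k*IQR]; pass tiny inputs through."""
    xs = sorted(samples)
    n = len(xs)
    if n < 4:
        return list(samples)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]  # crude quartiles, fine for ~dozens of samples
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x for x in samples if lo <= x <= hi]
```

With a couple of dozen yaw samples per calibration dot, a single glance away from the target becomes an outlier that this fence removes, so the median fed into the least-squares fit stays stable.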
models/gaze_eye_fusion.py ADDED
@@ -0,0 +1,66 @@
+# Fuses calibrated gaze position with eye openness (EAR) for focus detection.
+# Takes L2CS gaze angles + MediaPipe landmarks, outputs screen coords + focus decision.
+
+import math
+import numpy as np
+
+from .gaze_calibration import GazeCalibration
+from .eye_scorer import compute_avg_ear
+
+_EAR_BLINK = 0.18
+_ON_SCREEN_MARGIN = 0.08
+
+
+class GazeEyeFusion:
+
+    def __init__(self, calibration, ear_weight=0.3, gaze_weight=0.7, focus_threshold=0.52):
+        if not calibration.is_fitted:
+            raise ValueError("Calibration must be fitted first")
+        self._cal = calibration
+        self._ear_w = ear_weight
+        self._gaze_w = gaze_weight
+        self._threshold = focus_threshold
+        self._smooth_x = 0.5
+        self._smooth_y = 0.5
+        self._alpha = 0.5
+
+    def update(self, yaw_rad, pitch_rad, landmarks):
+        gx, gy = self._cal.predict(yaw_rad, pitch_rad)
+
+        # EMA smooth the gaze position
+        self._smooth_x += self._alpha * (gx - self._smooth_x)
+        self._smooth_y += self._alpha * (gy - self._smooth_y)
+        gx, gy = self._smooth_x, self._smooth_y
+
+        on_screen = (
+            -_ON_SCREEN_MARGIN <= gx <= 1.0 + _ON_SCREEN_MARGIN and
+            -_ON_SCREEN_MARGIN <= gy <= 1.0 + _ON_SCREEN_MARGIN
+        )
+
+        ear = None
+        ear_score = 1.0
+        if landmarks is not None:
+            ear = compute_avg_ear(landmarks)
+            ear_score = 0.0 if ear < _EAR_BLINK else min(ear / 0.30, 1.0)
+
+        # penalise gaze near screen edges
+        gaze_score = 1.0 if on_screen else 0.0
+        if on_screen:
+            dx = max(0.0, abs(gx - 0.5) - 0.3)
+            dy = max(0.0, abs(gy - 0.5) - 0.3)
+            gaze_score = max(0.0, 1.0 - math.sqrt(dx**2 + dy**2) * 5.0)
+
+        score = float(np.clip(self._gaze_w * gaze_score + self._ear_w * ear_score, 0, 1))
+
+        return {
+            "gaze_x": round(float(gx), 4),
+            "gaze_y": round(float(gy), 4),
+            "on_screen": on_screen,
+            "ear": round(ear, 4) if ear is not None else None,
+            "focus_score": round(score, 4),
+            "focused": score >= self._threshold,
+        }
+
+    def reset(self):
+        self._smooth_x = 0.5
+        self._smooth_y = 0.5
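The weighted blend in `update` can be exercised in isolation. A minimal sketch of the same scoring rule, with the constants copied from the defaults above (`fusion_score` is a hypothetical free function, not part of the module):

```python
import math

def fusion_score(gx, gy, ear, gaze_w=0.7, ear_w=0.3, ear_blink=0.18):
    # gaze term: 1.0 inside the central band of the screen, falling off toward edges
    dx = max(0.0, abs(gx - 0.5) - 0.3)
    dy = max(0.0, abs(gy - 0.5) - 0.3)
    gaze_score = max(0.0, 1.0 - math.sqrt(dx * dx + dy * dy) * 5.0)
    # eye term: closed eyes score 0, fully open (EAR >= 0.30) scores 1
    ear_score = 0.0 if ear < ear_blink else min(ear / 0.30, 1.0)
    return min(1.0, max(0.0, gaze_w * gaze_score + ear_w * ear_score))
```

Looking dead centre with open eyes gives 1.0; a blink zeroes the eye term but leaves 0.7 from the gaze term, which is still above the 0.52 focus threshold — so a brief blink alone never flips the focus flag.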
requirements.txt CHANGED
@@ -14,3 +14,7 @@ aiosqlite>=0.19.0
 pydantic>=2.0.0
 xgboost>=2.0.0
 clearml>=2.0.2
+torch>=1.10.1
+torchvision>=0.11.2
+face_detection @ git+https://github.com/elliottzheng/face-detection
+gdown>=5.0.0
src/components/CalibrationOverlay.jsx ADDED
@@ -0,0 +1,146 @@
+import React, { useState, useEffect, useRef, useCallback } from 'react';
+
+const COLLECT_MS = 2000;
+const CENTER_MS = 3000; // centre point gets extra time (bias reference)
+
+function CalibrationOverlay({ calibration, videoManager }) {
+  const [progress, setProgress] = useState(0);
+  const timerRef = useRef(null);
+  const startRef = useRef(null);
+  const overlayRef = useRef(null);
+
+  const enterFullscreen = useCallback(() => {
+    const el = overlayRef.current;
+    if (!el) return;
+    const req = el.requestFullscreen || el.webkitRequestFullscreen || el.msRequestFullscreen;
+    if (req) {
+      // vendor-prefixed variants may not return a promise
+      const p = req.call(el);
+      if (p && p.catch) p.catch(() => {});
+    }
+  }, []);
+
+  const exitFullscreen = useCallback(() => {
+    if (document.fullscreenElement || document.webkitFullscreenElement) {
+      const exit = document.exitFullscreen || document.webkitExitFullscreen || document.msExitFullscreen;
+      if (exit) {
+        const p = exit.call(document);
+        if (p && p.catch) p.catch(() => {});
+      }
+    }
+  }, []);
+
+  useEffect(() => {
+    if (calibration && calibration.active && !calibration.done) {
+      const t = setTimeout(enterFullscreen, 100);
+      return () => clearTimeout(t);
+    }
+  }, [calibration?.active]);
+
+  useEffect(() => {
+    if (!calibration || !calibration.active) exitFullscreen();
+  }, [calibration?.active]);
+
+  useEffect(() => {
+    if (!calibration || !calibration.collecting || calibration.done) {
+      setProgress(0);
+      if (timerRef.current) cancelAnimationFrame(timerRef.current);
+      return;
+    }
+
+    startRef.current = performance.now();
+    const duration = calibration.index === 0 ? CENTER_MS : COLLECT_MS;
+
+    const tick = () => {
+      const pct = Math.min((performance.now() - startRef.current) / duration, 1);
+      setProgress(pct);
+      if (pct >= 1) {
+        if (videoManager) videoManager.nextCalibrationPoint();
+        startRef.current = performance.now();
+        setProgress(0);
+      }
+      timerRef.current = requestAnimationFrame(tick);
+    };
+    timerRef.current = requestAnimationFrame(tick);
+
+    return () => { if (timerRef.current) cancelAnimationFrame(timerRef.current); };
+  }, [calibration?.index, calibration?.collecting, calibration?.done]);
+
+  const handleCancel = () => {
+    if (videoManager) videoManager.cancelCalibration();
+    exitFullscreen();
+  };
+
+  if (!calibration || !calibration.active) return null;
+
+  if (calibration.done) {
+    return (
+      <div ref={overlayRef} style={overlayStyle}>
+        <div style={messageBoxStyle}>
+          <h2 style={{ margin: '0 0 10px', color: calibration.success ? '#4ade80' : '#f87171' }}>
+            {calibration.success ? 'Calibration Complete' : 'Calibration Failed'}
+          </h2>
+          <p style={{ color: '#ccc', margin: 0 }}>
+            {calibration.success
+              ? 'Gaze tracking is now active.'
+              : 'Not enough samples collected. Try again.'}
+          </p>
+        </div>
+      </div>
+    );
+  }
+
+  const [tx, ty] = calibration.target || [0.5, 0.5];
+
+  return (
+    <div ref={overlayRef} style={overlayStyle}>
+      <div style={{
+        position: 'absolute', top: '30px', left: '50%', transform: 'translateX(-50%)',
+        color: '#fff', fontSize: '16px', textAlign: 'center',
+        textShadow: '0 0 8px rgba(0,0,0,0.8)', pointerEvents: 'none',
+      }}>
+        <div style={{ fontWeight: 'bold', fontSize: '20px' }}>
+          Look at the dot ({calibration.index + 1}/{calibration.numPoints})
+        </div>
+        <div style={{ fontSize: '14px', color: '#aaa', marginTop: '6px' }}>
+          {calibration.index === 0
+            ? 'Look at the center dot - this sets your baseline'
+            : 'Hold your gaze steady on the target'}
+        </div>
+      </div>
+
+      <div style={{
+        position: 'absolute', left: `${tx * 100}%`, top: `${ty * 100}%`,
+        transform: 'translate(-50%, -50%)',
+      }}>
+        <svg width="60" height="60" style={{ position: 'absolute', left: '-30px', top: '-30px' }}>
+          <circle cx="30" cy="30" r="24" fill="none" stroke="rgba(255,255,255,0.15)" strokeWidth="3" />
+          <circle cx="30" cy="30" r="24" fill="none" stroke="#4ade80" strokeWidth="3"
+            strokeDasharray={`${progress * 150.8} 150.8`} strokeLinecap="round"
+            transform="rotate(-90, 30, 30)" />
+        </svg>
+        <div style={{
+          width: '20px', height: '20px', borderRadius: '50%',
+          background: 'radial-gradient(circle, #fff 30%, #4ade80 100%)',
+          boxShadow: '0 0 20px rgba(74, 222, 128, 0.8)',
+        }} />
+      </div>
+
+      <button onClick={handleCancel} style={{
+        position: 'absolute', bottom: '40px', left: '50%', transform: 'translateX(-50%)',
+        padding: '10px 28px', background: 'rgba(255,255,255,0.1)',
+        border: '1px solid rgba(255,255,255,0.3)', color: '#fff',
+        borderRadius: '20px', cursor: 'pointer', fontSize: '14px',
+      }}>
+        Cancel Calibration
+      </button>
+    </div>
+  );
+}
+
+const overlayStyle = {
+  position: 'fixed', top: 0, left: 0, width: '100vw', height: '100vh',
+  background: 'rgba(0, 0, 0, 0.92)', zIndex: 10000,
+  display: 'flex', alignItems: 'center', justifyContent: 'center',
+};
+
+const messageBoxStyle = {
+  textAlign: 'center', padding: '30px 40px',
+  background: 'rgba(30, 30, 50, 0.9)', borderRadius: '16px',
+  border: '1px solid rgba(255,255,255,0.1)',
+};
+
+export default CalibrationOverlay;
src/components/FocusPageLocal.jsx CHANGED
@@ -1,4 +1,5 @@
 import React, { useState, useEffect, useRef } from 'react';
+import CalibrationOverlay from './CalibrationOverlay';
 
 function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActive }) {
   const [currentFrame, setCurrentFrame] = useState(15);
@@ -6,6 +7,9 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
   const [stats, setStats] = useState(null);
   const [availableModels, setAvailableModels] = useState([]);
   const [currentModel, setCurrentModel] = useState('mlp');
+  const [calibration, setCalibration] = useState(null);
+  const [l2csBoost, setL2csBoost] = useState(false);
+  const [l2csBoostAvailable, setL2csBoostAvailable] = useState(false);
 
   const localVideoRef = useRef(null);
   const displayCanvasRef = useRef(null);
@@ -23,7 +27,6 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
   useEffect(() => {
     if (!videoManager) return;
 
-    // 设置回调函数来更新时间轴
     const originalOnStatusUpdate = videoManager.callbacks.onStatusUpdate;
     videoManager.callbacks.onStatusUpdate = (isFocused) => {
       setTimelineEvents(prev => {
@@ -34,7 +37,10 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
       if (originalOnStatusUpdate) originalOnStatusUpdate(isFocused);
     };
 
-    // 定期更新统计信息
+    videoManager.callbacks.onCalibrationUpdate = (cal) => {
+      setCalibration(cal && cal.active ? { ...cal } : null);
+    };
+
     const statsInterval = setInterval(() => {
       if (videoManager && videoManager.getStats) {
         setStats(videoManager.getStats());
@@ -44,6 +50,7 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
     return () => {
       if (videoManager) {
        videoManager.callbacks.onStatusUpdate = originalOnStatusUpdate;
+        videoManager.callbacks.onCalibrationUpdate = null;
      }
      clearInterval(statsInterval);
    };
@@ -56,6 +63,8 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
      .then(data => {
        if (data.available) setAvailableModels(data.available);
        if (data.current) setCurrentModel(data.current);
+        if (data.l2cs_boost !== undefined) setL2csBoost(data.l2cs_boost);
+        if (data.l2cs_boost_available !== undefined) setL2csBoostAvailable(data.l2cs_boost_available);
      })
      .catch(err => console.error('Failed to fetch models:', err));
  }, []);
@@ -70,12 +79,28 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
      const result = await res.json();
      if (result.updated) {
        setCurrentModel(modelName);
+        setL2csBoostAvailable(modelName !== 'l2cs' && availableModels.includes('l2cs'));
+        if (modelName === 'l2cs') setL2csBoost(false);
      }
    } catch (err) {
      console.error('Failed to switch model:', err);
    }
  };
 
+  const handleBoostToggle = async () => {
+    const next = !l2csBoost;
+    try {
+      const res = await fetch('/api/settings', {
+        method: 'PUT',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ l2cs_boost: next })
+      });
+      if (res.ok) setL2csBoost(next);
+    } catch (err) {
+      console.error('Failed to toggle L2CS boost:', err);
+    }
+  };
+
  const handleStart = async () => {
    try {
      if (videoManager) {
@@ -443,6 +468,44 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
              {name}
            </button>
          ))}
+          {l2csBoostAvailable && currentModel !== 'l2cs' && (
+            <button
+              onClick={handleBoostToggle}
+              style={{
+                padding: '5px 14px',
+                borderRadius: '16px',
+                border: l2csBoost ? '2px solid #f59e0b' : '1px solid #555',
+                background: l2csBoost ? 'rgba(245, 158, 11, 0.15)' : 'transparent',
+                color: l2csBoost ? '#f59e0b' : '#888',
+                fontSize: '11px',
+                fontWeight: l2csBoost ? 'bold' : 'normal',
+                cursor: 'pointer',
+                transition: 'all 0.2s',
+                marginLeft: '4px',
+              }}
+            >
+              {l2csBoost ? 'GAZE ON' : 'GAZE'}
+            </button>
+          )}
+          {(currentModel === 'l2cs' || l2csBoost) && stats && stats.isStreaming && (
+            <button
+              onClick={() => videoManager && videoManager.startCalibration()}
+              style={{
+                padding: '5px 14px',
+                borderRadius: '16px',
+                border: '1px solid #4ade80',
+                background: 'transparent',
+                color: '#4ade80',
+                fontSize: '12px',
+                fontWeight: 'bold',
+                cursor: 'pointer',
+                transition: 'all 0.2s',
+                marginLeft: '4px',
+              }}
+            >
+              Calibrate
+            </button>
+          )}
        </section>
      )}
 
@@ -513,6 +576,9 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
            onChange={(e) => handleFrameChange(e.target.value)}
          />
        </section>
+
+      {/* Calibration overlay (fixed fullscreen, must be outside overflow:hidden containers) */}
+      <CalibrationOverlay calibration={calibration} videoManager={videoManager} />
    </main>
  );
 }
src/utils/VideoManagerLocal.js CHANGED
@@ -39,6 +39,17 @@ export class VideoManagerLocal {
     this.lastNotificationTime = null;
     this.notificationCooldown = 60000;
 
+    // Calibration state
+    this.calibration = {
+      active: false,
+      collecting: false,
+      target: null,
+      index: 0,
+      numPoints: 0,
+      done: false,
+      success: false,
+    };
+
     // 性能统计
     this.stats = {
       framesSent: 0,
@@ -73,8 +84,8 @@ export class VideoManagerLocal {
 
     // 创建用于截图的 canvas (smaller for faster encode + transfer)
     this.canvas = document.createElement('canvas');
-    this.canvas.width = 320;
-    this.canvas.height = 240;
+    this.canvas.width = 640;
+    this.canvas.height = 480;
 
     console.log('Local camera initialized');
     return true;
@@ -188,7 +199,7 @@ export class VideoManagerLocal {
           this.ws.send(blob);
           this.stats.framesSent++;
         }
-      }, 'image/jpeg', 0.5);
+      }, 'image/jpeg', 0.75);
     } catch (error) {
       this._sendingBlob = false;
       console.error('Capture error:', error);
@@ -253,6 +264,19 @@ export class VideoManagerLocal {
        ctx.textAlign = 'left';
      }
    }
+    // Gaze pointer (L2CS + calibration)
+    if (data && data.gaze_x !== undefined && data.gaze_y !== undefined) {
+      const gx = data.gaze_x * w;
+      const gy = data.gaze_y * h;
+      ctx.beginPath();
+      ctx.arc(gx, gy, 8, 0, 2 * Math.PI);
+      ctx.fillStyle = data.on_screen ? 'rgba(0, 200, 255, 0.7)' : 'rgba(255, 80, 80, 0.5)';
+      ctx.fill();
+      ctx.strokeStyle = '#FFFFFF';
+      ctx.lineWidth = 2;
+      ctx.stroke();
+    }
+
    // Performance stats
    ctx.fillStyle = 'rgba(0,0,0,0.5)';
    ctx.fillRect(0, h - 25, w, 25);
@@ -321,6 +345,9 @@ export class VideoManagerLocal {
          mar: data.mar,
          sf: data.sf,
          se: data.se,
+          gaze_x: data.gaze_x,
+          gaze_y: data.gaze_y,
+          on_screen: data.on_screen,
        };
        this.drawDetectionResult(detectionData);
        break;
@@ -338,6 +365,51 @@ export class VideoManagerLocal {
        this.sessionStartTime = null;
        break;
 
+      case 'calibration_started':
+        this.calibration = {
+          active: true,
+          collecting: true,
+          target: data.target,
+          index: data.index,
+          numPoints: data.num_points,
+          done: false,
+          success: false,
+        };
+        if (this.callbacks.onCalibrationUpdate) {
+          this.callbacks.onCalibrationUpdate({ ...this.calibration });
+        }
+        break;
+
+      case 'calibration_point':
+        this.calibration.target = data.target;
+        this.calibration.index = data.index;
+        if (this.callbacks.onCalibrationUpdate) {
+          this.callbacks.onCalibrationUpdate({ ...this.calibration });
+        }
+        break;
+
+      case 'calibration_done':
+        this.calibration.collecting = false;
+        this.calibration.done = true;
+        this.calibration.success = data.success;
+        if (this.callbacks.onCalibrationUpdate) {
+          this.callbacks.onCalibrationUpdate({ ...this.calibration });
+        }
+        setTimeout(() => {
+          this.calibration.active = false;
+          if (this.callbacks.onCalibrationUpdate) {
+            this.callbacks.onCalibrationUpdate({ ...this.calibration });
+          }
+        }, 2000);
+        break;
+
+      case 'calibration_cancelled':
+        this.calibration = { active: false, collecting: false, target: null, index: 0, numPoints: 0, done: false, success: false };
+        if (this.callbacks.onCalibrationUpdate) {
+          this.callbacks.onCalibrationUpdate({ ...this.calibration });
+        }
+        break;
+
      case 'error':
        console.error('Server error:', data.message);
        break;
@@ -347,6 +419,28 @@ export class VideoManagerLocal {
    }
  }
 
+  startCalibration() {
+    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
+      this.ws.send(JSON.stringify({ type: 'calibration_start' }));
+    }
+  }
+
+  nextCalibrationPoint() {
+    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
+      this.ws.send(JSON.stringify({ type: 'calibration_next' }));
+    }
+  }
+
+  cancelCalibration() {
+    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
+      this.ws.send(JSON.stringify({ type: 'calibration_cancel' }));
+    }
+    this.calibration = { active: false, collecting: false, target: null, index: 0, numPoints: 0, done: false, success: false };
+    if (this.callbacks.onCalibrationUpdate) {
+      this.callbacks.onCalibrationUpdate({ ...this.calibration });
+    }
+  }
+
  // Face mesh landmark index groups (matches live_demo.py)
  static FACE_OVAL = [10,338,297,332,284,251,389,356,454,323,361,288,397,365,379,378,400,377,152,148,176,149,150,136,172,58,132,93,234,127,162,21,54,103,67,109,10];
  static LEFT_EYE = [33,7,163,144,145,153,154,155,133,173,157,158,159,160,161,246];
ui/pipeline.py CHANGED
@@ -3,6 +3,7 @@ import glob
 import json
 import math
 import os
+import pathlib
 import sys
 
 import numpy as np
@@ -49,10 +50,12 @@ def _clip_features(vec):
 
 
 class _OutputSmoother:
-    """EMA smoothing on focus score with no-face grace period."""
+    # Asymmetric EMA: rises fast (recognise focus), falls slower (avoid flicker).
+    # Grace period holds score steady for a few frames when face is lost.
 
-    def __init__(self, alpha: float = 0.3, grace_frames: int = 15):
-        self._alpha = alpha
+    def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
+        self._alpha_up = alpha_up
+        self._alpha_down = alpha_down
         self._grace = grace_frames
         self._score = 0.5
         self._no_face = 0
@@ -61,14 +64,15 @@ class _OutputSmoother:
         self._score = 0.5
         self._no_face = 0
 
-    def update(self, raw_score: float, face_detected: bool) -> float:
+    def update(self, raw_score, face_detected):
         if face_detected:
             self._no_face = 0
-            self._score += self._alpha * (raw_score - self._score)
+            alpha = self._alpha_up if raw_score > self._score else self._alpha_down
+            self._score += alpha * (raw_score - self._score)
         else:
             self._no_face += 1
             if self._no_face > self._grace:
-                self._score *= 0.85
+                self._score *= 0.80
         return self._score
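The asymmetric update rule above is easiest to see with a standalone trace. A sketch assuming the new defaults (face always detected, so the grace-period branch never fires):

```python
def asymmetric_ema(raw_scores, alpha_up=0.55, alpha_down=0.45, start=0.5):
    """Replay _OutputSmoother.update for a detected face: fast attack, slower decay."""
    s, trace = start, []
    for raw in raw_scores:
        alpha = alpha_up if raw > s else alpha_down
        s += alpha * (raw - s)
        trace.append(s)
    return trace
```

A jump from 0.5 toward 1.0 closes 55% of the gap in one frame, while a drop toward 0.0 closes only 45%, so momentary score dips are damped more than genuine refocusing.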
@@ -640,3 +644,141 @@
 
     def __exit__(self, *args):
         self.close()
+
+
+def _resolve_l2cs_weights():
+    for p in [
+        os.path.join(_PROJECT_ROOT, "models", "L2CS-Net", "models", "L2CSNet_gaze360.pkl"),
+        os.path.join(_PROJECT_ROOT, "models", "L2CSNet_gaze360.pkl"),
+        os.path.join(_PROJECT_ROOT, "checkpoints", "L2CSNet_gaze360.pkl"),
+    ]:
+        if os.path.isfile(p):
+            return p
+    return None
+
+
+def is_l2cs_weights_available():
+    return _resolve_l2cs_weights() is not None
+
+
+class L2CSPipeline:
+    # Uses in-tree l2cs.Pipeline (RetinaFace + ResNet50) for gaze estimation
+    # and MediaPipe for head pose, EAR, MAR, and roll de-rotation.
+
+    YAW_THRESHOLD = 22.0
+    PITCH_THRESHOLD = 20.0
+
+    def __init__(self, weights_path=None, arch="ResNet50", device="cpu",
+                 threshold=0.52, detector=None):
+        resolved = weights_path or _resolve_l2cs_weights()
+        if resolved is None or not os.path.isfile(resolved):
+            raise FileNotFoundError(
+                "L2CS weights not found. Place L2CSNet_gaze360.pkl in "
+                "models/L2CS-Net/models/ or checkpoints/"
+            )
+
+        # add in-tree L2CS-Net to import path
+        l2cs_root = os.path.join(_PROJECT_ROOT, "models", "L2CS-Net")
+        if l2cs_root not in sys.path:
+            sys.path.insert(0, l2cs_root)
+        from l2cs import Pipeline as _L2CSPipeline
+
+        import torch
+        # bypass upstream select_device bug by constructing torch.device directly
+        self._pipeline = _L2CSPipeline(
+            weights=pathlib.Path(resolved), arch=arch, device=torch.device(device),
+        )
+
+        self._detector = detector or FaceMeshDetector()
+        self._owns_detector = detector is None
+        self._head_pose = HeadPoseEstimator()
+        self.head_pose = self._head_pose
+        self._eye_scorer = EyeBehaviourScorer()
+        self._threshold = threshold
+        self._smoother = _OutputSmoother()
+
+        print(
+            f"[L2CS] Loaded {resolved} | arch={arch} device={device} "
+            f"yaw_thresh={self.YAW_THRESHOLD} pitch_thresh={self.PITCH_THRESHOLD} "
+            f"threshold={threshold}"
+        )
+
+    @staticmethod
+    def _derotate_gaze(pitch_rad, yaw_rad, roll_deg):
+        # remove head roll so tilted-but-looking-at-screen reads as (0, 0)
+        roll_rad = -math.radians(roll_deg)
+        cos_r, sin_r = math.cos(roll_rad), math.sin(roll_rad)
+        return (yaw_rad * sin_r + pitch_rad * cos_r,
+                yaw_rad * cos_r - pitch_rad * sin_r)
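`_derotate_gaze` is a plain 2D rotation of the (yaw, pitch) vector by the negated head roll. A self-contained sanity check of that identity (the same formula copied into a free function for testing):

```python
import math

def derotate_gaze(pitch_rad, yaw_rad, roll_deg):
    # rotate the gaze vector back by the head's roll angle
    roll_rad = -math.radians(roll_deg)
    cos_r, sin_r = math.cos(roll_rad), math.sin(roll_rad)
    return (yaw_rad * sin_r + pitch_rad * cos_r,
            yaw_rad * cos_r - pitch_rad * sin_r)
```

With zero roll the angles pass through unchanged, and being a rotation it preserves the vector's magnitude — so a head tilted while looking at the screen centre still de-rotates to a near-zero gaze offset.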
+
+    def process_frame(self, bgr_frame):
+        landmarks = self._detector.process(bgr_frame)
+        h, w = bgr_frame.shape[:2]
+
+        out = {
+            "landmarks": landmarks, "is_focused": False, "raw_score": 0.0,
+            "s_face": 0.0, "s_eye": 0.0, "gaze_pitch": None, "gaze_yaw": None,
+            "yaw": None, "pitch": None, "roll": None, "mar": None, "is_yawning": False,
+        }
+
+        # MediaPipe: head pose, eye/mouth scores
+        roll_deg = 0.0
+        if landmarks is not None:
+            angles = self._head_pose.estimate(landmarks, w, h)
+            if angles is not None:
+                out["yaw"], out["pitch"], out["roll"] = angles
+                roll_deg = angles[2]
+            out["s_face"] = self._head_pose.score(landmarks, w, h)
+            out["s_eye"] = self._eye_scorer.score(landmarks)
+            out["mar"] = compute_mar(landmarks)
+            out["is_yawning"] = out["mar"] > MAR_YAWN_THRESHOLD
+
+        # L2CS gaze (uses its own RetinaFace detector internally)
+        results = self._pipeline.step(bgr_frame)
+
+        if results is None or results.pitch.shape[0] == 0:
+            smoothed = self._smoother.update(0.0, landmarks is not None)
+            out["raw_score"] = smoothed
+            out["is_focused"] = smoothed >= self._threshold
+            return out
+
+        pitch_rad = float(results.pitch[0])
+        yaw_rad = float(results.yaw[0])
+
+        pitch_rad, yaw_rad = self._derotate_gaze(pitch_rad, yaw_rad, roll_deg)
+        out["gaze_pitch"] = pitch_rad
+        out["gaze_yaw"] = yaw_rad
+
+        yaw_deg = abs(math.degrees(yaw_rad))
+        pitch_deg = abs(math.degrees(pitch_rad))
+
+        # fall back to L2CS angles if MediaPipe didn't produce head pose
+        out["yaw"] = out.get("yaw") or math.degrees(yaw_rad)
+        out["pitch"] = out.get("pitch") or math.degrees(pitch_rad)
+
+        # cosine scoring: 1.0 at centre, 0.0 at threshold
+        yaw_t = min(yaw_deg / self.YAW_THRESHOLD, 1.0)
+        pitch_t = min(pitch_deg / self.PITCH_THRESHOLD, 1.0)
+        yaw_score = 0.5 * (1.0 + math.cos(math.pi * yaw_t))
+        pitch_score = 0.5 * (1.0 + math.cos(math.pi * pitch_t))
+        gaze_score = 0.55 * yaw_score + 0.45 * pitch_score
+
+        if out["is_yawning"]:
+            gaze_score = 0.0
+
+        out["raw_score"] = self._smoother.update(float(gaze_score), True)
+        out["is_focused"] = out["raw_score"] >= self._threshold
+        return out
+
+    def reset_session(self):
+        self._smoother.reset()
+
+    def close(self):
+        if self._owns_detector:
+            self._detector.close()
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *args):
+        self.close()
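The raised-cosine falloff behind `raw_score` can be checked on its own. A sketch with the class constants inlined (`cosine_gaze_score` is a hypothetical helper, not part of the pipeline):

```python
import math

def cosine_gaze_score(yaw_deg, pitch_deg, yaw_thr=22.0, pitch_thr=20.0):
    # 1.0 looking dead centre, smoothly down to 0.0 at the angular threshold
    yaw_t = min(abs(yaw_deg) / yaw_thr, 1.0)
    pitch_t = min(abs(pitch_deg) / pitch_thr, 1.0)
    yaw_score = 0.5 * (1.0 + math.cos(math.pi * yaw_t))
    pitch_score = 0.5 * (1.0 + math.cos(math.pi * pitch_t))
    return 0.55 * yaw_score + 0.45 * pitch_score
```

Unlike a hard angular cutoff, the cosine gives a soft shoulder: at half the threshold the score is still 0.5, so small head movements degrade the score gradually instead of toggling focus on and off.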