
# Refactoring Summary

## What Was Done

### 1. Model Directory Usage Analysis

The backend uses the following files from the /Model/ directory:

- `embeddings_cache.pkl` - Face recognition embeddings cache
- `yolov8n-face.pt` - YOLO face detection model
- `my_scan.mp4` - Reference 360-degree scan video
- `Adi.jpg` - Reference image

Both `single_tracker.py` and `multi_tracker.py` access the Model directory.

### 2. Created New Services

#### services/face_recognition.py

- Extracted face recognition logic from `Model/face_model.py`
- Class: `FaceRecognitionService`
- Methods:
  - `extract_embeddings_from_video()` - Process 360° video with quality filtering
  - `extract_embeddings_from_image()` - Process a single reference image
  - `save_embeddings_cache()` - Save processed embeddings
  - `load_embeddings_cache()` - Load cached embeddings
  - `calculate_blur_score()` - Image sharpness detection
  - `calculate_frontal_score()` - Face frontality score
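
The scoring internals are not shown in this summary. A common approach for `calculate_blur_score()` is the variance of the Laplacian response (the real service presumably uses `cv2.Laplacian`); here is a minimal sketch with plain NumPy, offered as an illustration rather than the actual implementation:

```python
import numpy as np

def calculate_blur_score(gray: np.ndarray) -> float:
    """Sharpness estimate: variance of the Laplacian response.

    Higher variance means more high-frequency detail, i.e. a sharper
    frame; frames scoring below a threshold can be discarded.
    """
    kernel = np.array([[0,  1, 0],
                       [1, -4, 1],
                       [0,  1, 0]], dtype=np.float64)
    h, w = gray.shape
    response = np.zeros((h - 2, w - 2))
    # Valid-mode 2D correlation with the 3x3 Laplacian kernel
    for i in range(3):
        for j in range(3):
            response += kernel[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(response.var())
```

A sharp edge produces a large response while a flat image scores zero, which is what makes the metric usable for frame filtering.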

#### services/audio_processing.py

- New service for audio streaming with angle data
- Class: `AudioProcessor`
- Methods:
  - `create_audio_stream()` - Start a new recording session
  - `write_audio_chunk()` - Write audio with optional angle metadata
  - `close_audio_stream()` - Finalize the recording
  - `get_audio_files()` - List all recordings
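
The service internals are not reproduced here; the following is a minimal sketch of how `AudioProcessor` could manage sessions with the stdlib `wave` module. The session bookkeeping, file naming, and method signatures are assumptions, not the actual code:

```python
import csv
import time
import uuid
import wave
from pathlib import Path

class AudioProcessor:
    """Sketch of the audio service: one WAV file plus an optional
    angle-metadata CSV per recording session."""

    def __init__(self, out_dir: str = "Model/audio_recordings"):
        self.out_dir = Path(out_dir)
        self.sessions = {}

    def create_audio_stream(self, sample_rate: int = 16000, channels: int = 1) -> str:
        self.out_dir.mkdir(parents=True, exist_ok=True)
        session_id = uuid.uuid4().hex
        stem = self.out_dir / f"audio_{session_id}_{int(time.time())}"
        wav = wave.open(str(stem) + ".wav", "wb")
        wav.setnchannels(channels)
        wav.setsampwidth(2)          # 16-bit PCM
        wav.setframerate(sample_rate)
        self.sessions[session_id] = {
            "wav": wav, "angles": [], "start": time.monotonic(), "stem": stem,
        }
        return session_id

    def write_audio_chunk(self, session_id: str, pcm_bytes: bytes, angle=None):
        session = self.sessions[session_id]
        session["wav"].writeframes(pcm_bytes)
        if angle is not None:
            # Record (elapsed seconds, angle) for the metadata CSV
            session["angles"].append((time.monotonic() - session["start"], angle))

    def close_audio_stream(self, session_id: str):
        session = self.sessions.pop(session_id)
        session["wav"].close()
        if session["angles"]:
            with open(str(session["stem"]) + "_metadata.txt", "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["timestamp", "angle"])
                for t, a in session["angles"]:
                    writer.writerow([f"{t:.3f}", f"{a:.2f}"])
```

Keeping the WAV handle open per session lets the WebSocket endpoint append chunks as they arrive and finalize the header only on close.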

### 3. Added API Endpoints to `server.py`

Face Recognition APIs:

- `POST /api/face/upload-video` - Upload a 360° reference video
- `POST /api/face/upload-image` - Upload a reference image
- `GET /api/face/cache-status` - Check embeddings cache status

Audio Streaming APIs:

- `POST /api/audio/start-stream` - Start an audio recording session
- WebSocket `/ws/audio/{session_id}` - Stream audio with angle data
- `POST /api/audio/stop-stream/{session_id}` - Stop recording
- `GET /api/audio/recordings` - List all recordings

### 4. File Storage Structure

```text
/Model/
├── my_scan.mp4                    # Reference video (uploaded via API)
├── ref_*.jpg                      # Reference images (uploaded via API)
├── embeddings_cache.pkl           # Processed face embeddings
├── yolov8n-face.pt                # YOLO model (static)
└── audio_recordings/
    ├── audio_{uuid}_{timestamp}.wav           # Audio recording
    └── audio_{uuid}_{timestamp}_metadata.txt  # Angle metadata (CSV)
```

### 5. Audio Metadata Format

The metadata file stores timestamp and angle in CSV format:

```csv
timestamp,angle
0.000,45.50
0.064,46.20
0.128,47.00
```
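
Consumers can parse this metadata with the stdlib `csv` module. A small helper (the function name and return shape are illustrative, not part of the API):

```python
import csv
import io

def load_angle_metadata(text: str) -> list:
    """Parse 'timestamp,angle' metadata rows into (seconds, degrees) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["timestamp"]), float(row["angle"])) for row in reader]
```

The resulting pairs can be interpolated against audio sample offsets for spatial analysis.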

## How to Use

Upload 360-Degree Video:

```shell
curl -X POST "http://localhost:8000/api/face/upload-video" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@my_360_scan.mp4"
```

Upload Reference Image:

```shell
curl -X POST "http://localhost:8000/api/face/upload-image" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference.jpg"
```

Start Audio Stream:

```shell
# 1. Start stream (get session_id)
curl -X POST "http://localhost:8000/api/audio/start-stream" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "sample_rate=16000" \
  -F "channels=1"

# 2. Connect via WebSocket and stream
# ws://localhost:8000/ws/audio/{session_id}

# 3. Send audio chunks (binary or JSON with angle)
# Binary: raw 16-bit PCM audio bytes
# JSON: {"audio_data": "base64...", "angle": 45.5}

# 4. Stop: {"command": "stop"}
```
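
A client can build the JSON frame for step 3 with the stdlib alone. The helper below is a hypothetical convenience, not part of the backend:

```python
import base64
import json
from typing import Optional

def make_audio_frame(pcm_bytes: bytes, angle: Optional[float] = None) -> str:
    """Build one JSON text frame for the /ws/audio/{session_id} socket.

    Binary frames carry raw 16-bit PCM directly; the JSON form is for
    chunks that should carry an angle reading alongside the audio.
    """
    frame = {"audio_data": base64.b64encode(pcm_bytes).decode("ascii")}
    if angle is not None:
        frame["angle"] = angle
    return json.dumps(frame)
```

Base64 is needed because JSON cannot carry raw bytes; binary frames avoid that overhead when no angle is attached.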

## Key Features

  1. Quality Filtering: Video processing uses blur detection and frontal face scoring to select best frames
  2. Temporal Spacing: Selects frames evenly distributed across the video for comprehensive coverage
  3. Angle Tracking: Audio streams can include direction/angle metadata for spatial audio analysis
  4. Mono/Stereo Support: Configurable audio channels (1 or 2)
  5. Authentication: All endpoints protected with JWT tokens
  6. Async Processing: CPU-intensive tasks run in thread pool executor
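
Feature 6 keeps the event loop responsive by off-loading CPU-bound work to threads. A minimal sketch of the pattern (the function names here are placeholders, not the actual server code):

```python
import asyncio

def extract_embeddings(video_path: str) -> str:
    """Placeholder for the CPU-heavy DeepFace/YOLO pipeline."""
    return f"embeddings for {video_path}"

async def handle_upload(video_path: str) -> str:
    loop = asyncio.get_running_loop()
    # Run the blocking work in the default thread pool executor so
    # other requests and WebSocket frames keep being served meanwhile.
    return await loop.run_in_executor(None, extract_embeddings, video_path)

result = asyncio.run(handle_upload("my_scan.mp4"))
```

Awaiting `run_in_executor` yields control back to the loop for the duration of the heavy call, which is what makes a single-process FastAPI server usable during video processing.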

## Original `face_model.py`

The original file at `/Model/face_model.py` remains unchanged and can still be run standalone for testing or manual processing. The new API exposes the same functionality through a service-oriented architecture accessible via HTTP and WebSocket.

## Dependencies

All required packages are already listed in requirements.txt:

- FastAPI, Uvicorn
- OpenCV (cv2)
- DeepFace
- Ultralytics (YOLO)
- NumPy
- wave (Python standard library, needs no requirements.txt entry)

No additional dependencies are needed.