---
title: BISINDO Sign Language Recognition API
sdk: docker
emoji: 💻
colorFrom: blue
colorTo: indigo
pinned: true
---

# BISINDO Sign Language Recognition API

A FastAPI-based REST API for recognizing BISINDO (Indonesian Sign Language) signs in video. The system extracts pose landmarks from input videos with MediaPipe and classifies the signed action with a custom transformer model.

## Project Structure

```
.
├── app/
│   ├── preprocessing.py        # Video preprocessing and pose estimation
│   ├── model.py                # ML model definition and inference
│   └── main.py                 # FastAPI application and endpoints
├── model/                      # Directory containing model files
│   ├── sign_transformer.keras  # Trained model weights
│   └── labels.json             # Class labels mapping
├── requirements.txt            # Project dependencies
└── README.md                   # Project documentation
```

## Local Development

### Manual Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/video-classification-api
   cd video-classification-api
   ```

2. Create and activate a virtual environment (Windows shown; on Linux/macOS use `source venv/bin/activate`):

   ```bash
   python -m venv venv
   .\venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Start the development server:

   ```bash
   cd app
   fastapi dev main.py
   ```

### Run using Docker

1. Build the Docker image:

   ```bash
   docker build -t sign-recog-api .
   ```

2. Run the Docker container:

   ```bash
   docker run -d -p 7860:7860 sign-recog-api
   ```

## API Endpoints

### GET /

Health check endpoint that returns the API status.

### POST /predict

Accepts a video URL and returns the classification result.

Request Body:

```json
{
    "url": "https://drive.google.com/uc?id=1ZtIq7sxkrHuRB7HOdD3d7MDMOdwRFdPm&export=download"
}
```

Response:

```json
{
    "label": "buka",
    "confidence": 0.95
}
```
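
As a usage sketch, the endpoint can be called from Python with the `requests` library. This assumes the Docker container above is running and publishing port 7860; the payload reuses the sample URL from the request body above:

```python
import requests

API_URL = "http://localhost:7860"  # port published by the `docker run` command above

payload = {
    "url": "https://drive.google.com/uc?id=1ZtIq7sxkrHuRB7HOdD3d7MDMOdwRFdPm&export=download"
}

# POST the video URL and print the predicted label and confidence
resp = requests.post(f"{API_URL}/predict", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # e.g. {"label": "buka", "confidence": 0.95}
```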

## Inference Pipeline

1. Video Input Processing (see the frame-sampling sketch below)

   - Downloads the video from the provided URL
   - Performs motion-based trimming
   - Samples 113 frames from the video

2. Pose Estimation (see the landmark-extraction sketch below)

   - Uses the MediaPipe Holistic model to extract:
     - 33 pose landmarks
     - 21 left-hand landmarks
     - 21 right-hand landmarks
   - Calculates angles between key points

3. Model Prediction (see the prediction sketch below)

   - Processes the landmarks and angles through the transformer model
   - Returns the predicted action label with a confidence score
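
A minimal sketch of the frame-sampling step using OpenCV. The motion-based trimming is omitted, uniform sampling is an assumption about how the 113 frames are chosen, and `sample_frames` is a hypothetical helper rather than the repository's actual function:

```python
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 113) -> list:
    """Uniformly sample a fixed number of RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            # OpenCV decodes to BGR; MediaPipe expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```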
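The pose-estimation step can be sketched with the MediaPipe Holistic solution. `extract_landmarks` and `angle` are illustrative helpers; the exact feature layout and the set of angles computed in `preprocessing.py` are assumptions:

```python
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at point b (in degrees) between the segments b->a and b->c."""
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def extract_landmarks(rgb_frames: list) -> np.ndarray:
    """Run MediaPipe Holistic on RGB frames; one (75, 3) landmark array per frame."""
    rows = []
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        for frame in rgb_frames:
            res = holistic.process(frame)
            coords = []
            for lms, n in ((res.pose_landmarks, 33),
                           (res.left_hand_landmarks, 21),
                           (res.right_hand_landmarks, 21)):
                if lms:
                    coords += [[p.x, p.y, p.z] for p in lms.landmark]
                else:
                    coords += [[0.0, 0.0, 0.0]] * n  # zero-fill missing parts
            rows.append(np.asarray(coords))  # 33 pose + 21 + 21 hand points
    return np.stack(rows)

# Example: left-elbow angle from pose points 11 (shoulder), 13 (elbow), 15 (wrist)
# elbow = angle(frame[11], frame[13], frame[15])
```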
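Finally, a sketch of the prediction step. The feature shape and the structure of `labels.json` (assumed here to map class indices to label names) are assumptions:

```python
import json
import numpy as np
from tensorflow import keras

model = keras.models.load_model("model/sign_transformer.keras")
with open("model/labels.json") as f:
    labels = json.load(f)  # assumed format: {"0": "buka", ...}

def predict(features: np.ndarray) -> dict:
    """features: (113, feature_dim) array of landmarks and angles for one video."""
    probs = model.predict(features[np.newaxis, ...], verbose=0)[0]  # class scores
    idx = int(np.argmax(probs))
    return {"label": labels[str(idx)], "confidence": float(probs[idx])}
```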