---
title: BISINDO Sign Language Recognition API
sdk: docker
emoji: 💻
colorFrom: blue
colorTo: indigo
pinned: true
---

# BISINDO Sign Language Recognition API

A FastAPI-based REST API for recognizing BISINDO (Indonesian Sign Language) signs in video. The system extracts pose landmarks from input videos with MediaPipe and classifies the signed action with a custom transformer model.

## Project Structure

```
.
├── app/
│   ├── preprocessing.py        # Video preprocessing and pose estimation
│   ├── model.py                # ML model definition and inference
│   └── main.py                 # FastAPI application and endpoints
├── model/                      # Directory containing model files
│   ├── sign_transformer.keras  # Trained model weights
│   └── labels.json             # Class labels mapping
├── requirements.txt            # Project dependencies
└── README.md                   # Project documentation
```

## Local Development

### Manual Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/video-classification-api
   cd video-classification-api
   ```

2. Create and activate a virtual environment (Windows shown; on Linux/macOS use `source venv/bin/activate`):

   ```bash
   python -m venv venv
   .\venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Start the development server:

   ```bash
   cd app
   fastapi dev main.py
   ```

### Run using Docker

1. Build the Docker image:

   ```bash
   docker build -t sign-recog-api .
   ```

2. Run the Docker container:

   ```bash
   docker run -d -p 7860:7860 sign-recog-api
   ```

## API Endpoints

### GET /

Health check endpoint that returns the API status.

### POST /predict

Accepts a video URL and returns the classification result.

Request Body:

```json
{
    "url": "https://drive.google.com/uc?id=1ZtIq7sxkrHuRB7HOdD3d7MDMOdwRFdPm&export=download"
}
```

Response:

```json
{
    "label": "buka",
    "confidence": 0.95
}
```
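
As a usage sketch, the endpoint can be called from Python with the `requests` library. This assumes the Docker container above is running and publishing port 7860; the payload reuses the sample URL from the request body above:

```python
import requests

API_URL = "http://localhost:7860"  # port published by the `docker run` command above

payload = {
    "url": "https://drive.google.com/uc?id=1ZtIq7sxkrHuRB7HOdD3d7MDMOdwRFdPm&export=download"
}

# POST the video URL and print the predicted label and confidence
resp = requests.post(f"{API_URL}/predict", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # e.g. {"label": "buka", "confidence": 0.95}
```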

## Inference Pipeline

1. Video Input Processing (see the frame-sampling sketch below)

   - Downloads the video from the provided URL
   - Performs motion-based trimming
   - Samples 113 frames from the video

2. Pose Estimation (see the landmark-extraction sketch below)

   - Uses the MediaPipe Holistic model to extract:
     - 33 pose landmarks
     - 21 left-hand landmarks
     - 21 right-hand landmarks
   - Calculates angles between key points

3. Model Prediction (see the prediction sketch below)

   - Processes the landmarks and angles through the transformer model
   - Returns the predicted action label with a confidence score
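
A minimal sketch of the frame-sampling step using OpenCV. The motion-based trimming is omitted, uniform sampling is an assumption about how the 113 frames are chosen, and `sample_frames` is a hypothetical helper rather than the repository's actual function:

```python
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 113) -> list:
    """Uniformly sample a fixed number of RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            # OpenCV decodes to BGR; MediaPipe expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```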
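The pose-estimation step can be sketched with the MediaPipe Holistic solution. `extract_landmarks` and `angle` are illustrative helpers; the exact feature layout and the set of angles computed in `preprocessing.py` are assumptions:

```python
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at point b (in degrees) between the segments b->a and b->c."""
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def extract_landmarks(rgb_frames: list) -> np.ndarray:
    """Run MediaPipe Holistic on RGB frames; one (75, 3) landmark array per frame."""
    rows = []
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        for frame in rgb_frames:
            res = holistic.process(frame)
            coords = []
            for lms, n in ((res.pose_landmarks, 33),
                           (res.left_hand_landmarks, 21),
                           (res.right_hand_landmarks, 21)):
                if lms:
                    coords += [[p.x, p.y, p.z] for p in lms.landmark]
                else:
                    coords += [[0.0, 0.0, 0.0]] * n  # zero-fill missing parts
            rows.append(np.asarray(coords))  # 33 pose + 21 + 21 hand points
    return np.stack(rows)

# Example: left-elbow angle from pose points 11 (shoulder), 13 (elbow), 15 (wrist)
# elbow = angle(frame[11], frame[13], frame[15])
```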
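Finally, a sketch of the prediction step. The feature shape and the structure of `labels.json` (assumed here to map class indices to label names) are assumptions:

```python
import json
import numpy as np
from tensorflow import keras

model = keras.models.load_model("model/sign_transformer.keras")
with open("model/labels.json") as f:
    labels = json.load(f)  # assumed format: {"0": "buka", ...}

def predict(features: np.ndarray) -> dict:
    """features: (113, feature_dim) array of landmarks and angles for one video."""
    probs = model.predict(features[np.newaxis, ...], verbose=0)[0]  # class scores
    idx = int(np.argmax(probs))
    return {"label": labels[str(idx)], "confidence": float(probs[idx])}
```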