---
title: BISINDO Sign Language Recognition API
sdk: docker
emoji: 💻
colorFrom: blue
colorTo: indigo
pinned: true
---
# BISINDO Sign Language Recognition API
A FastAPI-based REST API that performs video classification using deep learning models. The system processes video inputs through MediaPipe for pose estimation and uses a custom transformer model for action classification.
## Project Structure
```
.
├── app/
│   ├── preprocessing.py        # Video preprocessing and pose estimation
│   ├── model.py                # ML model definition and inference
│   └── main.py                 # FastAPI application and endpoints
├── model/                      # Directory containing model files
│   ├── sign_transformer.keras  # Trained model weights
│   └── labels.json             # Class labels mapping
├── requirements.txt            # Project dependencies
└── README.md                   # Project documentation
```
## Local Development

### Manual Setup
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/video-classification-api
  cd video-classification-api
  ```
- Create and activate a virtual environment:
  ```bash
  python -m venv venv
  .\venv\Scripts\activate    # Windows (on Linux/macOS: source venv/bin/activate)
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Start the development server:
  ```bash
  cd app
  fastapi dev main.py
  ```
### Run using Docker
- Build the Docker image:
  ```bash
  docker build -t sign-recog-api .
  ```
- Run the Docker container:
  ```bash
  docker run -d -p 7860:7860 sign-recog-api
  ```
## API Endpoints
### GET /

Health check endpoint that returns the API status.
### POST /predict

Accepts a video URL and returns the classification result.

Request Body:
```json
{
  "url": "https://drive.google.com/uc?id=1ZtIq7sxkrHuRB7HOdD3d7MDMOdwRFdPm&export=download"
}
```

Response:
```json
{
  "label": "buka",
  "confidence": 0.95
}
```
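
For reference, a minimal Python client sketch, assuming the API is running locally on port 7860 (as in the Docker command above); it only depends on the `requests` package:

```python
import requests

API_URL = "http://localhost:7860"  # assumes the Docker container started above

# Health check
print(requests.get(f"{API_URL}/").json())

# Classify a video by URL
payload = {
    "url": "https://drive.google.com/uc?id=1ZtIq7sxkrHuRB7HOdD3d7MDMOdwRFdPm&export=download"
}
response = requests.post(f"{API_URL}/predict", json=payload, timeout=300)
response.raise_for_status()
result = response.json()
print(f"Predicted sign: {result['label']} (confidence: {result['confidence']:.2f})")
```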
## Inference Pipeline

### Video Input Processing
- Downloads video from provided URL
- Performs motion-based trimming
- Samples 113 frames from the video
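
A rough sketch of this preprocessing step, assuming OpenCV for decoding; the `sample_frames` helper, the motion threshold, and the trimming heuristic are illustrative assumptions, not the actual implementation in `preprocessing.py`:

```python
import cv2
import numpy as np

NUM_FRAMES = 113  # number of frames the model expects

def sample_frames(video_path: str, num_frames: int = NUM_FRAMES) -> np.ndarray:
    """Decode a video, trim low-motion lead-in/out, and return `num_frames` frames."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    if not frames:
        raise ValueError("no frames decoded from video")

    # Motion-based trimming (illustrative): keep the span of frames whose
    # difference from the previous frame exceeds a small threshold.
    diffs = [np.mean(cv2.absdiff(a, b)) for a, b in zip(frames, frames[1:])]
    active = [i for i, d in enumerate(diffs) if d > 2.0]
    if active:
        frames = frames[active[0]: active[-1] + 2]

    # Uniformly sample (or repeat) frames so the clip has exactly num_frames.
    idx = np.linspace(0, len(frames) - 1, num_frames).astype(int)
    return np.stack([frames[i] for i in idx])
```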
### Pose Estimation
- Uses the MediaPipe Holistic model to extract:
  - 33 pose landmarks
  - 21 left hand landmarks
  - 21 right hand landmarks
- Calculates angles between key points
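
A sketch of how such landmarks can be extracted with MediaPipe Holistic; the feature layout and the `angle_between` helper are illustrative assumptions, not necessarily what `preprocessing.py` does:

```python
import mediapipe as mp
import numpy as np

mp_holistic = mp.solutions.holistic

def extract_landmarks(frames_rgb) -> np.ndarray:
    """Return per-frame pose + hand landmarks as flat feature vectors."""
    features = []
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        for frame in frames_rgb:  # frames must be RGB uint8 arrays
            results = holistic.process(frame)

            def to_array(landmark_list, count):
                if landmark_list is None:
                    return np.zeros((count, 3))
                return np.array([[lm.x, lm.y, lm.z] for lm in landmark_list.landmark])

            pose = to_array(results.pose_landmarks, 33)         # 33 pose landmarks
            left = to_array(results.left_hand_landmarks, 21)    # 21 left hand landmarks
            right = to_array(results.right_hand_landmarks, 21)  # 21 right hand landmarks
            features.append(np.concatenate([pose, left, right]).flatten())
    return np.stack(features)

def angle_between(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle (radians) at point b formed by points a-b-c, e.g. a joint angle."""
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```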
### Model Prediction
- Processes landmarks and angles through transformer model
- Returns predicted action label with confidence score
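
A minimal sketch of this step, assuming the Keras model and `labels.json` from the project structure above; the exact input shape and the label-mapping format are assumptions:

```python
import json
import numpy as np
from tensorflow import keras

model = keras.models.load_model("model/sign_transformer.keras")
with open("model/labels.json") as f:
    labels = json.load(f)  # assumed format, e.g. {"0": "buka", ...}

def predict(sequence: np.ndarray) -> dict:
    """Run the transformer on a (113, num_features) landmark/angle sequence."""
    probs = model.predict(sequence[np.newaxis, ...], verbose=0)[0]  # add batch dim
    idx = int(np.argmax(probs))
    return {"label": labels[str(idx)], "confidence": float(probs[idx])}
```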