ASL Transformer 84-Class 408D Model

This repository contains a trained PyTorch Transformer model that classifies isolated American Sign Language (ASL) signs from MediaPipe Holistic keypoint features.

Model Performance

  • Validation Top-1 Accuracy: 72.87%
  • Validation Top-5 Accuracy: 90.15%
  • Number of classes: 84
  • Input sequence length: 50
  • Input feature dimension: 408

Important Files

  • best_transformer_asl_84class.pt: best trained checkpoint
  • last_transformer_asl_84class.pt: final checkpoint from training
  • config.json: model/training configuration
  • id_to_label.json: class ID to label mapping
  • training_history.csv: epoch-by-epoch training history
  • classification_report.csv: validation classification report
  • confusion_matrix.npy: validation confusion matrix
  • training_summary.json: final training summary
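
As a rough illustration of how the evaluation artifacts listed above might be used, the sketch below computes per-class recall from a confusion matrix and lists the weakest classes. It makes two assumptions that this card does not state: that rows of confusion_matrix.npy are true classes and columns are predictions, and that id_to_label.json is a JSON object keyed by the class ID as a string. The helper names are hypothetical.

```python
import numpy as np

def per_class_recall(cm: np.ndarray) -> np.ndarray:
    """Recall per class, ASSUMING cm[i, j] counts true class i predicted as j."""
    cm = np.asarray(cm, dtype=np.float64)
    support = cm.sum(axis=1)  # number of validation samples per true class
    # Classes with no samples get recall 0 instead of a division-by-zero NaN.
    return np.divide(np.diag(cm), support,
                     out=np.zeros_like(support), where=support > 0)

def hardest_classes(cm: np.ndarray, id_to_label: dict, k: int = 5):
    """Return the k classes with the lowest recall as (label, recall) pairs."""
    recall = per_class_recall(cm)
    order = np.argsort(recall)[:k]
    # ASSUMPTION: id_to_label maps string IDs ("0", "1", ...) to labels.
    return [(id_to_label[str(i)], float(recall[i])) for i in order]

# Hypothetical usage with the files from this repository:
#   import json
#   cm = np.load("confusion_matrix.npy")
#   with open("id_to_label.json") as f:
#       id_to_label = json.load(f)
#   print(hardest_classes(cm, id_to_label))
```

If the matrix turns out to be stored with predictions on the rows, transpose it before calling these helpers.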

Input Format

The model expects an input tensor of shape:

(batch_size, 50, 408)

Where:

  • 50 = sequence length
  • 408 = 204 normalized keypoint features + 204 velocity features
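
The 204 + 204 layout can be sketched as follows. This is a minimal example under assumptions, not the training code: it takes velocity to be the frame-to-frame difference of the normalized keypoints with zeros for the first frame, and it places the keypoint block before the velocity block. Both choices, and the function name `add_velocity`, are hypothetical and should be checked against the original preprocessing.

```python
import numpy as np

def add_velocity(keypoints: np.ndarray) -> np.ndarray:
    """Turn (T, 204) normalized keypoints into (T, 408) features.

    ASSUMPTION: velocity[t] = keypoints[t] - keypoints[t - 1], velocity[0] = 0.
    """
    if keypoints.ndim != 2:
        raise ValueError("expected an array of shape (frames, features)")
    velocity = np.zeros_like(keypoints)
    velocity[1:] = keypoints[1:] - keypoints[:-1]
    # ASSUMPTION: keypoints first, velocity second along the feature axis.
    return np.concatenate([keypoints, velocity], axis=-1)

# Random stand-in for one normalized 50-frame clip.
seq = np.random.default_rng(0).normal(size=(50, 204)).astype(np.float32)
features = add_velocity(seq)   # shape (50, 408)
batch = features[None, ...]    # shape (1, 50, 408), i.e. batch_size = 1
```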

Training Summary

{
  "best_epoch": 60,
  "best_val_top1": 0.7287202392305646,
  "final_val_loss": 1.4968034369604928,
  "final_val_top1": 0.7287202392305646,
  "final_val_top5": 0.9014880997794015,
  "total_training_time_sec": 265.57665967941284,
  "total_training_time": "04m 25s",
  "best_checkpoint": "/kaggle/working/asl_transformer_84class_run/best_transformer_asl_84class.pt",
  "last_checkpoint": "/kaggle/working/asl_transformer_84class_run/last_transformer_asl_84class.pt"
}

Config

{
  "data_dir": "/kaggle/working/training_model23_final_train_ready",
  "output_dir": "/kaggle/working/asl_transformer_84class_run",
  "seq_len": 50,
  "input_dim": 408,
  "num_classes": 84,
  "d_model": 256,
  "nhead": 4,
  "num_layers": 3,
  "dim_feedforward": 512,
  "dropout": 0.3,
  "batch_size": 64,
  "epochs": 60,
  "max_lr": 0.0006,
  "weight_decay": 0.0001,
  "label_smoothing": 0.05,
  "grad_clip": 1.0,
  "patience": 10,
  "num_workers": 2,
  "pin_memory": true,
  "use_augmentation": true,
  "noise_std": 0.01,
  "feature_dropout_prob": 0.03,
  "time_mask_prob": 0.15,
  "time_mask_max_len": 6,
  "seed": 42,
  "use_amp": true
}
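
To show how the architecture-related values in the config fit together, here is a hypothetical encoder-only classifier built from them. The card does not describe the actual model class, so the input projection, learned positional embedding, mean pooling, and class name below are all assumptions. The released checkpoints will most likely only load into the original model definition, not into this sketch.

```python
import torch
import torch.nn as nn

class SketchASLTransformer(nn.Module):
    """Illustrative classifier using the config values; NOT the released model."""

    def __init__(self, input_dim=408, seq_len=50, num_classes=84,
                 d_model=256, nhead=4, num_layers=3,
                 dim_feedforward=512, dropout=0.3):
        super().__init__()
        # ASSUMPTION: a linear layer projects 408-D frames to d_model.
        self.input_proj = nn.Linear(input_dim, d_model)
        # ASSUMPTION: learned positional embedding, one vector per frame.
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=dim_feedforward,
            dropout=dropout, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, seq_len, input_dim)
        x = self.input_proj(x) + self.pos_embed
        x = self.encoder(x)
        # ASSUMPTION: mean pooling over time before the classification head.
        return self.head(x.mean(dim=1))

model = SketchASLTransformer().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 50, 408))  # shape (2, 84)
```

With d_model = 256 and nhead = 4, each attention head works on 64 dimensions; d_model must be divisible by nhead for the layer to be valid.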

Notes

At inference time, apply the same preprocessing pipeline that was used for training:

  1. Extract MediaPipe Holistic keypoints.
  2. Normalize the keypoints exactly as in training.
  3. Build a 50-frame sequence.
  4. Add velocity features to convert the 204D input into a 408D input.
  5. Feed a tensor of shape (1, 50, 408) into the model.
  6. Map the predicted class ID to its label using id_to_label.json.
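
Steps 3 and 6 above can be sketched roughly as shown below. The padding strategy (truncate long clips, zero-pad short ones at the end) is an assumption, since the card does not say how 50-frame sequences were built during training; resampling or repeating frames are equally plausible. The helper names and the string-keyed layout of id_to_label.json are also hypothetical.

```python
import numpy as np

def fit_to_length(frames: np.ndarray, seq_len: int = 50) -> np.ndarray:
    """Truncate or zero-pad a (T, D) array to exactly seq_len frames.

    ASSUMPTION: training may have used a different strategy.
    """
    t, d = frames.shape
    if t >= seq_len:
        return frames[:seq_len]
    pad = np.zeros((seq_len - t, d), dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)

def decode_topk(logits: np.ndarray, id_to_label: dict, k: int = 5):
    """Softmax over a 1-D logit vector; return the k most likely (label, prob)."""
    z = logits - logits.max()            # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    top = np.argsort(probs)[::-1][:k]    # indices sorted by descending probability
    # ASSUMPTION: id_to_label maps string IDs ("0", "1", ...) to labels.
    return [(id_to_label[str(i)], float(probs[i])) for i in top]

# Hypothetical usage, where `model` is the original model class with the
# checkpoint loaded and `features` is a (50, 408) float32 array:
#   logits = model(torch.from_numpy(features)[None]).squeeze(0).detach().numpy()
#   print(decode_topk(logits, id_to_label, k=5))
```

Reporting the top five labels rather than only the best one matches how the model was evaluated, since validation Top-5 accuracy is considerably higher than Top-1.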