ASL Transformer 84-Class 408D Model
This repository contains a trained PyTorch Transformer model for isolated-sign American Sign Language (ASL) classification from MediaPipe Holistic keypoint features.
Model Performance
- Validation Top-1 Accuracy: 72.87%
- Validation Top-5 Accuracy: 90.15%
- Number of classes: 84
- Input sequence length: 50
- Input feature dimension: 408
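For readers unfamiliar with the metrics above: Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The following is a minimal sketch of that definition, not the evaluation code used for this model. The function name `top_k_accuracy` and the toy scores are illustrative assumptions.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 3 samples and 4 classes (values are made up).
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: highest score, top-1 hit
    [0.5, 0.1, 0.3, 0.1],  # true class 2: second highest, top-2 hit only
    [0.4, 0.3, 0.2, 0.1],  # true class 3: lowest score, miss
]
toy_labels = [1, 2, 3]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With 84 classes, Top-5 is a much looser criterion than Top-1, which is why the two reported numbers differ by a wide margin.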
Important Files
- best_transformer_asl_84class.pt: best trained checkpoint
- last_transformer_asl_84class.pt: final checkpoint from training
- config.json: model/training configuration
- id_to_label.json: class ID to label mapping
- training_history.csv: epoch-by-epoch training history
- classification_report.csv: validation classification report
- confusion_matrix.npy: validation confusion matrix
- training_summary.json: final training summary
Input Format
The model expects input shaped:
(batch_size, 50, 408)
Where:
- 50 = sequence length
- 408 = 204 normalized keypoint features + 204 velocity features
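As an illustration of how a 204D keypoint sequence could become the 408D input, here is a minimal sketch. It assumes that velocity is the frame-to-frame difference (zero for the first frame) and that velocity features are appended after the keypoint features. Neither detail is confirmed by this repository, so check the training code before relying on it. The helper name `add_velocity` is hypothetical.

```python
import numpy as np


def add_velocity(keypoints: np.ndarray) -> np.ndarray:
    """Append frame-to-frame differences to a (frames, features) array.

    ASSUMPTION: velocity is the first-order difference between consecutive
    frames, with zeros for the first frame. The training code may differ.
    """
    velocity = np.zeros_like(keypoints)
    velocity[1:] = keypoints[1:] - keypoints[:-1]
    # ASSUMPTION: keypoint features come first, velocity features second.
    return np.concatenate([keypoints, velocity], axis=-1)


# Random stand-in for 50 frames of 204 normalized keypoint features.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(50, 204)).astype(np.float32)
features = add_velocity(sequence)   # shape (50, 408)
batch = features[np.newaxis, ...]   # shape (1, 50, 408)
```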
Training Summary
{
"best_epoch": 60,
"best_val_top1": 0.7287202392305646,
"final_val_loss": 1.4968034369604928,
"final_val_top1": 0.7287202392305646,
"final_val_top5": 0.9014880997794015,
"total_training_time_sec": 265.57665967941284,
"total_training_time": "04m 25s",
"best_checkpoint": "/kaggle/working/asl_transformer_84class_run/best_transformer_asl_84class.pt",
"last_checkpoint": "/kaggle/working/asl_transformer_84class_run/last_transformer_asl_84class.pt"
}
Config
{
"data_dir": "/kaggle/working/training_model23_final_train_ready",
"output_dir": "/kaggle/working/asl_transformer_84class_run",
"seq_len": 50,
"input_dim": 408,
"num_classes": 84,
"d_model": 256,
"nhead": 4,
"num_layers": 3,
"dim_feedforward": 512,
"dropout": 0.3,
"batch_size": 64,
"epochs": 60,
"max_lr": 0.0006,
"weight_decay": 0.0001,
"label_smoothing": 0.05,
"grad_clip": 1.0,
"patience": 10,
"num_workers": 2,
"pin_memory": true,
"use_augmentation": true,
"noise_std": 0.01,
"feature_dropout_prob": 0.03,
"time_mask_prob": 0.15,
"time_mask_max_len": 6,
"seed": 42,
"use_amp": true
}
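To make the size-related config values concrete, the sketch below builds an encoder-only classifier with those dimensions using standard PyTorch modules. This is not the architecture stored in the checkpoints: the class name `SketchASLTransformer`, the learned positional embedding, and the mean pooling are all assumptions, so the parameter names will most likely not match the saved `state_dict`.

```python
import torch
from torch import nn


class SketchASLTransformer(nn.Module):
    """Hypothetical encoder-only classifier sized from config.json."""

    def __init__(self, seq_len=50, input_dim=408, num_classes=84,
                 d_model=256, nhead=4, num_layers=3,
                 dim_feedforward=512, dropout=0.3):
        super().__init__()
        # Project each 408D frame to the model width.
        self.input_proj = nn.Linear(input_dim, d_model)
        # ASSUMPTION: learned positional embedding; the real model may differ.
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        h = self.input_proj(x) + self.pos_embed
        h = self.encoder(h)
        # ASSUMPTION: mean pooling over time before classification.
        return self.classifier(h.mean(dim=1))


model = SketchASLTransformer().eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 50, 408))  # shape (2, 84)
```

The remaining config entries (learning rate, augmentation, early stopping, AMP) affect training only and have no counterpart in this sketch.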
Notes
During inference, use the same preprocessing pipeline used during training:
- Extract MediaPipe Holistic keypoints.
- Normalize keypoints the same way as training.
- Build a 50-frame sequence.
- Add velocity features to convert 204D input into 408D input.
- Feed tensor shaped (1, 50, 408) into the model.
- Convert predicted class ID using id_to_label.json.
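Two of the steps above, building a fixed-length sequence and decoding the predicted class ID, can be sketched as follows. The padding strategy (keep the first 50 frames, zero-pad at the end) is an assumption and should be replaced by whatever the training pipeline did. The helper names `fit_length` and `decode` are hypothetical, and the label mapping is made-up stand-in data, not the contents of id_to_label.json.

```python
import json

import numpy as np


def fit_length(frames: np.ndarray, seq_len: int = 50) -> np.ndarray:
    """Truncate or zero-pad a (frames, features) array to seq_len frames.

    ASSUMPTION: keep the first seq_len frames and pad at the end with zeros.
    """
    frames = frames[:seq_len]
    pad = seq_len - frames.shape[0]
    if pad > 0:
        frames = np.pad(frames, ((0, pad), (0, 0)))
    return frames


def decode(logits: np.ndarray, id_to_label: dict) -> str:
    """Map the arg-max class ID of a (num_classes,) score vector to its label."""
    class_id = int(np.argmax(logits))
    # JSON object keys are always strings, so look up str(class_id).
    return id_to_label[str(class_id)]


# Made-up mapping standing in for the contents of id_to_label.json.
id_to_label = json.loads('{"0": "hello", "1": "thanks", "2": "please"}')

short_clip = np.ones((30, 204), dtype=np.float32)   # 30 frames of keypoints
padded = fit_length(short_clip)                     # shape (50, 204)
label = decode(np.array([0.1, 2.3, -0.5]), id_to_label)
```

In real use, `logits` would be the model output for one sequence, and the mapping would be loaded with `json.load` from id_to_label.json.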