🤟 ASL Sign Language Recognition — EfficientNet-B0

A fine-tuned EfficientNet-B0 model that recognizes American Sign Language (ASL) alphabet letters from images. It reaches 99.02% accuracy on the 1,631-image evaluation split.

Model Details

Property	Value
Base Model	google/efficientnet-b0 (ImageNet pretrained)
Parameters	4,040,854 (~15.6MB)
Input Size	224×224 RGB images
Classes	26 (A-Z ASL alphabet letters)
Inference Speed	<10ms/frame on GPU, ~30ms on CPU

Training

Hyperparameter	Value
Learning Rate	2e-4
Batch Size	16
Epochs	5
Optimizer	AdamW
LR Scheduler	Cosine
Weight Decay	1e-4
Warmup Ratio	5%
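
As a rough illustration of how these settings interact, the sketch below computes the learning rate at a given optimizer step, assuming linear warmup over the first 5% of steps followed by cosine decay to zero. The step counts are derived from the 9,242 training images listed under Dataset and assume one optimizer step per batch of 16 with the final partial batch kept. The function name `lr_at` is illustrative and is not taken from the training script.

```python
import math

PEAK_LR = 2e-4
BATCH_SIZE = 16
EPOCHS = 5
WARMUP_RATIO = 0.05
TRAIN_IMAGES = 9_242  # train split size from the Dataset section

# Assumption: one optimizer step per batch, last partial batch kept.
STEPS_PER_EPOCH = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the run has 2,890 optimizer steps, of which the first 145 are warmup.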

Training Results

Epoch	Eval Accuracy	Eval Loss
1	89.45%	0.405
2	97.67%	0.096
3	98.28%	0.056
4	98.71%	0.047
5	99.02%	0.036
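
The accuracies above can be translated into image counts. The sketch below assumes each figure was computed over the full 1,631-image eval split described under Dataset. The helper name `misclassified` is illustrative.

```python
EVAL_IMAGES = 1_631  # eval split size from the Dataset section

# Eval accuracy per epoch, copied from the table above.
eval_accuracy = {1: 0.8945, 2: 0.9767, 3: 0.9828, 4: 0.9871, 5: 0.9902}


def misclassified(accuracy: float, total: int = EVAL_IMAGES) -> int:
    """Approximate number of wrong predictions implied by a rounded accuracy."""
    return round((1.0 - accuracy) * total)


errors_per_epoch = {epoch: misclassified(acc) for epoch, acc in eval_accuracy.items()}
for epoch, errors in errors_per_epoch.items():
    print(f"epoch {epoch}: about {errors} of {EVAL_IMAGES} eval images misclassified")
```

Under that assumption, the final checkpoint misses roughly 16 eval images, down from about 172 after the first epoch.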

Data Augmentation

RandomResizedCrop (scale 0.8-1.0)
RandomHorizontalFlip (p=0.3)
RandomRotation (±15°)
ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.1)

Usage

from transformers import pipeline

classifier = pipeline("image-classification", model="abdollahhh/asl-sign-language-efficientnet-b0")
result = classifier("path/to/hand_sign.jpg")
print(result)
# [{'label': 'A', 'score': 0.98}, ...]

Manual inference

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

processor = AutoImageProcessor.from_pretrained("abdollahhh/asl-sign-language-efficientnet-b0")
model = AutoModelForImageClassification.from_pretrained("abdollahhh/asl-sign-language-efficientnet-b0")
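model.eval()  # inference mode: disables dropout, uses running batch-norm statistics

# Convert to RGB so grayscale or RGBA files match the model's 3-channel input
image = Image.open("hand_sign.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
# id2label is keyed by integer class index once the config is loaded
label = model.config.id2label[predicted_class]
print(f"Predicted: {label}")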

Live Demo

Try the real-time webcam demo: ASL Sign Language Recognition Space

Dataset

Trained on Marxulia/asl_sign_languages_alphabets_v03:

10,873 images total (9,242 train / 1,631 eval)
26 classes: A through Z
Stratified 85/15 train/eval split
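
A stratified split keeps each letter's share of images roughly the same in both subsets. The following is a minimal standard-library sketch of that idea, not the procedure actually used for this model. The function name `stratified_split`, the seed, and the per-class rounding are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, eval_fraction=0.15, seed=0):
    """Return (train_indices, eval_indices), holding out eval_fraction of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_indices, eval_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_eval = round(len(indices) * eval_fraction)
        eval_indices.extend(indices[:n_eval])
        train_indices.extend(indices[n_eval:])
    return sorted(train_indices), sorted(eval_indices)


# Toy example (not the real dataset): 26 letters with 20 images each.
labels = [chr(ord("A") + i) for i in range(26) for _ in range(20)]
train_idx, eval_idx = stratified_split(labels)
print(len(train_idx), len(eval_idx))
```

In the toy example every letter contributes 3 of its 20 images to the eval subset, so no class is over- or under-represented.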

Limitations

Trained on controlled studio images, so accuracy may drop with varied backgrounds or lighting
Classifies single static images only; J and Z are signed with motion in ASL, so they are recognized from a still hand pose rather than the full gesture
Works best with a clean hand against a neutral background
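
Given these limitations, an application may prefer to reject low-confidence predictions rather than always report a letter. The sketch below works on the list-of-dicts output shape shown in the Usage section. The helper name `confident_label` and the 0.8 threshold are illustrative assumptions, not tuned values.

```python
from typing import Optional


def confident_label(results: list[dict], threshold: float = 0.8) -> Optional[str]:
    """Return the top label if its score meets the threshold, else None."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else None


# Hypothetical classifier outputs, for illustration only.
clear_sign = [{"label": "A", "score": 0.98}, {"label": "S", "score": 0.01}]
ambiguous_sign = [{"label": "M", "score": 0.46}, {"label": "N", "score": 0.41}]

print(confident_label(clear_sign))      # A
print(confident_label(ambiguous_sign))  # None
```

A suitable threshold depends on the deployment setting and would need to be chosen on images that resemble real usage.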
