AfriSignEncoder Exp1 — KSL Baseline (LandmarkTransformer)

Part of the AfriSignEncoder research project: a multilingual African sign language recognition benchmark. This checkpoint is the Experiment 1 single-language baseline for Kenyan Sign Language (KSL).

Scope note: This baseline covers only 4 KSL glosses (father, hello, is, my) from a small demo-scale dataset. The 100% accuracy reflects the simplicity of the 4-class task and should not be generalised to full KSL vocabulary recognition.

Model Description

Same LandmarkTransformer architecture as the CASL baseline, with a 4-class head.

| Component | Value |
| --- | --- |
| Architecture | LandmarkTransformer (custom) |
| Input | (B, 64, 225) float32; 75 keypoints × 3 coords per frame |
| Embedding dim | 256 |
| Attention heads | 8 |
| Encoder layers | 4 |
| Feed-forward dim | 1,024 |
| Positional encoding | Learned |
| Classification head | Linear 256 → 4 |
| Parameters | ~3.24 M |
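
As a sanity check on the parameter figure, the count can be roughly reproduced from the listed dimensions alone. The sketch below is plain arithmetic, not the project's model code. It assumes a linear input projection from 225 to 256, standard encoder layers with biases and two LayerNorms each, and a learned positional table with one row per frame. None of these details are confirmed by this card.

```python
# Back-of-the-envelope parameter count for the configuration listed above.
# Assumed (not confirmed by the model card): a linear input projection,
# two LayerNorms per encoder layer, biases everywhere, and a learned
# positional table with one row per frame.

SEQ_LEN, IN_DIM, D_MODEL, FF_DIM, N_LAYERS, N_CLASSES = 64, 225, 256, 1024, 4, 4


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def encoder_layer(d_model: int, ff_dim: int) -> int:
    """Parameters of one standard Transformer encoder layer."""
    attention = 4 * linear(d_model, d_model)  # Q, K, V and output projections
    feed_forward = linear(d_model, ff_dim) + linear(ff_dim, d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, scale and shift each
    return attention + feed_forward + layer_norms


def count_parameters() -> int:
    input_projection = linear(IN_DIM, D_MODEL)
    positional = SEQ_LEN * D_MODEL
    encoder = N_LAYERS * encoder_layer(D_MODEL, FF_DIM)
    head = linear(D_MODEL, N_CLASSES)
    return input_projection + positional + encoder + head


total = count_parameters()
print(f"{total:,} parameters ({total / 1e6:.2f} M)")  # 3,234,308 parameters (3.23 M)
```

Under these assumptions the estimate lands within about 0.2% of the reported ~3.24 M. The remaining gap presumably comes from components the table does not list.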

Dataset

KSL Hand Landmarks — 4 Kenyan Sign Language words.

| Split | Samples | Classes |
| --- | --- | --- |
| Train | 694 | 4 |
| Test | 124 | 4 |

Source: the Kaggle dataset joanwachuka/ksl-hand-landmarks, converted to parquet at luciayen/KSL-Hand-Landmarks.

Landmark caveat: The original .npy files contain keypoints from MediaPipe Hands only (2 hands × 21 joints × 3 coords = 126 dims). These occupy dimensions [0:126]; dimensions [126:225] (the 99 body-pose dims, i.e. 33 landmarks × 3 coords) are filled with zeros. The model therefore learns from hand shape and motion only; the padded dimensions are always zero and contribute no signal.
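
The layout described in the landmark caveat can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's preprocessing code. The function names are hypothetical, and a real pipeline would more likely do this in one vectorised NumPy operation.

```python
HAND_DIMS = 2 * 21 * 3   # two hands × 21 joints × (x, y, z) = 126
POSE_DIMS = 33 * 3       # 33 body-pose landmarks × (x, y, z) = 99
FULL_DIMS = HAND_DIMS + POSE_DIMS  # 225, the model's per-frame input width


def pad_hand_frame(hand_frame):
    """Place 126 hand values in [0:126] and fill [126:225] with zeros."""
    if len(hand_frame) != HAND_DIMS:
        raise ValueError(f"expected {HAND_DIMS} values, got {len(hand_frame)}")
    return [float(v) for v in hand_frame] + [0.0] * POSE_DIMS


def pad_clip(hand_frames):
    """Apply the padding to every frame of a clip."""
    return [pad_hand_frame(frame) for frame in hand_frames]


# Usage with a dummy 64-frame clip (values are placeholders, not real landmarks).
clip = [[0.5] * HAND_DIMS for _ in range(64)]
padded = pad_clip(clip)
print(len(padded), len(padded[0]))   # 64 225
print(set(padded[0][HAND_DIMS:]))    # {0.0}
```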

Data leakage fix: The Kaggle archive contains two sub-directories: data_split/ (official train/val/test) and dataset3/ (pre-split source). All 124 test samples appear verbatim in dataset3/, so training on it would expose the model to the test set. The upload script excludes dataset3/ entirely, giving a clean train=694 / test=124 split with zero overlap.
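
One way to detect this kind of verbatim overlap is to compare content hashes across the two directories. The sketch below is an assumption about how such a check could look, not the upload script itself. The helper names are hypothetical, and the paths in the usage comment are placeholders, because the exact sub-directory layout is not given here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's raw bytes; identical files give identical digests."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_verbatim_overlap(test_dir: Path, other_dir: Path, pattern: str = "*.npy"):
    """Return test files whose bytes also occur somewhere under other_dir."""
    other_digests = {file_digest(p) for p in other_dir.rglob(pattern)}
    return sorted(p for p in test_dir.rglob(pattern) if file_digest(p) in other_digests)


# Example call (placeholder paths; adjust to the actual archive layout):
#   leaked = find_verbatim_overlap(Path("data_split/test"), Path("dataset3"))
#   print(f"{len(leaked)} test files also appear in dataset3/")
```

A byte-level hash only catches exact copies. Samples that were re-saved or slightly perturbed would need a comparison on the loaded arrays instead.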

Training

Identical protocol to the CASL baseline.

| Setting | Value |
| --- | --- |
| Optimiser | AdamW (lr=3e-4, wd=1e-4) |
| LR schedule | OneCycleLR (cosine annealing) |
| Max epochs | 60 |
| Batch size | 64 |
| Loss | Cross-entropy with label_smoothing=0.1 |
| Early stopping | patience=12 on validation accuracy |
| Normalisation | Per-feature z-score (stats stored in checkpoint) |
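
Per-feature z-scoring needs some care with this dataset, because the zero-padded pose dimensions have a standard deviation of exactly zero. The sketch below shows one common way to handle that, by flooring the standard deviation with a small epsilon. The epsilon, the use of the population standard deviation, and the function names are all assumptions. How the project actually guards against division by zero is not stated here.

```python
import statistics

EPS = 1e-6  # assumed floor for the standard deviation; not taken from the source


def fit_stats(samples):
    """Per-feature mean and population std over a list of equal-length vectors."""
    columns = list(zip(*samples))
    mean = [statistics.fmean(col) for col in columns]
    std = [statistics.pstdev(col) for col in columns]
    return mean, std


def normalise(vector, mean, std):
    """z-score each feature, flooring std so constant features map to 0."""
    return [(x - m) / max(s, EPS) for x, m, s in zip(vector, mean, std)]


# Toy data: feature 0 varies, feature 1 mimics an always-zero padded dimension.
train = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
mean, std = fit_stats(train)
print(normalise([5.0, 0.0], mean, std))
```

With the floor in place, a padded dimension normalises to 0 instead of raising a division error or producing NaN.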

Results

| Metric | Value |
| --- | --- |
| Best validation accuracy | 100% |
| Best checkpoint epoch | 3 |
| Final epoch (early stop) | 15 |
| Number of classes | 4 |
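
The best epoch and the final epoch are consistent with patience=12: once accuracy peaks at epoch 3, twelve further epochs without improvement end training at epoch 15. The sketch below replays that logic on a made-up accuracy curve. The curve values, the 1-based epoch numbering, and the strict-improvement rule are assumptions for illustration, not logged results.

```python
def run_early_stopping(val_accuracies, patience=12):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch accuracies.

    Epochs are numbered from 1. Only a strict improvement resets the counter.
    """
    best_acc = float("-inf")
    best_epoch = 0
    stop_epoch = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        stop_epoch = epoch
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted
    return best_epoch, stop_epoch


# Hypothetical curve: reaches 100% at epoch 3 and cannot improve afterwards.
curve = [0.80, 0.95, 1.00] + [1.00] * 57  # at most 60 epochs
print(run_early_stopping(curve))  # (3, 15)
```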

The 100% result is expected: 4 highly phonologically distinct signs, ~173 training samples per class, and a well-regularised model. This result validates the pipeline; it does not benchmark KSL at meaningful scale.

Checkpoint Contents

import torch

# The file holds a plain dict, not a bare state_dict.
ck = torch.load("pytorch_model.bin", map_location="cpu")
# Keys: epoch, val_acc, model (state_dict), l2i (label→index dict),
#       mean (tensor, shape (225,)), std (tensor, shape (225,))
# l2i = {"father": 0, "hello": 1, "is": 2, "my": 3}
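
Because the checkpoint stores the label→index map, turning model outputs back into glosses means inverting it. The sketch below shows that step in isolation, using the documented l2i contents and made-up scores. The helper names are hypothetical, and with real model outputs the index would more likely come from the logits tensor's argmax.

```python
L2I = {"father": 0, "hello": 1, "is": 2, "my": 3}  # as documented for ck["l2i"]


def invert_label_map(l2i):
    """Build the index→label lookup needed to read out predictions."""
    i2l = {index: label for label, index in l2i.items()}
    if len(i2l) != len(l2i):
        raise ValueError("label map has duplicate indices")
    return i2l


def decode(logits, l2i):
    """Return the gloss whose index has the highest score."""
    i2l = invert_label_map(l2i)
    if len(logits) != len(i2l):
        raise ValueError(f"expected {len(i2l)} scores, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return i2l[best_index]


# Made-up scores for a single clip, one per class.
print(decode([0.2, 3.1, -0.7, 0.4], L2I))  # hello
```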

Limitations

  • Only 4 classes — not a meaningful KSL benchmark.
  • Pose dims are always zero (hand-only source data). Re-extraction with MediaPipe Holistic is planned.
  • Small dataset; results will change with more data.

Citation / Project

AfriSignEncoder research project, CMU, 2026. GitHub: africansl_encoder.
