AfriSignEncoder Exp1 — KSL Baseline (LandmarkTransformer)
Part of the AfriSignEncoder research project: a multilingual African sign language recognition benchmark. This checkpoint is the Experiment 1 single-language baseline for Kenyan Sign Language (KSL).
Scope note: This baseline covers only 4 KSL glosses (father, hello, is, my) from a small demo-scale dataset. The 100% accuracy reflects the simplicity of the 4-class task and should not be generalised to full KSL vocabulary recognition.
Model Description
Same LandmarkTransformer architecture as the CASL baseline, with a 4-class head.
| Component | Value |
|---|---|
| Architecture | LandmarkTransformer (custom) |
| Input | (B, 64, 225) float32: batch × 64 frames × 225 features (75 keypoints × 3 coords) |
| Embedding dim | 256 |
| Attention heads | 8 |
| Encoder layers | 4 |
| Feed-forward dim | 1,024 |
| Positional encoding | Learned |
| Classification head | Linear 256 → 4 |
| Parameters | ~3.24 M |
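As a sanity check on the ~3.24 M figure, the count can be reproduced arithmetically. This is a sketch that assumes a standard PyTorch-style encoder: a Linear 225 → 256 input projection, a learned 64 × 256 positional table, four blocks shaped like `torch.nn.TransformerEncoderLayer` (biased QKV/output and feed-forward projections, two LayerNorms each), and the linear head. The exact LandmarkTransformer layout is not documented here, so `count_params` and its breakdown are hypothetical.

```python
def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def count_params(d_in=225, d_model=256, d_ff=1024, n_layers=4,
                 seq_len=64, n_classes=4) -> int:
    """Hypothetical parameter count for a LandmarkTransformer-like model.

    Assumes an input projection, a learned positional table, encoder layers
    shaped like torch.nn.TransformerEncoderLayer, and a linear head.
    The number of attention heads does not change the count.
    """
    input_proj = linear(d_in, d_model)
    pos_table = seq_len * d_model
    # Fused QKV projection plus the attention output projection
    attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)
    feed_forward = linear(d_model, d_ff) + linear(d_ff, d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with weight and bias
    per_layer = attention + feed_forward + layer_norms
    head = linear(d_model, n_classes)
    return input_proj + pos_table + n_layers * per_layer + head


print(f"{count_params():,}")  # 3,234,308
```

This gives about 3.23 M; the small gap to the reported ~3.24 M could come from components the table does not list (for example a CLS token or a final LayerNorm).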
Dataset
KSL Hand Landmarks: 4 Kenyan Sign Language words.
| Split | Samples | Classes |
|---|---|---|
| Train | 694 | 4 |
| Test | 124 | 4 |
Source: Kaggle joanwachuka/ksl-hand-landmarks, converted to Parquet at luciayen/KSL-Hand-Landmarks.
Landmark caveat: The original .npy files contain MediaPipe Hands keypoints only (2 hands × 21 landmarks × 3 coords = 126 dims). These are placed in dimensions [0:126]; dimensions [126:225], which hold the 99 pose dims (33 body landmarks × 3) in the full layout, are zero-padded. The model therefore learns from hand shape and motion only; the padded dimensions are always zero and carry no signal.
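The layout described above can be sketched in NumPy. `pad_hand_frames` is a hypothetical helper, not the project's upload script; it assumes each sample is already a `(T, 126)` array of hand coordinates and only illustrates where the zeros land. How sequences are brought to the model's 64 frames is not described here, so no temporal resampling is shown.

```python
import numpy as np

HAND_DIMS = 126   # 2 hands × 21 MediaPipe Hands landmarks × (x, y, z)
TOTAL_DIMS = 225  # 75 keypoints × 3 coords expected by the model


def pad_hand_frames(hands: np.ndarray) -> np.ndarray:
    """Place hand-only features in [0:126] and zero-fill the pose slots [126:225].

    hands: array of shape (T, 126), one row per frame.
    Returns a float32 array of shape (T, 225).
    """
    if hands.ndim != 2 or hands.shape[1] != HAND_DIMS:
        raise ValueError(f"expected (T, {HAND_DIMS}), got {hands.shape}")
    out = np.zeros((hands.shape[0], TOTAL_DIMS), dtype=np.float32)
    out[:, :HAND_DIMS] = hands
    return out


frames = np.random.rand(64, HAND_DIMS).astype(np.float32)
padded = pad_hand_frames(frames)
print(padded.shape, bool((padded[:, HAND_DIMS:] == 0).all()))  # (64, 225) True
```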
Data leakage fix: The Kaggle archive contains two sub-directories: data_split/ (official train/val/test) and dataset3/ (pre-split source). All 124 test samples also appear verbatim in dataset3/, so training on it would leak the test set. The upload script therefore excludes dataset3/ entirely, giving a clean train=694 / test=124 split with zero overlap.
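A verbatim-overlap check of this kind can be sketched by hashing array contents. This is an assumption about how one might verify zero overlap, not the upload script itself; in practice the arrays would be loaded from the .npy files (e.g. with `np.load`), and this method only catches byte-identical copies, which is the kind of leakage described above.

```python
import hashlib

import numpy as np


def array_digest(arr: np.ndarray) -> str:
    """Content hash of an array: dtype, shape and raw bytes."""
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(np.ascontiguousarray(arr).tobytes())
    return h.hexdigest()


def count_overlap(test_arrays, other_arrays) -> int:
    """Number of test samples whose exact contents also appear in other_arrays."""
    seen = {array_digest(a) for a in other_arrays}
    return sum(array_digest(a) in seen for a in test_arrays)


# Dummy data: a clean pool and a pool containing one copied test sample.
rng = np.random.default_rng(0)
test = [rng.random((10, 126)) for _ in range(3)]
train = [rng.random((10, 126)) for _ in range(5)]
leaky_pool = train + [test[0].copy()]
print(count_overlap(test, train))       # 0
print(count_overlap(test, leaky_pool))  # 1
```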
Training
Identical protocol to the CASL baseline.
| Setting | Value |
|---|---|
| Optimiser | AdamW (lr=3e-4, wd=1e-4) |
| LR schedule | OneCycleLR (cosine annealing) |
| Max epochs | 60 |
| Batch size | 64 |
| Loss | Cross-entropy with label_smoothing=0.1 |
| Early stopping | patience=12 epochs on validation accuracy |
| Normalisation | Per-feature z-score (stats stored in checkpoint) |
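The normalisation step can be sketched as follows, assuming statistics are pooled over every frame of every training sample (one mean and one std per feature, matching the `(225,)` tensors stored in the checkpoint). Because dims [126:225] are all zero in this dataset, their std is 0, so some guard against division by zero is needed; the `EPS` constant below is an assumption, and the original code may clamp std instead. At inference time the checkpoint's stored `mean` and `std` would replace `fit_zscore`.

```python
import numpy as np

EPS = 1e-6  # assumed guard; the exact epsilon (or clamp) used in training is not documented


def fit_zscore(train: np.ndarray):
    """Per-feature mean/std pooled over samples and frames.

    train: (N, T, 225). Returns two arrays of shape (225,).
    """
    flat = train.reshape(-1, train.shape[-1])
    return flat.mean(axis=0), flat.std(axis=0)


def apply_zscore(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # Broadcasting over the last axis works for (T, 225) or (B, T, 225) inputs.
    return (x - mean) / (std + EPS)


rng = np.random.default_rng(0)
train = np.zeros((8, 64, 225), dtype=np.float32)
train[..., :126] = rng.random((8, 64, 126))  # hand dims vary, pose dims stay zero
mean, std = fit_zscore(train)
z = apply_zscore(train, mean, std)
print(mean.shape, float(np.abs(z[..., 126:]).max()))  # (225,) 0.0
```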
Results
| Metric | Value |
|---|---|
| Best validation accuracy | 100% |
| Best checkpoint epoch | 3 |
| Final epoch (early stop) | 15 |
| Number of classes | 4 |
The 100% result is expected: 4 highly phonologically distinct signs, ~173 training samples per class on average (694 / 4), and a well-regularised model. This result validates the pipeline; it does not benchmark KSL at a meaningful scale.
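The reported epochs are consistent with patience=12: once validation accuracy reaches 100% at epoch 3 it cannot strictly improve, so twelve further epochs without improvement end training at epoch 15. A minimal sketch of that rule, assuming 1-indexed epochs and strict improvement (the actual training loop may count differently); the accuracy values below are illustrative, not the logged curve.

```python
def early_stop(val_accs, patience=12):
    """Return (best_epoch, stop_epoch) with 1-indexed epochs.

    Assumes 'improvement' means strictly higher validation accuracy and that
    training stops once `patience` consecutive epochs show no improvement.
    """
    best_acc, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_accs)


# Illustrative curve (not the logged values): accuracy hits 1.0 at epoch 3.
history = [0.5, 0.9, 1.0] + [1.0] * 57  # 60 epochs max
print(early_stop(history))  # (3, 15)
```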
Checkpoint Contents
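```python
import torch

# Load the full checkpoint dict on CPU (it wraps the state_dict rather than being one).
ck = torch.load("pytorch_model.bin", map_location="cpu")
# Keys: epoch, val_acc, model (state_dict), l2i (label→index dict),
#       mean (tensor, shape (225,)), std (tensor, shape (225,))
# l2i = {"father": 0, "hello": 1, "is": 2, "my": 3}
state_dict = ck["model"]
i2l = {idx: label for label, idx in ck["l2i"].items()}  # index → label, for decoding predictions
```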
Limitations
- Only 4 classes; not a meaningful KSL benchmark.
- Pose dims are always zero (hand-only source data). Re-extraction with MediaPipe Holistic is planned.
- Small dataset; results are likely to change with more data.
Citation / Project
AfriSignEncoder research project, CMU, 2026. GitHub: africansl_encoder.
Evaluation results
- Validation accuracy on KSL-Hand-Landmarks (self-reported): 1.000