AfriSignEncoder Exp1 — KSL Baseline (LandmarkTransformer)

Part of the AfriSignEncoder research project: a multilingual African sign language recognition benchmark. This checkpoint is the Experiment 1 single-language baseline for Kenyan Sign Language (KSL).

Scope note: This baseline covers only 4 KSL glosses (father, hello, is, my) from a small demo-scale dataset. The 100% accuracy reflects the simplicity of the 4-class task and should not be generalised to full KSL vocabulary recognition.

Model Description

Same LandmarkTransformer architecture as the CASL baseline, with a 4-class head.

Component	Value
Architecture	LandmarkTransformer (custom)
Input	(B, 64, 225) float32 — 75 keypoints × 3 coords per frame
Embedding dim	256
Attention heads	8
Encoder layers	4
Feed-forward dim	1,024
Positional encoding	Learned
Classification head	Linear 256 → 4
Parameters	~3.24 M
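
As a sanity check, the parameter figure can be roughly reproduced from the table. The sketch below makes assumptions the table does not state: a linear input projection from 225 to 256, a learned positional table with 64 rows, and encoder layers laid out like PyTorch's standard nn.TransformerEncoderLayer (Q/K/V and output projections, two feed-forward linears, two LayerNorms, biases throughout). The function name count_params is illustrative. Under those assumptions the total is about 3.23 M, close to the reported ~3.24 M; any remaining gap would come from components not listed in the table.

```python
def count_params(d_model=256, ff_dim=1024, n_layers=4,
                 in_dim=225, seq_len=64, n_classes=4):
    """Estimate trainable parameters under the assumed layer layout."""
    # Self-attention: fused Q/K/V projection plus output projection (with biases).
    attn = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward block: d_model -> ff_dim -> d_model (with biases).
    ff = (d_model * ff_dim + ff_dim) + (ff_dim * d_model + d_model)
    # Two LayerNorms per layer, each with a weight and a bias vector.
    norms = 2 * (2 * d_model)
    per_layer = attn + ff + norms

    input_proj = in_dim * d_model + d_model   # assumed linear embedding of each frame
    pos_table = seq_len * d_model             # assumed learned positions, one per frame
    head = d_model * n_classes + n_classes    # Linear 256 -> 4
    return n_layers * per_layer + input_proj + pos_table + head


total = count_params()
print(f"{total:,} parameters (~{total / 1e6:.2f} M)")
```

The number of attention heads does not appear in the calculation because splitting the 256-dim projection across 8 heads leaves the weight shapes unchanged.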

Dataset

KSL Hand Landmarks — 4 Kenyan Sign Language words.

Split	Samples	Classes
Train	694	4
Test	124	4

Source: Kaggle dataset joanwachuka/ksl-hand-landmarks, converted to parquet at luciayen/KSL-Hand-Landmarks.

Landmark caveat: The original .npy files contain MediaPipe Hands-only keypoints (42 joints × 3 = 126D). These are placed in dimensions [0:126]; dimensions [126:225] (the 99 pose body dims) are zero-padded. The model therefore learns from hand shape and motion only; the padded dimensions are always zero and contribute no signal.
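
The padding described above can be sketched as follows. This is a minimal illustration, not the project's preprocessing code: the function name pad_hands_to_full and the random demo clip are assumptions, while the 126 and 225 widths and the [0:126] placement come from the caveat itself.

```python
import numpy as np

HAND_DIMS = 126   # 42 hand joints x 3 coords
TOTAL_DIMS = 225  # 75 keypoints x 3 coords (model input width)


def pad_hands_to_full(hands: np.ndarray) -> np.ndarray:
    """Place (T, 126) hand features in dims [0:126] of a zero (T, 225) array."""
    if hands.ndim != 2 or hands.shape[1] != HAND_DIMS:
        raise ValueError(f"expected shape (T, {HAND_DIMS}), got {hands.shape}")
    full = np.zeros((hands.shape[0], TOTAL_DIMS), dtype=np.float32)
    full[:, :HAND_DIMS] = hands  # dims [126:225] stay zero (no pose signal)
    return full


# Hypothetical 64-frame clip standing in for one loaded .npy sample.
demo = np.random.default_rng(0).normal(size=(64, HAND_DIMS)).astype(np.float32)
padded = pad_hands_to_full(demo)
print(padded.shape)
```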

Data leakage fix: The Kaggle archive contains two sub-directories: data_split/ (the official train/val/test split) and dataset3/ (the pre-split source). All 124 test samples appear verbatim in dataset3/, so training on it would expose the model to test data. The upload script excludes dataset3/ entirely, giving a clean train=694 / test=124 split with zero overlap.
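
One generic way to verify a zero-overlap claim like this is to compare content hashes across splits. The sketch below is not the upload script; the function name find_verbatim_overlap and the file names are hypothetical, and short byte strings stand in for the raw contents of the .npy files.

```python
import hashlib


def find_verbatim_overlap(train: dict, test: dict) -> list:
    """Return (train_name, test_name) pairs whose raw bytes are identical."""
    train_by_hash = {}
    for name, payload in train.items():
        digest = hashlib.sha256(payload).hexdigest()
        train_by_hash.setdefault(digest, []).append(name)

    overlap = []
    for name, payload in test.items():
        digest = hashlib.sha256(payload).hexdigest()
        for train_name in train_by_hash.get(digest, []):
            overlap.append((train_name, name))
    return overlap


# Toy stand-ins for file contents; real use would read each file's bytes.
train_files = {
    "dataset3/hello_01.npy": b"\x01\x02",
    "data_split/train/my_07.npy": b"\x03\x04",
}
test_files = {"data_split/test/hello_01.npy": b"\x01\x02"}

leaky = find_verbatim_overlap(train_files, test_files)
clean = find_verbatim_overlap(
    {k: v for k, v in train_files.items() if not k.startswith("dataset3/")},
    test_files,
)
print(leaky, clean)
```

Byte-level hashing only catches verbatim copies, which is the case described here; near-duplicates would need a tolerance-based comparison of the arrays instead.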

Training

Identical protocol to the CASL baseline.

Setting	Value
Optimiser	AdamW (lr=3e-4, wd=1e-4)
LR schedule	OneCycleLR (cosine annealing)
Max epochs	60
Batch size	64
Loss	Cross-entropy with label_smoothing=0.1
Early stopping	patience=12 on validation accuracy
Normalisation	Per-feature z-score (stats stored in checkpoint)
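
Per-feature z-scoring interacts with the zero-padded pose dimensions: their standard deviation over the training set is exactly zero, so a naive division would produce NaNs. The card does not say how the training code handles this. The sketch below shows one common approach, clamping the standard deviation to a small epsilon; fit_zscore, apply_zscore, eps, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def fit_zscore(train: np.ndarray, eps: float = 1e-6):
    """Per-feature mean/std over all samples and frames: (N, T, D) -> (D,), (D,)."""
    flat = train.reshape(-1, train.shape[-1])
    mean = flat.mean(axis=0)
    std = np.maximum(flat.std(axis=0), eps)  # guard constant (all-zero) features
    return mean, std


def apply_zscore(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Normalise with stored statistics, as would be done at inference time."""
    return (x - mean) / std


# Synthetic stand-in: 8 clips, 64 frames, hand dims filled, pose dims left at zero.
rng = np.random.default_rng(0)
train = np.zeros((8, 64, 225), dtype=np.float32)
train[..., :126] = rng.normal(loc=0.5, scale=0.2, size=(8, 64, 126))

mean, std = fit_zscore(train)
normed = apply_zscore(train, mean, std)
print(np.isfinite(normed).all(), np.abs(normed[..., 126:]).max())
```

With the clamp in place, the padded dimensions stay at exactly zero after normalisation rather than becoming NaN.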

Results

Metric	Value
Best validation accuracy	100%
Best checkpoint epoch	3
Final epoch (early stop)	15
Number of classes	4

The 100% result is expected: the 4 signs are phonologically highly distinct, there are about 173 training samples per class on average (694 / 4), and the model is well regularised. The result validates the pipeline; it does not benchmark KSL at a meaningful scale.
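
The gap between the best checkpoint (epoch 3) and the final epoch (15) is what patience=12 predicts once accuracy can no longer improve. The sketch below simulates that logic under two assumptions: epochs are 1-indexed and only a strict improvement resets the counter. The first two accuracy values are made up for illustration, and run_early_stopping is a hypothetical name.

```python
def run_early_stopping(val_accs, patience=12):
    """Return (best_epoch, stop_epoch), both 1-indexed."""
    best_acc, best_epoch, stale = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:
            # Strict improvement: record the new best and reset the counter.
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_accs)


# Hypothetical curve: accuracy saturates at 1.0 in epoch 3 and cannot rise further.
curve = [0.80, 0.95] + [1.0] * 58
print(run_early_stopping(curve))  # (3, 15)
```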

Checkpoint Contents

import torch
ck = torch.load("pytorch_model.bin", map_location="cpu")
# Keys: epoch, val_acc, model (state_dict), l2i (label→index dict),
#        mean (tensor 225,), std (tensor 225,)
# l2i = {"father": 0, "hello": 1, "is": 2, "my": 3}
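
To turn model outputs back into glosses, the stored l2i mapping can be inverted. The sketch below is a minimal illustration: the logits are a hand-written placeholder rather than real model output, and decode is a hypothetical helper name.

```python
import numpy as np

# Mapping as stored in the checkpoint under the "l2i" key.
l2i = {"father": 0, "hello": 1, "is": 2, "my": 3}
i2l = {index: label for label, index in l2i.items()}


def decode(logits: np.ndarray) -> list:
    """Map (B, 4) logits to gloss labels via argmax over the class axis."""
    return [i2l[int(i)] for i in logits.argmax(axis=-1)]


# Placeholder logits for a batch of two clips (not real model output).
fake_logits = np.array([[0.1, 2.3, -0.5, 0.0],
                        [1.9, 0.2, 0.1, 0.3]])
print(decode(fake_logits))  # ['hello', 'father']
```

In real use the input clip would first be normalised with the checkpoint's mean and std before being passed to the model.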

Limitations

Only 4 classes — not a meaningful KSL benchmark.
Pose dims are always zero (hand-only source data). Re-extraction with MediaPipe Holistic is planned.
Small dataset (694 training and 124 test samples); the reported accuracy should be expected to change as more classes and data are added.

Citation / Project

AfriSignEncoder research project, CMU, 2026. GitHub: africansl_encoder.

