S4 State Space Model — Speech Commands

A 3-layer S4 state-space model (Gu et al. 2022) trained from scratch on the Google Speech Commands v2 dataset for 10-class keyword classification. Released as xaitalk's cross-framework XAI demo on the SSM architecture family.

To our knowledge, this is the first publicly released S4 checkpoint paired with a full cross-framework XAI validation suite. The suite includes LRP on SSMs, which, as far as we are aware, no other library implements.

Files

File                              Format                Size
ssm_speech_commands.pt            PyTorch state_dict    ~6 MB
ssm_speech_commands_config.json   architecture config   < 1 KB

Architecture

Property            Value
Input dim           1 (raw waveform)
State dim           64
Hidden dim          128
Layers              3
Downsample stride   160
Input length        16000 (= 1 second at 16 kHz)
Output              10-class logits
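
As a quick sanity check on the table above, the following sketch works out how many time steps the S4 layers would see for a one-second clip. It assumes the downsampling step is non-overlapping (window length equal to the stride of 160, no padding), which this card does not state. `downsampled_length` is a hypothetical helper and not part of xaitalk.

```python
from typing import Optional

SAMPLE_RATE_HZ = 16_000   # from the card: 16000 samples = 1 second
INPUT_LENGTH = 16_000     # "Input length" from the table
STRIDE = 160              # "Downsample stride" from the table


def downsampled_length(n_samples: int, stride: int, window: Optional[int] = None) -> int:
    """Number of output steps for a strided, unpadded downsampler.

    Assumption: window == stride (non-overlapping) unless given otherwise.
    """
    window = stride if window is None else window
    if n_samples < window:
        return 0
    return (n_samples - window) // stride + 1


steps = downsampled_length(INPUT_LENGTH, STRIDE)
step_ms = 1000 * STRIDE / SAMPLE_RATE_HZ

print(f"sequence length seen by the S4 layers: {steps}")
print(f"each step covers {step_ms:.0f} ms of audio")
```

Under these assumptions the 16000-sample waveform becomes a sequence of 100 steps, each covering 10 ms of audio. A different window length or padding scheme would change the count slightly.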

Training: 50 epochs on Speech Commands v2 with a cosine-annealing learning-rate schedule, on an A100 GPU. Best validation accuracy: 73.5%; test accuracy: 71.2%.
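
For readers unfamiliar with cosine annealing, here is a minimal sketch of the standard closed-form schedule over 50 epochs. The base learning rate (`BASE_LR`), the minimum learning rate, and the absence of warmup or restarts are all assumptions for illustration; the card does not report them.

```python
import math

TOTAL_EPOCHS = 50   # from the card
BASE_LR = 1e-3      # hypothetical value, not reported in the card
MIN_LR = 0.0        # hypothetical value, not reported in the card


def cosine_annealing_lr(epoch: int, total_epochs: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Standard cosine annealing: decays from base_lr to min_lr over total_epochs."""
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cos_term)


# Learning rate at the start of each epoch, plus the final value at epoch 50.
schedule = [
    cosine_annealing_lr(epoch, TOTAL_EPOCHS, BASE_LR, MIN_LR)
    for epoch in range(TOTAL_EPOCHS + 1)
]

for epoch in (0, 10, 25, 40, 50):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.6f}")
```

The schedule starts at the base rate, passes through half of it at the midpoint, and reaches the minimum at the final epoch.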

Cross-framework verification

These weights are validated by xaitalk's ssm benchmark on Speech Commands (22 methods):

Methods   Passing at r ≥ 0.95   Min(min_r)   Verified
22        22/22                 1.0000       2026-05-09

Every method (the gradient family; the full LRP family, including the ε / γ / α-β / z+ / flat / w² / SIGN variants; DeepLIFT; the SmoothGrad family; Grad-CAM; and occlusion) produces bit-exact attributions across PyTorch, TensorFlow, and JAX, with a worst-case Pearson r of 1.0000.

This is the strongest cross-framework verification of any architecture in xaitalk's matrix and the canonical reference for ports to new architectures.
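
To make the pass criterion concrete, the sketch below computes the Pearson correlation between attribution vectors from each pair of frameworks and takes the minimum. How xaitalk actually aggregates its scores is not described in this card; reading "Min(min_r)" as the minimum over framework pairs (and then over methods) is an assumption, and the attribution values are toy numbers, not benchmark output.

```python
import math
from itertools import combinations
from typing import Dict, Sequence

PASS_THRESHOLD = 0.95  # "Passing at r >= 0.95" from the table


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equally long, flattened attribution maps."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("Pearson r is undefined for constant inputs")
    return cov / math.sqrt(var_a * var_b)


def worst_case_r(attributions: Dict[str, Sequence[float]]) -> float:
    """Minimum Pearson r over all framework pairs for one method."""
    return min(
        pearson_r(attributions[f1], attributions[f2])
        for f1, f2 in combinations(attributions, 2)
    )


# Toy attributions for a single method (illustrative values only).
toy = {
    "pytorch":    [0.10, -0.20, 0.40, 0.00, 0.30],
    "tensorflow": [0.10, -0.20, 0.40, 0.00, 0.30],
    "jax":        [0.10, -0.20, 0.40, 0.00, 0.30],
}

worst = worst_case_r(toy)
passes = worst >= PASS_THRESHOLD
print(f"worst-case r = {worst:.4f}, passes = {passes}")
```

Note that Pearson r is invariant to positive scaling and constant offsets, so a correlation check of this kind is usually paired with a direct numerical comparison when exact agreement matters.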

Usage

import json

import numpy as np
import torch

import xaitalk
from xaitalk.hub import ensure_model
from xaitalk.models import S4SSM  # architecture class lives in xaitalk

# Fetch the checkpoint; the config JSON sits in the same directory
ckpt_path = ensure_model('ssm/s4-speech-commands')
config_path = ckpt_path.parent / 'ssm_speech_commands_config.json'
config = json.loads(config_path.read_text())

# Rebuild the architecture from the config, then load the weights
model = S4SSM(**config)
model.load_state_dict(torch.load(ckpt_path, weights_only=True))
model.eval()

# Run XAI on a 1-second waveform of 16000 samples
x = np.random.randn(1, 1, 16000).astype(np.float32)
expl = xaitalk.explain(model, x, method='lrp_epsilon', target_class=3)
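
The usage snippet feeds random noise. For a real recording, the clip first has to be brought to exactly 16000 samples. The sketch below shows one way to do that with the standard library only: it converts 16-bit PCM to floats, then truncates or zero-pads at the end. Both helpers are hypothetical, and the card does not say how short or long clips were handled during training, so the padding strategy is an assumption.

```python
from typing import List, Sequence

TARGET_LEN = 16_000  # "Input length" from the architecture table


def pcm16_to_float(samples: Sequence[int]) -> List[float]:
    """Scale signed 16-bit PCM samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def pad_or_trim(samples: Sequence[float], target_len: int = TARGET_LEN) -> List[float]:
    """Truncate at the end, or zero-pad at the end, to exactly target_len samples.

    Assumption: end-padding with zeros; the training pipeline may differ.
    """
    clipped = list(samples[:target_len])
    return clipped + [0.0] * (target_len - len(clipped))


# A clip that is too short (3000 samples) and one that is too long (20000 samples).
short_clip = pcm16_to_float([0, 16384, -16384] * 1000)
long_clip = [0.1] * 20_000

fixed_short = pad_or_trim(short_clip)
fixed_long = pad_or_trim(long_clip)

print(len(fixed_short), len(fixed_long))
```

The resulting list can then be wrapped as `np.asarray(fixed_short, dtype=np.float32)[None, None, :]` to obtain the `(1, 1, 16000)` array shape used in the usage snippet.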

Training data

Speech Commands v2 (Warden 2018) is a 35-keyword dataset. This model is trained on the 10-keyword subset: yes, no, up, down, left, right, on, off, stop, go.
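
A minimal sketch of how the 10-keyword subset could be selected from the full dataset, relying on the dataset's layout of one directory per keyword. The file names are illustrative, `select_subset` is a hypothetical helper, and the label indices simply follow the order in which the keywords are listed above. The card does not document the class order actually used by the checkpoint, so verify it before interpreting `target_class` values.

```python
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple

KEYWORDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]

# Assumption: class index follows the listing order above (not confirmed by the card).
LABEL_INDEX = {word: i for i, word in enumerate(KEYWORDS)}


def select_subset(relative_paths: Sequence[str]) -> List[Tuple[str, int]]:
    """Keep files whose top-level directory is one of the 10 keywords."""
    selected = []
    for rel in relative_paths:
        keyword = PurePosixPath(rel).parts[0]
        if keyword in LABEL_INDEX:
            selected.append((rel, LABEL_INDEX[keyword]))
    return selected


# Illustrative paths in the "<keyword>/<speaker>_nohash_<n>.wav" layout.
example_paths = [
    "yes/0a7c2a8d_nohash_0.wav",
    "bed/00176480_nohash_0.wav",            # one of the other 25 keywords
    "stop/0ab3b47d_nohash_1.wav",
    "_background_noise_/white_noise.wav",   # not a keyword directory
]

subset = select_subset(example_paths)
for path, label in subset:
    print(label, path)
```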

License

The model is released under Apache 2.0. The Speech Commands dataset is released under CC BY 4.0.

Citation

S4 architecture:

@inproceedings{gu2022s4,
  author    = {Gu, Albert and Goel, Karan and R{\'e}, Christopher},
  title     = {Efficiently Modeling Long Sequences with Structured State Spaces},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2022}
}

Speech Commands dataset:

@misc{warden2018speechcommands,
  author = {Warden, Pete},
  title  = {Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition},
  year   = {2018},
  eprint = {1804.03209},
  archivePrefix = {arXiv}
}

xaitalk infrastructure:

@software{paul2026xaitalk,
  author = {Paul, Alexander},
  title  = {xaitalk: Cross-Framework Explainable AI Library},
  year   = {2026},
  url    = {https://xaitalk.com}
}
