SEMamba++ (Interspeech 2026 · Long Paper Track)

Official Hugging Face repository for SEMamba++. [Demo] [Paper (arXiv)] [GitHub]

SEMamba++ is a general speech restoration (GSR) framework that leverages global, local, and periodic spectral patterns via a Mamba-based architecture. It handles a range of degradation conditions including noise, reverberation, and clipping.
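To make the three degradation types concrete, here is a minimal, dependency-free sketch of how a degraded training example could be simulated from a speech clip, a noise clip, and an RIR. It is an illustration only, not the data pipeline used by SEMamba++: the function names, the order of operations, and the default `snr_db` and `clip_threshold` values are all assumptions, and a real pipeline would use NumPy or PyTorch rather than Python lists.

```python
import math

def convolve(signal, rir):
    """Full linear convolution; simulates reverberation with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

def mix_at_snr(speech, noise, snr_db):
    """Add noise scaled so that the speech-to-noise power ratio equals snr_db."""
    # Loop the noise clip so it covers the whole speech clip.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def hard_clip(signal, threshold):
    """Clamp every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, x)) for x in signal]

def degrade(speech, noise, rir, snr_db=10.0, clip_threshold=0.5):
    """Hypothetical chain: reverberation, then additive noise, then clipping."""
    reverberant = convolve(speech, rir)[: len(speech)]  # trim the reverb tail
    noisy = mix_at_snr(reverberant, noise, snr_db)
    return hard_clip(noisy, clip_threshold)
```

A restoration model is then trained to map the output of a function like `degrade` back to the original `speech`.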


Prerequisites

Install all required dependencies:

pip install -r requirements.txt

For the Mamba backbone, follow the installation guide from SEMamba, which resolves CUDA-specific build issues.
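After installation, a quick check like the one below can confirm that the key packages are importable before starting a training run. This is a minimal sketch: the package names `torch` and `mamba_ssm` are assumptions for illustration, and `requirements.txt` remains the authoritative dependency list.

```python
import importlib.util

# Assumed package names for illustration; adjust to match requirements.txt.
REQUIRED = ["torch", "mamba_ssm"]

def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only locates the packages without importing them, so it does not verify that the Mamba CUDA kernels were built correctly.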


Datasets

SEMamba++ can be trained on any dataset that provides speech, noise, and room impulse response (RIR) samples. Point each split to the corresponding JSON manifest file:

Split                   File
Training speech         data/train_speech.json
Training noise          data/train_noise.json
Training RIR            data/train_rir.json
Validation (clean)      data/val_clean.json
Validation (degraded)   data/val_degraded.json
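The manifest files can be generated with a short script. The sketch below assumes that each manifest is a flat JSON list of audio file paths; that format is an assumption, so check the repository's data loader for the exact schema it expects. The helper name `build_manifest` and the `*.wav` pattern are likewise hypothetical.

```python
import json
from pathlib import Path

def build_manifest(audio_dir, manifest_path, pattern="*.wav"):
    """Write a JSON list of the audio file paths found under audio_dir.

    Assumption: the manifest is a flat JSON list of path strings.
    """
    # Search recursively and sort so the manifest is reproducible.
    paths = sorted(str(p) for p in Path(audio_dir).rglob(pattern))
    manifest_path = Path(manifest_path)
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with manifest_path.open("w", encoding="utf-8") as f:
        json.dump(paths, f, indent=2)
    return paths
```

For example, `build_manifest("/path/to/speech", "data/train_speech.json")` would write the training speech manifest, and the same call can be repeated for the noise and RIR directories.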

Download sources


Pretrained weights

Pretrained weights will be released on Hugging Face.

The released model was trained on VCTK and LibriTTS (~500 hours of speech combined).


References

  • SEMamba: Mamba-based speech enhancement backbone
  • BigVGAN: Neural vocoder (NVIDIA)
  • MPSENet: Speech enhancement with parallel denoising of magnitude and phase spectra

Citation

If you find SEMamba++ useful in your work, please cite:

@misc{lee2026semambageneralspeechrestoration,
  title         = {SEMamba++: A General Speech Restoration Framework
                   Leveraging Global, Local, and Periodic Spectral Patterns},
  author        = {Yongjoon Lee and Jung-Woo Choi},
  year          = {2026},
  eprint        = {2603.11669},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2603.11669}
}