SEMamba++ (Interspeech 2026 · Long Paper Track)
Official Hugging Face repository for SEMamba++. [Demo] [Paper (arXiv)] [GitHub]
SEMamba++ is a general speech restoration (GSR) framework that leverages global, local, and periodic spectral patterns via a Mamba-based architecture. It handles a range of degradation conditions including noise, reverberation, and clipping.
Prerequisites
Install all required dependencies:
pip install -r requirements.txt
For the Mamba backbone, follow the installation guide from SEMamba, which resolves CUDA-specific build issues.
Datasets
SEMamba++ can be trained on any dataset that provides speech, noise, and room impulse response (RIR) samples. Point each split to the corresponding JSON manifest file:
| Split | File |
|---|---|
| Training speech | data/train_speech.json |
| Training noise | data/train_noise.json |
| Training RIR | data/train_rir.json |
| Validation (clean) | data/val_clean.json |
| Validation (degraded) | data/val_degraded.json |
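As a rough illustration of how such a manifest might be produced, the sketch below scans a directory for audio files and writes their paths to a JSON file. It assumes each manifest is a flat JSON list of file paths, which is not confirmed here; check the repository's data loader for the exact schema. The function name `build_manifest` and the default extension list are hypothetical.

```python
import json
from pathlib import Path


def build_manifest(audio_dir, manifest_path, extensions=(".wav", ".flac")):
    """Collect audio file paths under audio_dir and write them as a JSON list.

    Assumption: the manifest is a flat list of path strings. Adjust this if
    the data loader expects a different structure.
    """
    audio_dir = Path(audio_dir)
    # Sort so the manifest is reproducible across runs.
    files = sorted(
        str(p)
        for p in audio_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    manifest_path = Path(manifest_path)
    # Create the output directory (e.g. data/) if it does not exist yet.
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with manifest_path.open("w", encoding="utf-8") as f:
        json.dump(files, f, indent=2)
    return files


# Example (the source directory is a placeholder):
# build_manifest("path/to/speech_corpus", "data/train_speech.json")
```

The same helper could be called once per row of the table above, with a different source directory and output file each time.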
Download sources
- Speech: VCTK, LibriTTS
- Noise: DNS Challenge 2020, WHAM!
- RIR: Arni, DNS5
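To show why all three data types are needed, here is a minimal NumPy sketch of how a clean utterance could be combined with a noise clip and an RIR to simulate the degradations mentioned above (reverberation, noise, clipping). This is not the authors' simulation pipeline: the function name `degrade`, the order of operations, and the default `snr_db` and `clip_level` values are all assumptions chosen for illustration.

```python
import numpy as np


def degrade(speech, noise, rir, snr_db=10.0, clip_level=0.5):
    """Apply reverberation, additive noise, and hard clipping to a clean signal.

    Illustrative only; parameter values and ordering are assumptions.
    """
    # Reverberation: convolve with the RIR and trim to the original length.
    reverberant = np.convolve(speech, rir)[: len(speech)]

    # Tile or trim the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    # Scale the noise to reach the requested SNR relative to the reverberant speech.
    speech_power = np.mean(reverberant ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = reverberant + gain * noise

    # Clipping: hard-limit the amplitude to +/- clip_level.
    return np.clip(noisy, -clip_level, clip_level)


if __name__ == "__main__":
    # Synthetic stand-ins for real audio loaded from the manifests.
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)
    noise = rng.standard_normal(4000)
    rir = np.exp(-np.arange(800) / 100.0)  # toy exponentially decaying RIR
    degraded = degrade(speech, noise, rir, snr_db=5.0)
    print(degraded.shape)
```

In practice the inputs would be waveforms loaded from the speech, noise, and RIR datasets listed above, all resampled to a common sampling rate.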
Pretrained weights
Pretrained weights will be released on Hugging Face.
The released model was trained on VCTK and LibriTTS (~500 hours of speech combined).
References
- SEMamba — Mamba-based speech enhancement backbone
- BigVGAN — Neural vocoder (NVIDIA)
- MPSENet — Multi-scale phase-aware speech enhancement
Citation
If you find SEMamba++ useful in your work, please cite:
@misc{lee2026semambageneralspeechrestoration,
  title         = {SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns},
  author        = {Yongjoon Lee and Jung-Woo Choi},
  year          = {2026},
  eprint        = {2603.11669},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2603.11669}
}