resemble-enhance-mlx
Apple MLX port of resemble-enhance (speech denoising + enhancement). Runs entirely on the MLX runtime, with no PyTorch dependency at inference time.
The weights are converted from the original resemble-enhance checkpoint and stored
as a single fp16 safetensors file (710 MB). Every component (mel frontend, spectral
denoiser UNet, latent conditional-flow-matching enhancer, UnivNet/LVCNet vocoder) was
ported op-by-op and validated against the PyTorch reference.
Install
```bash
pip install mlx numpy scipy soundfile huggingface_hub
```
Usage
```python
import soundfile as sf
from huggingface_hub import hf_hub_download
import resemble_enhance_mlx as re

ckpt = hf_hub_download("multimodalart/resemble-enhance-mlx", "resemble-enhance-mlx.safetensors")
model = re.load(ckpt, lambd=0.5)  # lambd in [0,1]: denoiser strength

wav, sr = sf.read("noisy.wav", dtype="float32")
out, out_sr = re.enhance(model, wav, sr)  # resamples to 44.1 kHz, chunks long audio
sf.write("clean.wav", out, out_sr)
```
Place resemble_enhance_mlx/ (the package) next to your script, or add this repo to
PYTHONPATH.
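If editing `PYTHONPATH` in the shell is inconvenient, the same effect can be had from inside a script. The sketch below is one possible way to do it; the helper name `add_repo_to_path` and the directory `resemble-enhance-mlx` are illustrative assumptions, not part of the package.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Prepend repo_dir to sys.path (once) so `import resemble_enhance_mlx` can resolve."""
    repo = str(Path(repo_dir).expanduser().resolve())
    if repo not in sys.path:
        sys.path.insert(0, repo)
    return repo


# Hypothetical location of the downloaded repository; adjust to your checkout.
repo = add_repo_to_path("resemble-enhance-mlx")
```

Call the helper before `import resemble_enhance_mlx`; it only changes the import search path and does not check that the package is actually present.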
Parity & speed
Validated against the PyTorch reference (max abs diff, random-weight and full-model):
| Component | vs PyTorch |
|---|---|
| Mel frontend | 2.8e-5 |
| UnivNet / LVCNet vocoder | 7.2e-7 |
| Denoiser (STFT + 2D UNet + ISTFT) | 1.6e-4 |
| Latent CFM enhancer (IRMAE + WN + ODE) | 9.2e-6 |
| End-to-end waveform | corr 0.99999 |
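
The two metrics in the table (maximum absolute difference and correlation) can be reproduced for any pair of outputs with a few lines of NumPy. This is a minimal sketch of the metrics themselves, not the validation script used for this port; `parity_metrics` is a hypothetical helper and the arrays are synthetic stand-ins rather than real model outputs.

```python
import numpy as np


def parity_metrics(reference, candidate):
    """Return (max absolute difference, Pearson correlation) for two equal-length arrays."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.shape != cand.shape:
        raise ValueError("arrays must have the same number of samples")
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    corr = float(np.corrcoef(ref, cand)[0, 1])
    return max_abs_diff, corr


# Synthetic stand-ins for a PyTorch output and an MLX output (not real model data).
rng = np.random.default_rng(0)
ref = rng.standard_normal(44_100).astype(np.float32)
cand = ref + 1e-5 * rng.standard_normal(44_100).astype(np.float32)
max_abs_diff, corr = parity_metrics(ref, cand)
```

To compare real outputs, pass the reference waveform and the MLX waveform for the same input, both at 44.1 kHz and trimmed to the same length.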
fp16 storage vs fp32 compute: corr 1.000000, mean diff 4.5e-5 (inaudible).
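
To see why fp16 storage costs so little, it helps to measure the rounding error of an fp32 → fp16 → fp32 round trip. The sketch below uses synthetic values with an assumed scale of 0.05, not the real checkpoint, so it illustrates the mechanism only and does not reproduce the figures above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic fp32 "weights" with a small scale; not taken from the real checkpoint.
w32 = (0.05 * rng.standard_normal(100_000)).astype(np.float32)

# Store as fp16 (half the bytes), then upcast to fp32 for compute.
w16 = w32.astype(np.float16)
w_restored = w16.astype(np.float32)

abs_err = np.abs(w_restored - w32)
mean_abs_err = float(abs_err.mean())
max_abs_err = float(abs_err.max())
```

fp16 keeps about 11 significant bits, so each value is off by at most roughly 0.05% of its magnitude; all arithmetic after the upcast runs in fp32.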
On an M4 Max, a 6.5 s clip enhances in ~1.7 s (nfe=64), ~7.5× faster than the PyTorch reference on CPU.
Notes
- Input audio is resampled to 44.1 kHz with a polyphase (Kaiser) filter; this is a
  close approximation of torchaudio's `sinc_interp_kaiser`, so non-44.1 kHz input is
  not bit-identical to the reference.
- `lambd=0` skips the denoiser (enhancement only); `lambd=0.5` is the default.
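
For readers who want to resample ahead of time, polyphase resampling with a Kaiser window is available in SciPy as `scipy.signal.resample_poly`. The sketch below is an assumption about what such a step can look like, using SciPy's default Kaiser beta of 5.0; the filter parameters used inside this package may differ, and `resample_to_44k` is a hypothetical helper.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 44_100


def resample_to_44k(wav, sr):
    """Polyphase resampling to 44.1 kHz using SciPy's Kaiser-windowed FIR filter."""
    if sr == TARGET_SR:
        return np.asarray(wav, dtype=np.float32)
    g = gcd(TARGET_SR, sr)
    up, down = TARGET_SR // g, sr // g  # e.g. 16 kHz -> up=441, down=160
    out = resample_poly(wav, up, down, axis=0, window=("kaiser", 5.0))
    return out.astype(np.float32)


# One second of a 440 Hz tone at 16 kHz as a synthetic test signal.
sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
resampled = resample_to_44k(tone, sr)
```

Audio that is already at 44.1 kHz passes through unchanged, which is the case where output can match the reference most closely.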
Original model and method: Resemble AI. This repository only re-packages the weights for MLX.