---
license: apache-2.0
---

A [SoundStream](https://arxiv.org/abs/2107.03312) decoder to reconstruct audio from a mel-spectrogram.

## Overview

This model is a SoundStream decoder which inverts mel-spectrograms computed with the specific hyperparameters defined in the example below. It was trained on music data and used in [Multi-instrument Music Synthesis with Spectrogram Diffusion](https://arxiv.org/abs/2206.05408) (ISMIR 2022).

A typical use case is to simplify music generation by predicting mel-spectrograms (instead of raw waveforms) and then using this model to reconstruct the audio.

If you use it, please consider citing:

```bibtex
@article{zeghidour2021soundstream,
  title={Soundstream: An end-to-end neural audio codec},
  author={Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={495--507},
  year={2021},
  publisher={IEEE}
}
```

## Example Use

```python
import numpy as np

from diffusers import OnnxRuntimeModel

# Hyperparameters used to compute the mel-spectrograms this decoder inverts.
SAMPLE_RATE = 16000
N_FFT = 1024
HOP_LENGTH = 320
WIN_LENGTH = 640
N_MEL_CHANNELS = 128
MEL_FMIN = 0.0
MEL_FMAX = int(SAMPLE_RATE // 2)
CLIP_VALUE_MIN = 1e-5
CLIP_VALUE_MAX = 1e8

# Mel-spectrogram computed with the hyperparameters above.
mel = ...

melgan = OnnxRuntimeModel.from_pretrained("kashif/soundstream_mel_decoder")
audio = melgan(input_features=mel.astype(np.float32))
```
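
As a rough illustration of how a compatible input might be produced, the sketch below computes a clipped log-mel spectrogram from a waveform with `librosa` using the hyperparameters above. This is not the training pipeline: the `audio_to_mel` helper, the use of librosa, and the exact normalization (magnitude spectrogram, clipping, log compression, transpose) are assumptions and may need to be adapted to match the features the model was trained on.

```python
import librosa
import numpy as np

# Same hyperparameters as in the example above.
SAMPLE_RATE, N_FFT, HOP_LENGTH, WIN_LENGTH = 16000, 1024, 320, 640
N_MEL_CHANNELS, MEL_FMIN, MEL_FMAX = 128, 0.0, SAMPLE_RATE // 2
CLIP_VALUE_MIN, CLIP_VALUE_MAX = 1e-5, 1e8


def audio_to_mel(waveform: np.ndarray) -> np.ndarray:
    """Hypothetical helper: clipped log-mel spectrogram (normalization is an assumption)."""
    # Magnitude (power=1.0) mel-spectrogram at 16 kHz.
    spec = librosa.feature.melspectrogram(
        y=waveform,
        sr=SAMPLE_RATE,
        n_fft=N_FFT,
        hop_length=HOP_LENGTH,
        win_length=WIN_LENGTH,
        n_mels=N_MEL_CHANNELS,
        fmin=MEL_FMIN,
        fmax=MEL_FMAX,
        power=1.0,
    )
    # Clip to avoid log(0), then compress to log scale.
    spec = np.clip(spec, CLIP_VALUE_MIN, CLIP_VALUE_MAX)
    # [frames, N_MEL_CHANNELS]; add a batch dimension if the model expects one.
    return np.log(spec).T
```

The resulting array could then be cast to `float32` and passed as `input_features` in the example above.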