---
license: apache-2.0
---
A [SoundStream](https://arxiv.org/abs/2107.03312) decoder to reconstruct audio from a mel-spectrogram.
## Overview
This model is a SoundStream decoder that inverts mel-spectrograms computed with the hyperparameters defined in the example below. It was trained on music data and used in [Multi-instrument Music Synthesis with Spectrogram Diffusion](https://arxiv.org/abs/2206.05408) (ISMIR 2022).
A typical use case is to simplify music generation by predicting mel-spectrograms instead of a raw waveform, then using this model to reconstruct the audio.
If you use it, please consider citing:
```bibtex
@article{zeghidour2021soundstream,
title={Soundstream: An end-to-end neural audio codec},
author={Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume={30},
pages={495--507},
year={2021},
publisher={IEEE}
}
```
## Example Use
```python
import numpy as np  # needed for the float32 cast below
from diffusers import OnnxRuntimeModel
SAMPLE_RATE = 16000
N_FFT = 1024
HOP_LENGTH = 320
WIN_LENGTH = 640
N_MEL_CHANNELS = 128
MEL_FMIN = 0.0
MEL_FMAX = int(SAMPLE_RATE // 2)
CLIP_VALUE_MIN = 1e-5
CLIP_VALUE_MAX = 1e8
mel = ...
melgan = OnnxRuntimeModel.from_pretrained("kashif/soundstream_mel_decoder")
audio = melgan(input_features=mel.astype(np.float32))
```
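The example above leaves `mel` unspecified. As a rough illustration of how the listed hyperparameters could fit together, here is a NumPy-only sketch of a log-mel front end. Several choices are assumptions not confirmed by this card: the Hann window, the HTK-style mel scale, the use of a magnitude (not power) spectrogram, the natural logarithm, and the absence of center padding. The helper names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `log_mel_spectrogram`) are hypothetical, and the output may not match the features the decoder was trained on or the array layout it expects.

```python
import numpy as np

SAMPLE_RATE = 16000
N_FFT = 1024
HOP_LENGTH = 320
WIN_LENGTH = 640
N_MEL_CHANNELS = 128
MEL_FMIN = 0.0
MEL_FMAX = SAMPLE_RATE // 2
CLIP_VALUE_MIN = 1e-5
CLIP_VALUE_MAX = 1e8


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other conventions exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank():
    # Triangular filters whose edges are evenly spaced on the mel scale.
    fft_freqs = np.linspace(0.0, SAMPLE_RATE / 2, N_FFT // 2 + 1)
    edges = mel_to_hz(
        np.linspace(hz_to_mel(MEL_FMIN), hz_to_mel(MEL_FMAX), N_MEL_CHANNELS + 2)
    )
    lower, center, upper = edges[:-2], edges[1:-1], edges[2:]
    rising = (fft_freqs[:, None] - lower) / (center - lower)
    falling = (upper - fft_freqs[:, None]) / (upper - center)
    # Shape: (N_FFT // 2 + 1, N_MEL_CHANNELS)
    return np.maximum(0.0, np.minimum(rising, falling))


def log_mel_spectrogram(audio):
    # Returns an array of shape (frames, N_MEL_CHANNELS), dtype float32.
    audio = np.asarray(audio, dtype=np.float64)
    if len(audio) < WIN_LENGTH:
        # Zero-pad so that at least one full window fits.
        audio = np.pad(audio, (0, WIN_LENGTH - len(audio)))
    n_frames = 1 + (len(audio) - WIN_LENGTH) // HOP_LENGTH
    # Index matrix selecting one WIN_LENGTH-sample frame per row.
    idx = np.arange(WIN_LENGTH)[None, :] + HOP_LENGTH * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(WIN_LENGTH)
    # rfft zero-pads each windowed frame from WIN_LENGTH to N_FFT samples.
    magnitude = np.abs(np.fft.rfft(frames, n=N_FFT, axis=-1))
    mel = magnitude @ mel_filterbank()
    return np.log(np.clip(mel, CLIP_VALUE_MIN, CLIP_VALUE_MAX)).astype(np.float32)
```

With `HOP_LENGTH = 320` at a 16 kHz sample rate, consecutive frames are 20 ms apart, so one second of audio yields roughly 50 frames. Any batch or channel dimensions the decoder requires would still need to be added before calling it.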