ONNX
File size: 1,399 Bytes
25089d7
 
 
d26821c
65fa8be
d26821c
 
 
65fa8be
d26821c
 
 
5984cba
 
 
 
 
 
 
 
 
 
 
 
 
 
d26821c
 
367eb6e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
---
license: apache-2.0
---

A [SoundStream](https://arxiv.org/abs/2107.03312) decoder to reconstruct audio from a mel-spectrogram.

## Overview

This model is a SoundStream decoder which inverts mel-spectrograms computed with the specific hyperparameters defined in the example below. This model was trained on music data and used in [Multi-instrument Music Synthesis with Spectrogram Diffusion](https://arxiv.org/abs/2206.05408) (ISMIR 2022).

A typical use-case is to simplify music generation by predicting mel-spectrograms (instead of a raw waveform), and then use this model to reconstruct audio.

If you use it, please consider citing:

```bibtex
@article{zeghidour2021soundstream,
  title={Soundstream: An end-to-end neural audio codec},
  author={Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={495--507},
  year={2021},
  publisher={IEEE}
}
```

## Example Use

```python
from diffusers import OnnxRuntimeModel


SAMPLE_RATE = 16000
N_FFT = 1024
HOP_LENGTH = 320
WIN_LENGTH = 640
N_MEL_CHANNELS = 128
MEL_FMIN = 0.0
MEL_FMAX = int(SAMPLE_RATE // 2)
CLIP_VALUE_MIN = 1e-5
CLIP_VALUE_MAX = 1e8

mel = ...

melgan = OnnxRuntimeModel.from_pretrained("kashif/soundstream_mel_decoder")

audio = melgan(input_features=mel.astype(np.float32))
```