Update README.md
README.md
CHANGED
@@ -1,3 +1,33 @@
 ---
 license: mit
+library: ONNX
+base_model: charactr/vocos-mel-24khz
 ---
+
+**Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis**
+
+**Audio samples | Paper [abs] [pdf]**
+
+Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
+
+This is an ONNX version of the original mel spectrogram model. The model predicts the spectrogram, and the ISTFT is performed outside ONNX because ONNX does not yet provide an ISTFT operator.
+
+## Usage
+
+Try it out in Colab:
+
+<a target="_blank" href="https://colab.research.google.com/drive/1J1tWd56D7CPwmVCP-pbMNzlRWYvlyADN">
+<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+</a>
+
+## Citation
+
+```
+@article{siuzdak2023vocos,
+title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
+author={Siuzdak, Hubert},
+journal={arXiv preprint arXiv:2306.00814},
+year={2023}
+}
+
+```
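
The README added in this commit says the model predicts the spectrogram and that the ISTFT runs outside ONNX. The following is a minimal NumPy sketch of what that post-processing step could look like, not the author's implementation. It assumes the ONNX graph returns magnitude and phase arrays of shape `(n_fft // 2 + 1, frames)`, and that the transform uses `n_fft=1024`, `hop_length=256`, a periodic Hann window, and centered frames. None of these values are confirmed by this commit, and the function name `istft` and its parameters are illustrative.

```python
import numpy as np


def istft(mag, phase, n_fft=1024, hop_length=256):
    """Overlap-add inverse STFT for a one-sided spectrogram.

    mag, phase: real arrays of shape (n_fft // 2 + 1, frames).
    The window type, centering, and default sizes are assumptions,
    not values taken from the model card.
    """
    # Combine magnitude and phase into a complex spectrogram.
    spec = mag * np.exp(1j * phase)

    # Invert each frame to the time domain: result is (n_fft, frames).
    frames = np.fft.irfft(spec, n=n_fft, axis=0)

    # Periodic Hann window, applied again as the synthesis window.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    frames = frames * window[:, None]

    n_frames = spec.shape[1]
    out_len = n_fft + hop_length * (n_frames - 1)
    audio = np.zeros(out_len)
    envelope = np.zeros(out_len)

    # Overlap-add the frames and accumulate the squared-window envelope.
    for t in range(n_frames):
        start = t * hop_length
        audio[start:start + n_fft] += frames[:, t]
        envelope[start:start + n_fft] += window ** 2

    # Drop the half-window padding that centered framing adds on each side.
    pad = n_fft // 2
    audio = audio[pad:out_len - pad]
    envelope = envelope[pad:out_len - pad]

    # Normalize by the envelope, guarding against division by zero.
    return audio / np.maximum(envelope, 1e-11)
```

To use it, pass the arrays returned by the ONNX session to `istft` after dropping the batch dimension. If the export returns real and imaginary parts rather than magnitude and phase, build `spec` from those directly instead.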