Vocos

Note: This repo has no affiliation with the author of Vocos.

Pretrained Vocos model with a 48 kHz sampling rate, as opposed to the 24 kHz of the official models.

Usage

Make sure the Vocos library is installed:

pip install vocos

Then load the model as usual:

from vocos import Vocos
vocos = Vocos.from_pretrained("kittn/vocos-mel-48khz-alpha1")
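Because this checkpoint operates at 48 kHz, input audio should be resampled to 48 kHz rather than the 24 kHz used in the official examples. The following is a minimal copy-synthesis sketch adapted from the official usage pattern. The helper name `reconstruct` and the file paths are placeholders, and it assumes `torchaudio` is installed alongside `vocos`:

```python
TARGET_SR = 48_000  # sampling rate of this checkpoint, per the description above


def reconstruct(in_path, out_path, repo_id="kittn/vocos-mel-48khz-alpha1"):
    """Copy-synthesis: audio -> mel features -> audio (hypothetical helper)."""
    # Imported here so the heavy dependencies are only needed when called.
    import torch
    import torchaudio
    from vocos import Vocos

    vocos = Vocos.from_pretrained(repo_id)

    y, sr = torchaudio.load(in_path)  # y has shape (channels, samples)
    if y.size(0) > 1:
        y = y.mean(dim=0, keepdim=True)  # mix down to mono
    if sr != TARGET_SR:
        y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=TARGET_SR)

    with torch.no_grad():
        y_hat = vocos(y)  # forward pass: feature extraction, then decoding

    torchaudio.save(out_path, y_hat.cpu(), TARGET_SR)
    return y_hat
```

Calling `reconstruct("input.wav", "output.wav")` would write the resynthesized audio at 48 kHz; both file names are illustrative.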

For more detailed examples, see github.com/charactr-platform/vocos#usage

Evals

TODO

Training details

TODO

What is Vocos?

Here's a summary from the official repo (github.com/charactr-platform/vocos):

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.

For more details and other variants, check out the repo link above.
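To make the "spectral coefficients, then inverse Fourier transform" idea concrete, here is a toy, standard-library-only sketch that rebuilds a single real-valued frame from its magnitude and phase. It is not the Vocos implementation: the function names are made up for illustration, it uses a naive O(n²) DFT, and it omits the windowing and overlap-add that a full inverse STFT performs across frames.

```python
import cmath
import math


def frame_from_mag_phase(mag, phase):
    """Rebuild one real-valued frame from its one-sided magnitude and phase.

    mag and phase each hold n_fft // 2 + 1 values (bins 0 .. n_fft / 2).
    """
    n_bins = len(mag)
    n_fft = 2 * (n_bins - 1)
    # Complex spectral coefficients: S[k] = mag[k] * exp(i * phase[k])
    half = [m * cmath.exp(1j * p) for m, p in zip(mag, phase)]
    # A real signal has a conjugate-symmetric spectrum, so mirror the bins.
    full = half + [half[n_fft - k].conjugate() for k in range(n_bins, n_fft)]
    # Naive inverse DFT; real code would use an FFT.
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]


def dft_mag_phase(frame):
    """Forward DFT of a real frame, returned as one-sided (magnitude, phase)."""
    n_fft = len(frame)
    bins = [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n, x in enumerate(frame))
        for k in range(n_fft // 2 + 1)
    ]
    return [abs(b) for b in bins], [cmath.phase(b) for b in bins]


# Round trip on an 8-sample frame: analysis followed by synthesis.
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
mag, phase = dft_mag_phase(frame)
rebuilt = frame_from_mag_phase(mag, phase)
max_error = max(abs(a - b) for a, b in zip(frame, rebuilt))
```

In a vocoder of this kind, the magnitude and phase would come from the network's output rather than from analyzing an existing frame, which is why the whole waveform can be produced in a single forward pass.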

Model summary

=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
Vocos                                    --
├─MelSpectrogramFeatures: 1-1            --
│    └─MelSpectrogram: 2-1               --
│    │    └─Spectrogram: 3-1             --
│    │    └─MelScale: 3-2                --
├─VocosBackbone: 1-2                     --
│    └─Conv1d: 2-2                       918,528
│    └─LayerNorm: 2-3                    2,048
│    └─ModuleList: 2-4                   --
│    │    └─ConvNeXtBlock: 3-3           4,208,640
│    │    └─ConvNeXtBlock: 3-4           4,208,640
│    │    └─ConvNeXtBlock: 3-5           4,208,640
│    │    └─ConvNeXtBlock: 3-6           4,208,640
│    │    └─ConvNeXtBlock: 3-7           4,208,640
│    │    └─ConvNeXtBlock: 3-8           4,208,640
│    │    └─ConvNeXtBlock: 3-9           4,208,640
│    │    └─ConvNeXtBlock: 3-10          4,208,640
│    └─LayerNorm: 2-5                    2,048
├─ISTFTHead: 1-3                         --
│    └─Linear: 2-6                       2,101,250
│    └─ISTFT: 2-7                        --
=================================================================
Total params: 36,692,994
Trainable params: 36,692,994
Non-trainable params: 0
=================================================================
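The parameter counts above can be reproduced with a little arithmetic. The hyperparameters below are not stated in this model card. They are inferred from the counts, assuming a kernel size of 7 and the usual ConvNeXt block layout (depthwise convolution, LayerNorm, two pointwise linear layers, and a per-channel scale), so treat them as a plausible reading rather than a confirmed configuration:

```python
def conv1d_params(c_in, c_out, kernel, groups=1):
    """Weights plus biases of a 1-D convolution."""
    return (c_in // groups) * c_out * kernel + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


# Assumed hyperparameters, inferred from the table (not confirmed by the source).
N_MELS, DIM, HIDDEN, KERNEL, N_FFT, N_BLOCKS = 128, 1024, 2048, 7, 2048, 8

embed = conv1d_params(N_MELS, DIM, KERNEL)          # Conv1d: 2-2
block = (
    conv1d_params(DIM, DIM, KERNEL, groups=DIM)     # depthwise convolution
    + layernorm_params(DIM)
    + linear_params(DIM, HIDDEN)                    # pointwise expansion
    + linear_params(HIDDEN, DIM)                    # pointwise projection
    + DIM                                           # per-channel scale
)
head = linear_params(DIM, N_FFT + 2)                # Linear: 2-6

total = (
    embed
    + layernorm_params(DIM)                         # LayerNorm: 2-3
    + N_BLOCKS * block
    + layernorm_params(DIM)                         # LayerNorm: 2-5
    + head
)
print(f"{embed:,} {block:,} {head:,} {total:,}")
```

Under these assumptions the head's output width of 2,050 equals `N_FFT + 2`, which would correspond to a magnitude and a phase value for each of the `N_FFT // 2 + 1` frequency bins.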