---
license: mit
tags:
- audio
library_name: pytorch
---
# Vocos
#### Note: This repo has no affiliation with the author of Vocos.
Pretrained Vocos model with a 48kHz sampling rate, as opposed to the 24kHz of the official model.
## Usage
Make sure the Vocos library is installed:
```bash
pip install vocos
```
Then, load the model as usual:
```python
from vocos import Vocos
vocos = Vocos.from_pretrained("kittn/vocos-mel-48khz-alpha1")
```
For more detailed examples, see [github.com/charactr-platform/vocos#usage](https://github.com/charactr-platform/vocos#usage).
## Evals
TODO
## Training details
TODO
## What is Vocos?
Here's a summary from the official repo [[link](https://github.com/charactr-platform/vocos)]:
> Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
For more details and other variants, check out the repo link above.
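To make the last sentence of the quote concrete, here is a minimal PyTorch sketch of a Vocos-style ISTFT head: one linear layer predicts a log-magnitude and a phase for each frequency bin, the two are combined into a complex spectrum, and `torch.istft` turns that spectrum into a waveform. The class name `ISTFTHeadSketch` is hypothetical. `dim=1024` and `n_fft=2048` are inferred from the model summary below (a 1024 to 2050 linear layer has exactly 2,101,250 parameters); `hop_length=512` is an arbitrary assumption, since this card does not state the hop size. The official `vocos` head also supports padding modes that this sketch omits.

```python
import torch
from torch import nn


class ISTFTHeadSketch(nn.Module):
    """Hypothetical Vocos-style ISTFT head (not the library's class)."""

    def __init__(self, dim: int = 1024, n_fft: int = 2048, hop_length: int = 512):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length  # assumed value, not stated on the card
        # Predicts log-magnitude and phase for n_fft // 2 + 1 bins each.
        self.out = nn.Linear(dim, n_fft + 2)
        # A buffer, so the window is not counted as a trainable parameter.
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) hidden states from the backbone.
        x = self.out(x).transpose(1, 2)                # (batch, n_fft + 2, frames)
        log_mag, phase = x.chunk(2, dim=1)             # each (batch, n_fft // 2 + 1, frames)
        mag = torch.clip(torch.exp(log_mag), max=1e2)  # keep magnitudes bounded
        spec = torch.polar(mag, phase)                 # complex spectrum mag * e^{i * phase}
        # Inverse STFT with centered frames; output length is hop * (frames - 1).
        return torch.istft(
            spec, self.n_fft, hop_length=self.hop_length,
            window=self.window, center=True,
        )


head = ISTFTHeadSketch()
hidden = torch.randn(1, 10, 1024)  # random stand-in for backbone output
with torch.no_grad():
    audio = head(hidden)
```

No time-domain samples are predicted directly: every output sample comes from the inverse Fourier transform of the predicted spectrum.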
## Model summary
```text
=================================================================
Layer (type:depth-idx)                  Param #
=================================================================
Vocos                                   --
├─MelSpectrogramFeatures: 1-1           --
│ └─MelSpectrogram: 2-1                 --
│ │ └─Spectrogram: 3-1                  --
│ │ └─MelScale: 3-2                     --
├─VocosBackbone: 1-2                    --
│ └─Conv1d: 2-2                         918,528
│ └─LayerNorm: 2-3                      2,048
│ └─ModuleList: 2-4                     --
│ │ └─ConvNeXtBlock: 3-3                4,208,640
│ │ └─ConvNeXtBlock: 3-4                4,208,640
│ │ └─ConvNeXtBlock: 3-5                4,208,640
│ │ └─ConvNeXtBlock: 3-6                4,208,640
│ │ └─ConvNeXtBlock: 3-7                4,208,640
│ │ └─ConvNeXtBlock: 3-8                4,208,640
│ │ └─ConvNeXtBlock: 3-9                4,208,640
│ │ └─ConvNeXtBlock: 3-10               4,208,640
│ └─LayerNorm: 2-5                      2,048
├─ISTFTHead: 1-3                        --
│ └─Linear: 2-6                         2,101,250
│ └─ISTFT: 2-7                          --
=================================================================
Total params: 36,692,994
Trainable params: 36,692,994
Non-trainable params: 0
=================================================================
```
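The totals above can be checked by hand with the standard PyTorch parameter formulas for `Conv1d`, `Linear`, and `LayerNorm`. The hyperparameters in this sketch (128 mel bins, hidden size 1024, ConvNeXt intermediate size 2048, kernel size 7, 8 blocks, `n_fft = 2048`) are not stated on this card; they are assumptions inferred by solving the counts in the summary. The block layout (depthwise conv, LayerNorm, two pointwise linear layers, a layer-scale vector) follows the ConvNeXt block of the official Vocos code. The mel front end and the `ISTFT` module have no trainable parameters, which is why they show `--`.

```python
# Assumed hyperparameters, inferred from the summary (not confirmed by the author).
n_mels = 128
dim = 1024
intermediate_dim = 2048
kernel_size = 7
num_blocks = 8
n_fft = 2048


def conv1d_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    # Weight (c_out, c_in // groups, k) plus bias (c_out).
    return c_out * (c_in // groups) * k + c_out


def linear_params(n_in: int, n_out: int) -> int:
    # Weight (n_out, n_in) plus bias (n_out).
    return n_in * n_out + n_out


def layer_norm_params(d: int) -> int:
    # Elementwise weight and bias.
    return 2 * d


embed = conv1d_params(n_mels, dim, kernel_size)  # 918,528
block = (
    conv1d_params(dim, dim, 7, groups=dim)       # depthwise conv
    + layer_norm_params(dim)
    + linear_params(dim, intermediate_dim)       # pointwise expansion
    + linear_params(intermediate_dim, dim)       # pointwise projection
    + dim                                        # layer-scale vector
)                                                # 4,208,640
head = linear_params(dim, n_fft + 2)             # 2,101,250
total = embed + 2 * layer_norm_params(dim) + num_blocks * block + head
print(f"{total:,}")                              # 36,692,994
```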