---
license: mit
tags:
- audio
library_name: pytorch
---

# Vocos

#### Note: This repo has no affiliation with the author of Vocos.

Pretrained Vocos model with a 48kHz sampling rate, as opposed to the 24kHz of the official model.

## Usage

Make sure the Vocos library is installed:

```bash
pip install vocos
```

Then load the model as usual:

```python
from vocos import Vocos

vocos = Vocos.from_pretrained("kittn/vocos-mel-48khz-alpha1")
```

For more detailed examples, see [github.com/charactr-platform/vocos#usage](https://github.com/charactr-platform/vocos#usage)

## Evals

TODO

## Training details

TODO

## What is Vocos?

Here's a summary from the official repo [[link](https://github.com/charactr-platform/vocos)]:

> Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.

For more details and other variants, check out the repo link above.
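
To make the "spectral coefficients plus inverse Fourier transform" idea in the summary concrete, here is a minimal sketch that uses only PyTorch and does not load this model. It computes STFT coefficients of a test tone, splits them into magnitude and phase (the kind of quantities a head in this style would predict), and recovers the waveform with a single `torch.istft` call. The `n_fft` and `hop_length` values are illustrative assumptions, not this checkpoint's configuration.

```python
import math

import torch

sample_rate = 48_000             # matches the sampling rate stated for this model
n_fft, hop_length = 2048, 512    # illustrative values, NOT read from the checkpoint
window = torch.hann_window(n_fft)

# Roughly one second of a 440 Hz sine as a stand-in for real audio.
num_samples = 96 * hop_length
t = torch.arange(num_samples) / sample_rate
wave = torch.sin(2 * math.pi * 440.0 * t)

# Forward STFT: complex coefficients of shape (n_fft // 2 + 1, frames).
spec = torch.stft(
    wave, n_fft, hop_length=hop_length, window=window, return_complex=True
)

# Split into magnitude and phase, then recombine into complex coefficients.
# Here we reuse the true values instead of a network's predictions.
magnitude, phase = spec.abs(), spec.angle()
coefficients = torch.polar(magnitude, phase)

# One inverse STFT call turns the coefficients back into a waveform.
reconstructed = torch.istft(
    coefficients, n_fft, hop_length=hop_length, window=window, length=num_samples
)
max_error = (wave - reconstructed).abs().max().item()
print(f"frames: {spec.shape[-1]}, max reconstruction error: {max_error:.2e}")
```

Because the whole waveform comes out of one inverse transform rather than sample-by-sample generation, the reconstruction step itself is cheap, which is the property the summary refers to.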

## Model summary

```bash
=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
Vocos                                    --
├─MelSpectrogramFeatures: 1-1            --
│    └─MelSpectrogram: 2-1               --
│    │    └─Spectrogram: 3-1             --
│    │    └─MelScale: 3-2                --
├─VocosBackbone: 1-2                     --
│    └─Conv1d: 2-2                       918,528
│    └─LayerNorm: 2-3                    2,048
│    └─ModuleList: 2-4                   --
│    │    └─ConvNeXtBlock: 3-3           4,208,640
│    │    └─ConvNeXtBlock: 3-4           4,208,640
│    │    └─ConvNeXtBlock: 3-5           4,208,640
│    │    └─ConvNeXtBlock: 3-6           4,208,640
│    │    └─ConvNeXtBlock: 3-7           4,208,640
│    │    └─ConvNeXtBlock: 3-8           4,208,640
│    │    └─ConvNeXtBlock: 3-9           4,208,640
│    │    └─ConvNeXtBlock: 3-10          4,208,640
│    └─LayerNorm: 2-5                    2,048
├─ISTFTHead: 1-3                         --
│    └─Linear: 2-6                       2,101,250
│    └─ISTFT: 2-7                        --
=================================================================
Total params: 36,692,994
Trainable params: 36,692,994
Non-trainable params: 0
=================================================================
```
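
As a sanity check on the table, the sketch below sums the listed per-layer counts and then tries to reproduce them from a set of hyperparameters. Those hyperparameters (`n_mels`, `dim`, `intermediate_dim`, `kernel_size`, `n_fft`) and the assumed ConvNeXt-style block layout are inferred from the numbers alone; they are assumptions, not values confirmed by this repo.

```python
# Per-layer parameter counts copied from the summary table.
listed = {
    "embed_conv1d": 918_528,
    "embed_layernorm": 2_048,
    "convnext_blocks": 8 * 4_208_640,
    "final_layernorm": 2_048,
    "istft_head_linear": 2_101_250,
}
total = sum(listed.values())

# Hypothetical hyperparameters inferred from the counts (not from the repo).
n_mels, dim, intermediate_dim, kernel_size, n_fft = 128, 1024, 2048, 7, 2048

# Input convolution: weights plus bias.
embed_conv = n_mels * dim * kernel_size + dim
# LayerNorm: one scale and one shift per channel.
layer_norm = 2 * dim
# Assumed block layout: depthwise conv, LayerNorm, two pointwise linear
# layers, and a learnable per-channel scale.
block = (
    (dim * kernel_size + dim)
    + layer_norm
    + (dim * intermediate_dim + intermediate_dim)
    + (intermediate_dim * dim + dim)
    + dim
)
# Output projection to n_fft + 2 values per frame, weights plus bias.
head = dim * (n_fft + 2) + (n_fft + 2)

print(total, embed_conv, layer_norm, block, head)
```

Under these assumptions every derived count matches the table, and the listed layers add up to the reported total of 36,692,994 parameters.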