Upload folder using huggingface_hub


Files changed (3)

README.md +80 -0
config.yaml +40 -0
pytorch_model.bin +3 -0

README.md ADDED

	@@ -0,0 +1,80 @@

+---
+license: mit
+tags:
+- audio
+library_name: pytorch
+---
+# Vocos
+Note: This repo has no affiliation with the author of Vocos.
+## What is this?
+This is a pretrained Vocos model similar to the official ones, except that it was trained to reconstruct 48 kHz audio rather than 24 kHz audio.
+It is intended both as a general-purpose, high-quality vocoder and as a building block for TTS models.
+## Usage
+Make sure the Vocos library is installed:
+```bash
+pip install vocos
+```
+Then load the model as usual:
+```python
+from vocos import Vocos
+vocos = Vocos.from_pretrained("kittn/vocos-mel-48khz-alpha1")
+```
+For more detailed examples, see [github.com/charactr-platform/vocos#usage](https://github.com/charactr-platform/vocos#usage)
+## What is Vocos?
+Here's a summary from the [official repo](https://github.com/charactr-platform/vocos):
+> Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
+For more details and other variants, check out the repo link above.
+## Model summary
+```text
+=================================================================
+Layer (type:depth-idx)                   Param #
+=================================================================
+Vocos                                    --
+├─MelSpectrogramFeatures: 1-1            --
+│    └─MelSpectrogram: 2-1               --
+│    │    └─Spectrogram: 3-1             --
+│    │    └─MelScale: 3-2                --
+├─VocosBackbone: 1-2                     --
+│    └─Conv1d: 2-2                       918,528
+│    └─LayerNorm: 2-3                    2,048
+│    └─ModuleList: 2-4                   --
+│    │    └─ConvNeXtBlock: 3-3           4,208,640
+│    │    └─ConvNeXtBlock: 3-4           4,208,640
+│    │    └─ConvNeXtBlock: 3-5           4,208,640
+│    │    └─ConvNeXtBlock: 3-6           4,208,640
+│    │    └─ConvNeXtBlock: 3-7           4,208,640
+│    │    └─ConvNeXtBlock: 3-8           4,208,640
+│    │    └─ConvNeXtBlock: 3-9           4,208,640
+│    │    └─ConvNeXtBlock: 3-10          4,208,640
+│    └─LayerNorm: 2-5                    2,048
+├─ISTFTHead: 1-3                         --
+│    └─Linear: 2-6                       2,101,250
+│    └─ISTFT: 2-7                        --
+=================================================================
+Total params: 36,692,994
+Trainable params: 36,692,994
+Non-trainable params: 0
+=================================================================
+```
+## Evals
+TODO
+## Training details
+TODO
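The quoted summary says Vocos generates spectral coefficients and recovers the waveform through an inverse Fourier transform. The NumPy sketch below illustrates that final step. It assumes, following the upstream `ISTFTHead` as we understand it, that the head's Linear layer emits `n_fft + 2` values per frame, split into log-magnitudes and phases. That width is consistent with the 2,101,250 (= 1024 × 2050 + 2050) parameters listed for the head's `Linear` in the model summary. The exponentiation, the 1e2 magnitude cap, the periodic Hann window and the "center" trimming (matching `padding: center` in config.yaml) are assumptions, not values read from this checkpoint. `istft_head` and `periodic_hann` are hypothetical names.

```python
import numpy as np


def periodic_hann(n: int) -> np.ndarray:
    # Periodic Hann window (denominator n, not n - 1), the convention torch.hann_window uses by default.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def istft_head(head_out: np.ndarray, n_fft: int, hop_length: int,
               max_mag: float = 1e2) -> np.ndarray:
    """Sketch: turn per-frame head outputs of shape (n_frames, n_fft + 2) into a waveform.

    Assumed layout: the first n_fft // 2 + 1 columns are log-magnitudes, the rest are phases.
    Returns (n_frames - 1) * hop_length samples for even n_fft ("center" padding removed).
    """
    n_bins = n_fft // 2 + 1
    if head_out.shape[1] != 2 * n_bins:
        raise ValueError(f"expected {2 * n_bins} columns, got {head_out.shape[1]}")
    log_mag, phase = head_out[:, :n_bins], head_out[:, n_bins:]
    mag = np.minimum(np.exp(log_mag), max_mag)   # exponentiate, then cap large magnitudes
    spec = mag * np.exp(1j * phase)              # complex spectral coefficients per frame

    window = periodic_hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window  # windowed time-domain frames

    n_frames = frames.shape[0]
    total = n_fft + hop_length * (n_frames - 1)
    out = np.zeros(total)
    envelope = np.zeros(total)
    for t in range(n_frames):                    # weighted overlap-add
        start = t * hop_length
        out[start:start + n_fft] += frames[t]
        envelope[start:start + n_fft] += window ** 2

    pad = n_fft // 2                             # drop the half-window added by "center" padding
    out = out[pad:total - pad]
    envelope = envelope[pad:total - pad]
    return out / np.maximum(envelope, 1e-11)     # normalise by the summed squared window
```

With the values in config.yaml (`hop_length: 256` at 48 kHz), one second of audio (48,000 samples) corresponds to 1 + 48000 // 256 = 188 frames under this centering convention. The sketch turns those 188 frames back into 187 × 256 = 47,872 samples, so the output can be slightly shorter than the input.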

config.yaml ADDED

	@@ -0,0 +1,40 @@

+backbone:
+  class_path: vocos.models.VocosBackbone
+  init_args:
+    adanorm_num_embeddings: null
+    dim: 1024
+    input_channels: 128
+    intermediate_dim: 2048
+    layer_scale_init_value: null
+    num_layers: 8
+decay_mel_coeff: false
+enable_discriminator: true
+evaluate_periodicty: true
+evaluate_pesq: true
+evaluate_utmos: true
+feature_extractor:
+  class_path: vocos.feature_extractors.MelSpectrogramFeatures
+  init_args:
+    hop_length: 256
+    n_fft: 2048
+    n_mels: 128
+    padding: center
+    sample_rate: 48000
+generator_period: 3
+grad_acc: 1
+head:
+  class_path: vocos.heads.ISTFTHead
+  init_args:
+    dim: 1024
+    hop_length: 256
+    n_fft: 2048
+    padding: center
+initial_learning_rate: 0.0003
+mel_loss_coeff: 15.0
+mrd_loss_coeff: 0.1
+num_warmup_steps: 500
+pretrain_decoupled_steps: 0
+pretrain_disc_steps: 500
+pretrain_mel_steps: 0
+pretrained_ckpt: null
+sample_rate: 48000
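The "Param #" column in the README's model summary can be re-derived from the hyperparameters in this config. The sketch below does that arithmetic. Two values are not in the config and are inferred from the summary itself. The first is a convolution kernel size of 7, since the input `Conv1d` count works out as 918,528 = 128 × 1024 × 7 + 1024. The second is a 1024-element layer-scale vector in each ConvNeXt block, which is needed for 4,208,640 to come out even though `layer_scale_init_value` is null (presumably the library substitutes a default). The block layout (depthwise conv, LayerNorm, two pointwise Linear layers) follows the ConvNeXt design and is an assumption here. All helper names are hypothetical.

```python
# Hypothetical helpers that re-derive the README's parameter counts from config.yaml.
# Kernel size 7 and the per-block layer-scale vector are inferred from the summary,
# not read from the config.

def conv1d_params(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    # weight (c_out, c_in / groups, kernel) + bias (c_out)
    return c_out * (c_in // groups) * kernel + c_out


def linear_params(d_in: int, d_out: int) -> int:
    # weight (d_out, d_in) + bias (d_out)
    return d_in * d_out + d_out


def layernorm_params(dim: int) -> int:
    # elementwise scale + shift
    return 2 * dim


def convnext_block_params(dim: int, intermediate_dim: int, kernel: int = 7) -> int:
    return (
        conv1d_params(dim, dim, kernel, groups=dim)  # depthwise conv
        + layernorm_params(dim)
        + linear_params(dim, intermediate_dim)       # pointwise expand
        + linear_params(intermediate_dim, dim)       # pointwise project
        + dim                                        # layer-scale vector (assumed present)
    )


def vocos_params(input_channels: int = 128, dim: int = 1024,
                 intermediate_dim: int = 2048, num_layers: int = 8,
                 n_fft: int = 2048, kernel: int = 7) -> int:
    backbone = (
        conv1d_params(input_channels, dim, kernel)   # embedding Conv1d
        + layernorm_params(dim)                      # LayerNorm after the embedding
        + num_layers * convnext_block_params(dim, intermediate_dim, kernel)
        + layernorm_params(dim)                      # final LayerNorm
    )
    head = linear_params(dim, n_fft + 2)             # (n_fft // 2 + 1) bins, times 2
    return backbone + head


print(f"{vocos_params():,}")  # 36,692,994, matching "Total params" in the summary
```

At 4 bytes per float32 parameter, 36,692,994 parameters come to 146,771,976 bytes, slightly below the 147,342,055 bytes recorded in the LFS pointer for pytorch_model.bin. That gap is consistent with float32 weights plus non-parameter buffers (such as a mel filterbank and STFT windows) and serialization overhead, though the commit itself does not state how the weights are stored.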

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3315c87d130922dff1c4c0cfd153ac3ef037950ac0eba13f355bb38cbda46fc2
+size 147342055
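pytorch_model.bin is stored through Git LFS, so the diff shows only the pointer: the spec version, a SHA-256 `oid` of the real file contents, and its `size` in bytes. A downloaded copy can be checked against that pointer with the standard library alone. The sketch below compares the size first (cheap) and then a streamed SHA-256 digest. `parse_lfs_pointer` and `verify_against_pointer` are hypothetical names, and the local path is whatever your download produced.

```python
import hashlib
import os

# The pointer text exactly as committed for pytorch_model.bin.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3315c87d130922dff1c4c0cfd153ac3ef037950ac0eba13f355bb38cbda46fc2
size 147342055
"""


def parse_lfs_pointer(text: str) -> dict:
    # Each pointer line is "<key> <value>".
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_against_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid hash: {algo}")
    if os.path.getsize(path) != int(fields["size"]):
        return False                              # size mismatch: skip hashing
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)                  # stream, so large files never sit in memory
    return digest.hexdigest() == expected_hex
```

For example, `verify_against_pointer("pytorch_model.bin", POINTER)` should return `True` for an intact download of this commit's weights.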