kittn committed
Commit
02354d1
1 Parent(s): cb0e2ac

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +80 -0
  2. config.yaml +40 -0
  3. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: mit
+ tags:
+ - audio
+ library_name: pytorch
+ ---
+
+ # Vocos
+
+ Note: This repo has no affiliation with the author of Vocos.
+
+ ## What is this?
+
+ This is a pretrained Vocos model similar to the official ones, except that it was trained to reconstruct audio at 48 kHz rather than 24 kHz.
+
+ It is intended to serve both as a general-purpose high-quality vocoder and as a building block for TTS models.
+
+ ## Usage
+ Make sure the Vocos library is installed:
+
+ ```bash
+ pip install vocos
+ ```
+
+ Then load the model as usual:
+
+ ```python
+ from vocos import Vocos
+ vocos = Vocos.from_pretrained("kittn/vocos-mel-48khz-alpha1")
+ ```
+
+ For more detailed examples, see [github.com/charactr-platform/vocos#usage](https://github.com/charactr-platform/vocos#usage).
+
+ ## What is Vocos?
+
+ Here's a summary from the official repo [[link](https://github.com/charactr-platform/vocos)]:
+
+ > Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
+
+ For more details and other variants, check out the repo link above.
+
+ ## Model summary
+ ```
+ =================================================================
+ Layer (type:depth-idx)                                   Param #
+ =================================================================
+ Vocos                                                         --
+ ├─MelSpectrogramFeatures: 1-1                                 --
+ │    └─MelSpectrogram: 2-1                                    --
+ │    │    └─Spectrogram: 3-1                                  --
+ │    │    └─MelScale: 3-2                                     --
+ ├─VocosBackbone: 1-2                                          --
+ │    └─Conv1d: 2-2                                       918,528
+ │    └─LayerNorm: 2-3                                      2,048
+ │    └─ModuleList: 2-4                                        --
+ │    │    └─ConvNeXtBlock: 3-3                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-4                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-5                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-6                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-7                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-8                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-9                         4,208,640
+ │    │    └─ConvNeXtBlock: 3-10                        4,208,640
+ │    └─LayerNorm: 2-5                                      2,048
+ ├─ISTFTHead: 1-3                                              --
+ │    └─Linear: 2-6                                     2,101,250
+ │    └─ISTFT: 2-7                                             --
+ =================================================================
+ Total params: 36,692,994
+ Trainable params: 36,692,994
+ Non-trainable params: 0
+ =================================================================
+ ```
+
+ ## Evals
+ TODO
+
+ ## Training details
+ TODO
config.yaml ADDED
@@ -0,0 +1,40 @@
+ backbone:
+   class_path: vocos.models.VocosBackbone
+   init_args:
+     adanorm_num_embeddings: null
+     dim: 1024
+     input_channels: 128
+     intermediate_dim: 2048
+     layer_scale_init_value: null
+     num_layers: 8
+ decay_mel_coeff: false
+ enable_discriminator: true
+ evaluate_periodicty: true
+ evaluate_pesq: true
+ evaluate_utmos: true
+ feature_extractor:
+   class_path: vocos.feature_extractors.MelSpectrogramFeatures
+   init_args:
+     hop_length: 256
+     n_fft: 2048
+     n_mels: 128
+     padding: center
+     sample_rate: 48000
+ generator_period: 3
+ grad_acc: 1
+ head:
+   class_path: vocos.heads.ISTFTHead
+   init_args:
+     dim: 1024
+     hop_length: 256
+     n_fft: 2048
+     padding: center
+ initial_learning_rate: 0.0003
+ mel_loss_coeff: 15.0
+ mrd_loss_coeff: 0.1
+ num_warmup_steps: 500
+ pretrain_decoupled_steps: 0
+ pretrain_disc_steps: 500
+ pretrain_mel_steps: 0
+ pretrained_ckpt: null
+ sample_rate: 48000
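The feature-extractor settings above fix the model's frame geometry. A small sketch in plain PyTorch, using `torch.stft` as a stand-in for the full MelSpectrogram pipeline, just to show the frame arithmetic implied by `n_fft: 2048`, `hop_length: 256`, and `padding: center`:

```python
import torch

# STFT parameters taken from config.yaml
sample_rate, n_fft, hop_length = 48000, 2048, 256

# One second of dummy 48 kHz audio
audio = torch.randn(1, sample_rate)

# With center padding, the frame count is len(audio) // hop_length + 1,
# and the spectrogram has n_fft // 2 + 1 frequency bins
spec = torch.stft(
    audio,
    n_fft=n_fft,
    hop_length=hop_length,
    window=torch.hann_window(n_fft),
    center=True,
    return_complex=True,
)
print(spec.shape)  # torch.Size([1, 1025, 188])
```

So at 48 kHz this model sees 187.5 mel frames per second of audio, and the ISTFT head inverts the same geometry on the way out.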
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3315c87d130922dff1c4c0cfd153ac3ef037950ac0eba13f355bb38cbda46fc2
+ size 147342055