PyTorch
ONNX
vocoder
vocos
tts
wetdog committed
Commit
7724eae
1 Parent(s): 5801792

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
 - projecte-aina/openslr-slr69-ca-trimmed-denoised
 ---

-# Vocos-mel-22khz
+# Vocos-mel-22khz-cat

 <!-- Provide a quick summary of what the model is/does. -->

@@ -22,12 +22,13 @@ Unlike other typical GAN-based vocoders, Vocos does not model audio samples in t
 Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through
 inverse Fourier transform.

-This version of vocos uses 80-bin mel spectrograms as acoustic features which are widespread
+This version of **Vocos** uses 80-bin mel spectrograms as acoustic features, which are widespread
 in the TTS domain since the introduction of [hifi-gan](https://github.com/jik876/hifi-gan/blob/master/meldataset.py)
 The goal of this model is to provide an alternative to hifi-gan that is faster and compatible with the
-acoustic output of several TTS models.
-
+acoustic output of several TTS models. This version is tailored for the Catalan language,
+as it was trained only on Catalan speech datasets.

+We are grateful to the authors for open-sourcing the code, which allowed us to modify and train this version.

 ## Intended Uses and limitations

@@ -79,6 +80,7 @@ We also release a onnx version of the model, you can check in colab:
 <a target="_blank" href="https://colab.research.google.com/github/langtech-bsc/vocos/blob/matcha/notebooks/vocos_22khz_onnx_inference.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
 </a>
+
 ## Training Details

 ### Training Data
@@ -98,7 +100,7 @@ The model was trained on 3 Catalan speech datasets
 ### Training Procedure

 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-The model was trained for 1M steps and 1k epochs with a batch size of 16 for stability. We used a Cosine scheduler with a initial learning rate of 5e-4.
+The model was trained for 1.5M steps and 1.3k epochs with a batch size of 16 for stability. We used a cosine scheduler with an initial learning rate of 5e-4.
 We also modified the mel spectrogram loss to use 128 bins and fmax of 11025 instead of the same input mel spectrogram.

@@ -116,7 +118,7 @@ We also modified the mel spectrogram loss to use 128 bins and fmax of 11025 inst
 <!-- This section describes the evaluation protocols and provides the results. -->

-Evaluation was done using the metrics on the original repo, after ~ 1000 epochs we achieve:
+Evaluation was done using the metrics from the [original repo](https://github.com/gemelo-ai/vocos); after ~1000 epochs we achieve:

 * val_loss: 3.57
 * f1_score: 0.95
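The README's key claim — that predicting spectral coefficients lets the vocoder recover a waveform cheaply via the inverse Fourier transform, rather than generating samples in the time domain — can be illustrated with a toy NumPy sketch. This is not the model's actual code (Vocos predicts the coefficients with a neural network); it only shows why inverse-FFT + overlap-add synthesis reconstructs a signal from its STFT coefficients:

```python
import numpy as np

def stft(x, n_fft=16, hop=8):
    # Hann-windowed short-time Fourier transform of a real signal:
    # one complex spectrum (n_fft // 2 + 1 bins) per frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.stack([np.fft.rfft(f) for f in frames])

def istft(S, n_fft=16, hop=8):
    # Inverse FFT per frame, then weighted overlap-add; dividing by the
    # summed squared window makes reconstruction exact wherever frames overlap.
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(S) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        out[i * hop:i * hop + n_fft] += np.fft.irfft(spec, n_fft) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 5 * t)
y = istft(stft(x))
# Away from the signal edges the reconstruction error is near machine precision.
print(np.max(np.abs(x[16:len(y) - 16] - y[16:len(y) - 16])))
```

Because synthesis is just an FFT and an overlap-add per frame, it runs in a single cheap pass — which is what makes this family of vocoders fast compared with sample-by-sample generation.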
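The training section mentions a cosine scheduler starting from a learning rate of 5e-4. As a point of reference only — the commit does not publish the training config, so the floor value and absence of warmup here are assumptions — a standard cosine-annealing schedule looks like this:

```python
import math

def cosine_lr(step, total_steps, lr_init=5e-4, lr_min=0.0):
    # Generic cosine annealing: decays lr_init toward lr_min over total_steps.
    # Illustrative values only; not taken from the model's actual config.
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1_500_000))          # starts at the initial rate
print(cosine_lr(1_500_000, 1_500_000))  # reaches lr_min at the final step
```

In a PyTorch training loop this shape is typically obtained with `torch.optim.lr_scheduler.CosineAnnealingLR` rather than hand-rolled.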