Update General Model Description
Updating Matcha description. Also adding Vocos model description.
about.md
CHANGED
@@ -18,13 +18,18 @@ Here you'll be able to find all the information regarding our model, which has b
 
 ## General Model Description
 
-**Matcha-TTS** is
-
-
-
+**Matcha-TTS** is a non-autoregressive encoder-decoder model designed for fast acoustic modelling in TTS.
+The encoder processes input phoneme sequences and, together with a phoneme duration predictor, outputs averaged acoustic features. The decoder,
+which is essentially a U-Net backbone based on the Transformer architecture, predicts the refined spectrogram.
+The model is trained with optimal-transport conditional flow matching.
+This yields an ODE-based decoder capable of generating high-quality output in few synthesis steps.
+
+**Vocos** is a fast neural vocoder designed to synthesize audio waveforms from acoustic features.
+Unlike typical GAN-based vocoders, Vocos does not model audio samples in the time domain.
+Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through the inverse Fourier transform.
+The goal of this model is to provide an alternative to HiFi-GAN that is faster and compatible with the acoustic output of several TTS models.
+This version is tailored for the Catalan language, as it was trained only on Catalan speech datasets.
 
-**Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
-This yields an ODE-based decoder capable of generating high output quality in fewer synthesis steps than models trained using score matching.
 
 ## Adaptation to Catalan
 
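As an illustration of the ODE-based decoding mentioned in the Matcha-TTS description, here is a minimal sketch of fixed-step Euler sampling along an optimal-transport conditional flow matching path. It is not the model's code: `toy_vector_field` is a hypothetical closed-form stand-in for the trained U-Net decoder (the exact field when the data consists of a single target vector), and `SIGMA_MIN`, `target`, and the three-element "spectrogram" are assumptions chosen for the example. The point it tries to show is that the OT path is a straight line from noise to data, so very few Euler steps already land close to the target.

```python
import random

SIGMA_MIN = 1e-4  # assumed small noise floor of the OT-CFM probability path


def toy_vector_field(x, t, target):
    """Closed-form OT-CFM field for one target vector.

    Hypothetical stand-in for the trained decoder network.
    """
    denom = 1.0 - (1.0 - SIGMA_MIN) * t
    return [(x1 - (1.0 - SIGMA_MIN) * xi) / denom for xi, x1 in zip(x, target)]


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
target = [0.5, -1.0, 2.0]  # pretend mel-spectrogram bins (illustrative values)
noise = [random.gauss(0.0, 1.0) for _ in target]  # sample from the prior


def field(x, t):
    return toy_vector_field(x, t, target)


few = euler_sample(field, noise, n_steps=2)    # very coarse solver
many = euler_sample(field, noise, n_steps=50)  # much finer solver

print("2 steps :", few)
print("50 steps:", many)
```

In this toy setting both runs end within roughly `SIGMA_MIN` times the noise magnitude of `target`, because the field is constant along each straight trajectory. A learned field is only approximately straight, so a real decoder still benefits from a handful of steps rather than one.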
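The Vocos description says the vocoder produces spectral coefficients and recovers audio with an inverse Fourier transform. The sketch below tries to make that reconstruction step concrete using only the Python standard library. It is an assumption-laden toy: the "predicted" coefficients are computed from a known sine wave rather than by a network, they are held in magnitude/phase form purely for illustration, and `N_FFT`, `HOP`, and all function names are hypothetical. A practical implementation would use an FFT library instead of the naive DFT shown here.

```python
import cmath
import math

N_FFT = 16  # assumed frame length
HOP = 4     # assumed hop size (75% overlap)


def hann(n):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)).real / n
            for k in range(n)]


def stft(signal):
    """Windowed DFT of overlapping frames (used here to fake network output)."""
    win = hann(N_FFT)
    return [dft([s * w for s, w in zip(signal[start:start + N_FFT], win)])
            for start in range(0, len(signal) - N_FFT + 1, HOP)]


def istft(frames, length):
    """Inverse DFT per frame, then windowed overlap-add with normalisation."""
    win = hann(N_FFT)
    out = [0.0] * length
    norm = [0.0] * length
    for i, coeffs in enumerate(frames):
        start = i * HOP
        for k, sample in enumerate(idft(coeffs)):
            out[start + k] += sample * win[k]
            norm[start + k] += win[k] ** 2
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


signal = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]

# Stand-in for vocoder output: per-frame (magnitude, phase) pairs.
predicted = [[cmath.polar(c) for c in frame] for frame in stft(signal)]

# Rebuild complex coefficients and return to the time domain.
frames = [[cmath.rect(mag, phase) for mag, phase in frame] for frame in predicted]
reconstructed = istft(frames, len(signal))

print(max(abs(a - b) for a, b in zip(signal, reconstructed)))
```

Because every output sample comes from a handful of frame-level inverse transforms plus an overlap-add, the waveform is produced frame by frame rather than sample by sample, which is the property the description credits for the fast reconstruction.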