aoxo/swaram

@@ -10,11 +10,11 @@ pipeline_tag: text-to-speech
 # Malayalam Text-to-Speech
-This repository contains the **Malayalam (mal)** language text-to-speech (TTS) model checkpoint.
 ## Model Details
-Swaram (**S**tochastic **W**aveform **A**daptive **R**ecurrent **A**utoencoder for **M**alayalam) is an advanced speech synthesis model that generates speech waveforms conditioned on input text sequences. It is based on a conditional variational autoencoder (VAE) architecture.
 The model's text encoder is built on Wav2Vec2 decoder, while the decoder is a VAE. A flow-based module predicts spectrogram-based acoustic features, which is composed of the Transformer-based Contextualizer and cascaded dense layers. The spectrogram is then transformed into a speech waveform using a stack of transposed convolutional layers. To capture the one-to-many nature of TTS, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, allowing for varied speech rhythms from the same text input.

 # Malayalam Text-to-Speech
+This repository contains the **Swaram (mal)** text-to-speech (TTS) model checkpoint.
 ## Model Details
+**Swaram** (**S**tochastic **W**aveform **A**daptive **R**ecurrent **A**utoencoder for **M**alayalam) is an advanced speech synthesis model that generates speech waveforms conditioned on input text sequences. It is based on a conditional variational autoencoder (VAE) architecture.
 The model's text encoder is built on Wav2Vec2 decoder, while the decoder is a VAE. A flow-based module predicts spectrogram-based acoustic features, which is composed of the Transformer-based Contextualizer and cascaded dense layers. The spectrogram is then transformed into a speech waveform using a stack of transposed convolutional layers. To capture the one-to-many nature of TTS, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, allowing for varied speech rhythms from the same text input.
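
The upsampling step described above, where a stack of transposed convolutional layers turns spectrogram frames into waveform samples, can be illustrated with a little length arithmetic. The following is a minimal sketch, not the Swaram implementation: the kernel sizes and strides in `HYPOTHETICAL_LAYERS` are assumptions chosen for illustration and are not read from the checkpoint. It uses only the standard output-length formula for a 1-D transposed convolution with dilation 1 and no output padding.

```python
def transposed_conv_length(length, kernel_size, stride, padding):
    """Output length of a 1-D transposed convolution (dilation 1, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel_size


def upsampled_length(num_frames, layers):
    """Pass a frame count through a stack of transposed convolutions."""
    length = num_frames
    for kernel_size, stride in layers:
        # With this padding each layer multiplies the length exactly by its stride.
        padding = (kernel_size - stride) // 2
        length = transposed_conv_length(length, kernel_size, stride, padding)
    return length


# Hypothetical (kernel_size, stride) pairs; NOT taken from the Swaram checkpoint.
HYPOTHETICAL_LAYERS = [(16, 8), (16, 8), (4, 2), (4, 2)]

if __name__ == "__main__":
    frames = 100
    samples = upsampled_length(frames, HYPOTHETICAL_LAYERS)
    print(f"{frames} spectrogram frames -> {samples} waveform samples")
```

Under these assumed settings the strides multiply to 8 × 8 × 2 × 2 = 256, so every spectrogram frame expands into 256 waveform samples. The actual factor for Swaram depends on the layer configuration stored with the checkpoint.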