# Grapheme-based statistical parametric synthesizer for Kinyarwanda
A grapheme-based approach was chosen because such approaches give acceptable performance for low-resource languages. For instance, this model was trained on approximately 5 hours of Kinyarwanda audio with corresponding transcriptions; no further language-specific information was provided.
The [Festvox](http://festvox.org/) suite of tools was used to build the model, and the Flite engine was used to generate a small, portable executable for it. Currently, this model runs only on Linux.
## Model description
To build the voice, we needed to map graphemes to their corresponding phonemes. In this work, we used the UniTran-based approach: the graphemes are converted to Unicode code points, which are then converted to a guessed phonetic transcription in X-SAMPA. After obtaining the phonemes, we use an HMM from the Clustergen framework to obtain features for each of them. These features are then used to train a random forest (20 decision trees) to predict spectral features. The resulting model achieves a mel-cepstral distortion (MCD) of 5.03.
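The first step of that pipeline, turning graphemes into code points, can be illustrated in a few lines of shell. This is only a sketch: the `codepoints` function is a hypothetical helper, it assumes plain ASCII input (some shells, such as dash, split strings by byte rather than by character), and it does not reproduce UniTran's mapping from code points to X-SAMPA.

```sh
#!/bin/sh
# Print each character of a word together with its Unicode code point (U+XXXX).
# Hypothetical helper: it covers only the grapheme -> code point step;
# the code point -> X-SAMPA mapping is done by UniTran and is not shown here.
# Assumes ASCII input, since some shells split strings by byte.
codepoints() {
  rest=$1
  while [ -n "$rest" ]; do
    remainder=${rest#?}        # everything after the first character
    ch=${rest%"$remainder"}    # the first character itself
    # A leading quote makes printf use the character's numeric value.
    printf '%s U+%04X\n' "$ch" "'$ch"
    rest=$remainder
  done
}
```

For example, `codepoints Muraho` prints one line per grapheme, starting with `M U+004D` and `u U+0075`.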
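For reference, MCD measures the distance between the mel-cepstral coefficients of synthesized speech and those of the natural recordings, so lower values are better. A commonly used definition, averaged over $T$ aligned frames with $D$ coefficients per frame, is shown below, where $c_{t,d}$ is the natural and $\hat{c}_{t,d}$ the synthesized coefficient. We assume this is the variant behind the figure above; the exact coefficient range is not stated here, and it is common to exclude the energy coefficient $c_0$.

$$
\mathrm{MCD} = \frac{1}{T}\sum_{t=1}^{T}\frac{10}{\ln 10}\sqrt{2\sum_{d=1}^{D}\left(c_{t,d}-\hat{c}_{t,d}\right)^{2}}
$$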
## Limitations and Recommendations
The synthesized voice lacks crispness and, in some cases, ignores tonal information, which is indispensable in Kinyarwanda. We believe that with a large corpus of linguistic information, the voice would sound more natural.
## Usage
To convert a text file to a WAV file, run:
``` sh
./flite_du_kin_tts -f kinyarwanda.txt kinyarwanda.wav
```
To synthesize text passed directly on the command line, use:
``` sh
./flite_du_kin_tts -t "Muraho Rwanda" kinyarwanda.wav
```
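The `-t` form handles one sentence at a time. Below is a minimal sketch for synthesizing every non-empty line of a text file into its own numbered WAV file, assuming the binary accepts the `-t TEXT OUTFILE` form shown above. The `synth_lines` function, the `TTS` variable, and the `0001.wav` naming scheme are hypothetical conveniences, not part of the released files.

```sh
#!/bin/sh
# Path to the synthesizer; defaults to the binary name used in this README.
TTS=${TTS:-./flite_du_kin_tts}

# synth_lines INPUT OUTDIR: write one numbered WAV file per non-empty line.
synth_lines() {
  input=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  n=0
  # The "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue   # skip blank lines
    n=$((n + 1))
    out=$(printf '%s/%04d.wav' "$outdir" "$n")
    # Redirect stdin so the synthesizer cannot consume the input file.
    "$TTS" -t "$line" "$out" < /dev/null || return 1
  done < "$input"
}
```

For example, `synth_lines sentences.txt out` writes `out/0001.wav`, `out/0002.wav`, and so on, one per sentence.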