Create README.md
README.md
ADDED
@@ -0,0 +1,29 @@
# Lingala Text-to-Speech
This model was trained on the 71.6-hour aligned Lingala Bible dataset from OpenSLR.
## Model description
This model is VITS (Conditional Variational Autoencoder with Adversarial Learning), an end-to-end approach to the text-to-speech task. We trained it with the espnet2 toolkit.
## Usage
First, install ESPnet (the `espnet` package provides the `espnet2` module):
``` sh
pip install espnet
```
Next, download the model and config files from this repo into your working directory.
To generate a wav file using this model, run the following:
``` python
from espnet2.bin.tts_inference import Text2Speech
import soundfile as sf

# Load the model from the config and checkpoint downloaded from this repo.
text2speech = Text2Speech(train_config="config.yaml", model_file="train.total_count.best.pth")
# Synthesize a Lingala sentence; the waveform is returned under the "wav" key.
wav = text2speech("oyo kati na Ye ozwi lisiko mpe bolimbisi ya masumu")["wav"]
# Save the waveform as 16-bit PCM at the model's sampling rate.
sf.write("outfile.wav", wav.numpy(), text2speech.fs, "PCM_16")
```
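To check that the generated file is valid audio, something like the following may help. It is a minimal sketch using only Python's standard `wave` module; the helper name `wav_duration_seconds` is ours for illustration and is not part of espnet2. It assumes the file was written as uncompressed PCM, as in the snippet above.

``` python
import wave


def wav_duration_seconds(path):
    """Return the duration, in seconds, of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        # Frames per channel divided by frames per second gives seconds.
        return wav_file.getnframes() / wav_file.getframerate()


# Example (run after generating the file with the snippet above):
# print(wav_duration_seconds("outfile.wav"))
```

A duration of zero, or an error when opening the file, would suggest that synthesis or writing did not complete as expected.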