File size: 4,931 Bytes
d5ee97c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
# C++ Inference using TFlite
TensorFlow Lite is an open source deep learning framework for on-device inference. On Android and Linux (including Raspberry Pi) platforms, we can run inferences using TensorFlow Lite APIs available in C++. The repository TensorFlowTTS and TensorFlow Lite help developers run popular text-to-speech (TTS) models on mobile, embedded, and IoT devices.

## TFlite model convert method
Method see [colab notebook](https://colab.research.google.com/drive/1Ma3MIcSdLsOxqOKcN1MlElncYMhrOg3J?usp=sharing#scrollTo=KCm6Oj7iLlu5).

**Notes:**
- Quantization will deteriorate vocoder and bring noise, so the vocoder doesn't do optimization.
- TensorFlow Lite in C++ doesn't support the TensorFlow operation of Dropout. So the inference function need delete Dropout before converting tflite model, and it doesn't affect the inference result.
For example, fastspeech2 models:
```python
# tensorflow_tts/models/fastspeech2.py
# ...
    def _inference():
        # ...
        # f0_embedding = self.f0_dropout(
        #     self.f0_embeddings(tf.expand_dims(f0_outputs, 2)), training=True
        # )
        # energy_embedding = self.energy_dropout(
        #     self.energy_embeddings(tf.expand_dims(energy_outputs, 2)), training=True
        # )

        f0_embedding = self.f0_embeddings(tf.expand_dims(f0_outputs, 2))

        energy_embedding = self.energy_embeddings(tf.expand_dims(energy_outputs, 2))
        # ...
```

## About Code
- TfliteBase.cpp: A base class for loading tflite-model and creating tflite interpreter. By inheriting from this class, you can implement specific behavior, like Mel-spectrogram and Vocoder.
- TTSFrontend.cpp: Text preprocessor converts string to ID based on your desiged phoneme2ID dict, which needs a text to pronunciation module, like g2p for English and pinyin for Chinese.
- TTSBackend.cpp: It contains two-step process - first generating a Mel-spectrogram from phoneme-ID sequence and then generating the audio waveform by Vocoder.


## Using the demo
A demo of English or Mandarin TTS and the [tflite-models](https://github.com/lr2582858/TTS_tflite_cpp/releases/tag/0.1.0) are available for linux platform. The pretrained models to be converted are download from the colab notebook ([English](https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing#scrollTo=4uv_QngUmFbK) or [Mandarin](https://colab.research.google.com/drive/1Ma3MIcSdLsOxqOKcN1MlElncYMhrOg3J?usp=sharing#scrollTo=KCm6Oj7iLlu5)). Mel-generator and Vocoder select FastSpeech2 and Multiband-MelGAN, respectively.

**Notes:**
The text2ids function in TTSFrontend.cpp is implemented by using bash command in C++ instead of developing a new pronunciation module (see /demo/text2ids.py). In fact, it is not a recommended method, and you should redevelop a appropriate text2ids module, like the code in examples/cppwin.

**Firstly**, it should compile a Tensorflow Lite static library. The method see the [reference](https://www.tensorflow.org/lite/guide/build_rpi) from the official guidance of Tensorflow.

Execute the following command to compile a static library for linux:
```shell
./tensorflow/lite/tools/make/download_dependencies.sh
./tensorflow/lite/tools/make/build_lib.sh (for linux)
```
(The official also provides different complie methods for other platforms (such as rpi, aarch64, and riscv), see /tensorflow/lite/tools/make/)

Because this process takes much time, so a static library builded for linux is also available ([libtensorflow-lite.a](https://github.com/lr2582858/TTS_tflite_cpp/releases/tag/0.1.0)).

The structure of the demo folder should be:
```
|- [cpptflite]/
|      |- demo/
|      |- src/
|      |- lib/
|          |- flatbuffers/
|          |- tensorflow/lite/
|          |- libtensorflow-lite.a
```
The two folders of flatbuffers/ and tensorflow/lite/ provide the required header files.

**Then**,
```shell
cd examples/cpptflite
mkdir build
cd build
```

**English Demo (using LJSPEECH dataset)**
```shell
cmake .. -DMAPPER=LJSPEECH
make

./demo "Bill got in the habit of asking himself “Is that thought true?”" test.wav
```

**or Mandarin Demo (using Baker dataset)**
```shell
cmake .. -DMAPPER=BAKER
make

./demo "这是一个开源的端到端中文语音合成系统" test.wav
```



## Results
- #### Comparison before and after conversion (English TTS)
  ```
  "Bill got in the habit of asking himself “Is that thought true?” \ 
  And if he wasn’t absolutely certain it was, he just let it go."
  ```
- Before conversion (Python)

  ![ori_mel](./results/lj_ori_mel.png)


- After conversion (C++)

  ![tflite_mel](./results/lj_tflite_mel.png)

- #### Adding #3 in chinese text will create pause prosody in audio
```
这是一个开源的端到端中文语音合成系统"
```
![tflite_mel](./results/tflite_mel.png)
```
"这是一个开源的#3端到端#3中文语音合成系统"
```
![tflite_mel](./results/tflite_mel2.png)