Update README.md

#1
by tiansz - opened

The original code has a small problem, so I made a slight change. You can run the notebook here: https://www.kaggle.com/code/tiansztianszs/facebook-tts-transformer-zh-cv7-css10/notebook

Hi, I'm very glad to see your post. It works, thank you very much. I have a new question and hope you can help: how can I save the wav and rate to a WAV or MP3 file? I found the call torchaudio.save(filepath="test.wav", src=wav, sample_rate=rate), but it raises "RuntimeError: Input tensor has to be 2D." I am not very familiar with this. Could you help me? Thank you, and have a great day.

Hello, please add the following code after the example I gave:

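import torchaudio

# Convert the 1-D tensor to a 2-D tensor;
# torchaudio.save expects shape (channels, samples)
wav = wav.unsqueeze(0)

# Save as a WAV file
torchaudio.save('audio_save.wav', wav, rate)

# Save as an MP3 file
torchaudio.save('audio_save.mp3', wav, rate, format='mp3')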
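
If the installed torchaudio backend cannot write the file (MP3 support in particular depends on the backend), the standard-library wave module can write a mono 16-bit WAV instead. This is only a minimal sketch, assuming wav is the 1-D float tensor from the example with values roughly in [-1, 1] and rate is its sample rate. save_wav_mono is a hypothetical helper name, not part of torchaudio.

```python
import struct
import wave


def save_wav_mono(path, samples, sample_rate):
    """Write a 1-D sequence of floats in [-1.0, 1.0] as 16-bit PCM mono WAV.

    Hypothetical helper for illustration; not part of torchaudio.
    """
    # Clip each sample to the valid range, then scale to signed 16-bit.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)                 # mono
        f.setsampwidth(2)                 # 2 bytes = 16 bits per sample
        f.setframerate(int(sample_rate))
        # Little-endian signed 16-bit samples, one per frame.
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


# Usage with the tensor from the example (assumed 1-D and on the CPU).
# If unsqueeze(0) has already been applied, pass wav.squeeze(0).tolist().
# save_wav_mono("audio_save.wav", wav.tolist(), rate)
```

The helper takes a plain Python list so that it does not depend on torch; for long clips a vectorized conversion would be faster.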

Hi, thank you very much. By the way, I recently ran into a new problem, and I don't know whether anyone else has seen it. If my input text is somewhat long, say several dozen Chinese characters, the output speech is wrong: toward the end it repeats some parts and skips others.
Thank you once more. Have a great day.

Yes, these models perform poorly. I recommend using one of the following instead (ranked by performance):

  1. Microsoft TTS (script version and command-line version)
  2. Baidu PaddlePaddle TTS
  3. espnet TTS
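
If switching models is not an option, one common workaround for long inputs is to split the text at punctuation, synthesize each short chunk separately, and concatenate the audio. Whether this removes the repetition for this particular model is an assumption; it was not tested in this thread. The sketch below covers only the splitting step. split_text and max_chars are hypothetical names, and only full-width Chinese punctuation is treated as a split point.

```python
import re


def split_text(text, max_chars=30):
    """Split text after Chinese punctuation, then pack the pieces into
    chunks of at most max_chars characters where possible.

    Hypothetical helper for illustration. A single piece longer than
    max_chars is kept whole rather than cut mid-clause.
    """
    # Zero-width split right after each punctuation mark, so the mark
    # stays attached to the clause it ends (requires Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[。!?;,])", text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        # Start a new chunk when adding this piece would exceed the limit.
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


# Usage sketch: synthesize() is a hypothetical wrapper around the
# generation call from the example, returning a 1-D tensor and its rate.
# parts = [synthesize(chunk)[0] for chunk in split_text(long_text)]
# wav = torch.cat(parts)
```

A smaller max_chars gives the model shorter inputs at the cost of more audible joins between chunks.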
