voice style control quality issues

#7
by TP6174 - opened

Hi,

Very cool work. The example demos sound very impressive, e.g. Elon's voice. I followed the examples in ref [1] and was able to run everything with no issues. I then experimented with adding a few reference voices, e.g. David Attenborough and Morgan Freeman, based on the samples found in [2] and [3] (>30 s high-quality audio recordings of their voices). My sample code below follows ref [1] and generates output without errors. However, the generated output does not resemble the reference for either voice. Are there additional things to consider, such as the length of the reference recording, the base speaker TTS, or other parameters? Or is it something rather silly?

Sample code:
reference_speaker = 'OpenVoice/resources/morgan_freeman_example.mp3'
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)
save_path = f'{output_dir}/output_morgan_freeman.wav'

# Run the base speaker tts
text = "Hello, I am Morgan Freeman, and you are inside the Matrix."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)

References:
[1] https://github.com/myshell-ai/OpenVoice/blob/main/demo_part1.ipynb
[2] https://en.wikipedia.org/wiki/File:Sir_David_Attenborough_BBC_Radio4_Desert_Island_Discs_29_Jan_2012_b01b8yy0.flac
[3] https://en.wikipedia.org/wiki/File:Morgan_freeman_bbc_radio4_film_programme_12_09_2008_b00dbcdn.flac

I am also having this issue with high-quality audio samples that are around 3 minutes long. The final output does not match the reference nearly as well as the demo voices do.

What about a shorter reference length? I ask because the reference clips used in the examples seem to be 10 s at most.
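To try the shorter-reference idea, one option is to cut the clip down to roughly 10 s before extracting the speaker embedding. Below is a minimal sketch using only Python's standard `wave` module. It assumes the reference has already been converted to an uncompressed PCM WAV file (the sample code above uses an MP3, which `wave` cannot read), and `trim_wav` is a hypothetical helper name, not part of OpenVoice. Whether a shorter clip actually improves similarity has not been confirmed in this thread.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds=10.0):
    """Copy at most the first `max_seconds` of a PCM WAV file to `dst_path`.

    Hypothetical helper for experimenting with reference length.
    Returns the duration (in seconds) of the written clip.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never request more frames than the file actually contains.
        n_frames = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and frame rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return n_frames / params.framerate
```

The trimmed file could then be passed as `reference_speaker` to `se_extractor.get_se(...)` in the sample code above, so that outputs from the long and short references can be compared by ear.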
