Tips for accurate Spanish speaker cloning?

by manugarri - opened Sep 16, 2023

Sep 16, 2023

I'm having trouble replicating my voice accurately. I created a 3-second WAV file, then ran the following block:

tts.tts_to_file(text="...",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")

The resulting output audio does not sound like me at all.

I tried increasing the decoder iterations and also using longer recordings.

Are there any guidelines on how to produce the speaker audio to improve the output quality?

unificador

Sep 18, 2023 • edited Sep 18, 2023

Just try changing the language value from "en" to "es".

pailletjp

Sep 18, 2023

lol

manugarri

Sep 18, 2023

@unificador I'm sorry, in the sample I actually ran I had of course changed the language to 'es' :D. The snippet I posted here was just copy-pasted from the landing page.

Any real tips, anyone?

gorkemgoknar

Coqui.ai org Sep 19, 2023 • edited Oct 4, 2023

For better cloning:

At least 6 seconds of reference audio
No background noise, mic bumps, etc.
A cleaner audio file
No long silences in the reference audio (especially at the start and end)
Note: this Space has a fast, working filter for cleaning up microphone input in particular; check its app.py: https://huggingface.co/spaces/coqui/xtts
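
If it helps, here is a rough sketch of how the length and silence tips could be checked before cloning. It is not the filter from the Space's app.py, just a standard-library illustration. The names `trim_silence`, `prepare_reference`, and `MIN_SECONDS`, and the amplitude threshold of 500, are my own assumptions, and it only handles mono 16-bit PCM WAV files.

```python
import array
import sys
import wave

MIN_SECONDS = 6.0  # assumed constant, taken from the "at least 6 seconds" tip


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold` (assumed value)."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def prepare_reference(in_path, out_path, threshold=500):
    """Trim edge silence and return (duration_in_seconds, is_long_enough)."""
    with wave.open(in_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM WAV files")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    trimmed = trim_silence(samples, threshold)
    seconds = len(trimmed) / rate

    if sys.byteorder == "big":
        trimmed.byteswap()  # back to little-endian before writing
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(trimmed.tobytes())
    return seconds, seconds >= MIN_SECONDS
```

Calling `prepare_reference("speaker.wav", "speaker_trimmed.wav")` writes the trimmed file and tells you whether what remains is still at least 6 seconds long. It does nothing about background noise, so a noisy recording would still need to be re-recorded or filtered separately.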

gorkemgoknar

Coqui.ai org Oct 27, 2023

You can now fine-tune XTTS (TTS v0.19.0):
https://tts.readthedocs.io/en/dev/models/xtts.html
