Question about poor voice cloning quality.
I took a 10-second, good-quality recording of a human voice in .wav format and converted it to .npz format on this website. But the voice I get is really bad when I use that .npz as the prompt to generate speech with the Bark model.
The quality of results depends on a lot of factors.
- Make sure there's no background noise
- Make sure your text prompt when generating isn't something Bark struggles with
- Make sure the voice in your clip is speaking English (this model doesn't support other languages, though there is a Polish model)
It's probably not the first one if you have a good-quality clip. So, what language is your clip in? What is your text prompt? And what are you using to run Bark?
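To rule out problems with the clip itself before looking at the text prompt, a quick check of the .wav file can help. The sketch below uses only the Python standard library; `describe_wav` and the idea of flagging full-scale samples as clipping are my own illustration, not part of Bark or of the cloning site. It assumes an uncompressed PCM .wav (the `wave` module raises an error on other encodings) and only estimates clipping for 16-bit audio.

```python
import array
import sys
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav clip for a quick sanity check."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        frames = wav.readframes(n_frames)

    info = {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate": sample_rate,
        "duration_s": n_frames / sample_rate if sample_rate else 0.0,
        "clipped_fraction": None,  # filled in only for 16-bit PCM
    }

    if sample_width == 2 and frames:
        samples = array.array("h")
        samples.frombytes(frames)
        if sys.byteorder == "big":
            samples.byteswap()  # .wav data is little-endian
        # Samples at full scale usually mean the recording was clipped.
        clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
        info["clipped_fraction"] = clipped / len(samples)

    return info
```

Calling `describe_wav("my_voice.wav")` (a hypothetical file name) returns a dict you can paste into the thread. A duration far from what you expect, or a noticeable `clipped_fraction`, would suggest the conversion step damaged the clip.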
I have the same issue. The audio I used was generated with ElevenLabs as .mp3, and I converted it to .wav with Adobe encoder using the default wav preset. The audio is crystal clear, with no background noise, and the voice speaks perfect English. But when I use the generated .npz file for inference, the output crackles with strange noises, like a badly tuned radio.
It might still have detected some background noise somehow, or associated the voice with background noise. Bark is a GPT-style model and can make a lot of decisions without being prompted to do so. In some cases it could generate music, or change the voice drastically.
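It may also be worth confirming that the .npz file itself looks sane before blaming the model. The sketch below lists the arrays inside an .npz and their shapes using only the standard library (an .npz is a zip archive of .npy files). The expected names `semantic_prompt`, `coarse_prompt` and `fine_prompt` are an assumption based on my understanding of Bark's bundled speaker prompts. The helper names are hypothetical.

```python
import ast
import struct
import zipfile

# Assumption: the array names Bark reads from a voice prompt file.
EXPECTED_ARRAYS = {"semantic_prompt", "coarse_prompt", "fine_prompt"}


def npz_array_shapes(path):
    """Map each array name in an .npz archive to its shape, read from the .npy headers."""
    shapes = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as fh:
                if fh.read(6) != b"\x93NUMPY":
                    raise ValueError(f"{member} is not a .npy file")
                major, _minor = fh.read(2)
                # Version 1 stores the header length in 2 bytes, later versions in 4.
                if major == 1:
                    (header_len,) = struct.unpack("<H", fh.read(2))
                else:
                    (header_len,) = struct.unpack("<I", fh.read(4))
                # The header is a Python dict literal with descr, fortran_order, shape.
                header = ast.literal_eval(fh.read(header_len).decode("latin1"))
            shapes[member[:-4]] = header["shape"]
    return shapes


def missing_prompt_arrays(path):
    """Return the expected array names that are absent from the .npz file."""
    return sorted(EXPECTED_ARRAYS - set(npz_array_shapes(path)))
```

If `missing_prompt_arrays("my_voice.npz")` (hypothetical file name) returns anything, or one of the shapes has a zero-length dimension, the conversion step is the more likely culprit than Bark's generation.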