Trying to hack together a voice cloning demo...

#1
by sherlock1199 - opened

I've been trying to create my own custom embeddings using speechbrain/spkrec-xvect-voxceleb

signal, fs = torchaudio.load('morgan.wav')
embeddings = classifier.encode_batch(signal)

and generating audio using:

speech = model.generate_speech(inputs["input_ids"], embeddings[0], vocoder=vocoder)

but the output is garbled. Is there an intermediate step I'm missing?

So I managed to get non-garbled output after resampling my WAV file to 16 kHz and converting it to mono. Now to figure out how to improve the quality of the voice reproduction.

Great work! Where can I find information about fine-tuning to other languages?
