About

This is a basic zero-shot voice conversion model trained with VITS and ContentVec.

See:

https://github.com/alphacep/vosk-tts/tree/master/vc

https://github.com/quickvc/QuickVC-VoiceConversion

https://github.com/auspicious3000/contentvec

Speaker Similarity

Computed by eval.py using Resemblyzer.

Model                                    Average  Min
Original QuickVC (trained on VCTK)       0.667    0.477
New model                                0.880    0.712
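The Average and Min figures above can be illustrated with a minimal sketch. This is not the repository's eval.py. It assumes that each score is the cosine similarity between a Resemblyzer embedding of the target speaker's reference audio and an embedding of the converted audio, and that the scores are then summarized over all pairs. The helper names (`cosine_similarity`, `summarize`, `embed_files`) are hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def summarize(scores):
    """Reduce per-pair scores to the two reported figures."""
    return {"average": mean(scores), "min": min(scores)}


def embed_files(paths):
    """Embed audio files with Resemblyzer (imported lazily; needs `pip install resemblyzer`)."""
    from resemblyzer import VoiceEncoder, preprocess_wav

    encoder = VoiceEncoder()
    return [encoder.embed_utterance(preprocess_wav(p)) for p in paths]


def speaker_similarity(reference_paths, converted_paths):
    """Assumed protocol: compare each reference file with its converted counterpart."""
    refs = embed_files(reference_paths)
    convs = embed_files(converted_paths)
    scores = [cosine_similarity(r, c) for r, c in zip(refs, convs)]
    return summarize(scores)
```

Calling `speaker_similarity(["target.wav"], ["converted.wav"])` with placeholder file names would return a dictionary with `average` and `min` keys. The actual pairing of files used by eval.py may differ.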