FAcodec trained on 50k hours of speech data, with more timbre diversity and better reconstruction of speakers from podcasts, videos, games, or animations.
This is a separate decoder, designed and trained on top of the pretrained encoder specifically for the voice conversion task.
It is capable of zero-shot voice conversion and streaming voice conversion, and has outstanding timbre generalization ability.
See the main repository for usage examples.