xVAPitch (v3) voice models for xVASynth, based on the NVIDIA HIFI NeMo datasets.
Models created by Dan Ruta, origin link:
Dataset supposed origin:
Papers referenced by the xVAPitch model:
- Multi-head attention with relative positional embedding - https://arxiv.org/pdf/1809.04281.pdf
- Transformer with relative positional encoding - https://arxiv.org/abs/1803.02155
- SDP - https://arxiv.org/pdf/2106.06103.pdf
- Spline Flow - https://arxiv.org/abs/1906.04032
Legal note: Although these datasets are licensed under CC BY 4.0, the base v3 model from which these models are fine-tuned was pre-trained on non-permissive data.