danieloneill committed

Commit de2f105
1 Parent(s): 1199f22

Update README.md

Files changed (1): README.md (+33 -0)
---
license: creativeml-openrail-m
language:
- en
pipeline_tag: audio-to-audio
tags:
- voice-to-voice
- ddsp-svc
---

These are *example* models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC).

All examples are based on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they generally hold up fairly well in a variety of other languages.

All models are sampled at 44.1 kHz.

- PrimReaper - Trained on YouTube content from the popular YouTuber "The Prim Reaper"
- Panam - Trained on dialogue audio extracted from the Cyberpunk 2077 character "Panam"
- V-F - Trained on dialogue audio extracted from the female "V" character in Cyberpunk 2077
- Nora - Trained on dialogue audio from the Fallout 4 character "Nora"

If you're using DDSP-SVC's gui.py, keep in mind that a pitch adjustment is probably required if your voice is deeper than the character's.

For realtime inference, my settings are generally as follows:

- Pitch: 10-15, depending on the model
- Segmentation size: 0.70
- Cross-fade duration: 0.06
- Historical blocks used: 6
- f0 extractor: rmvpe
- Phase vocoder: depends on the model and preference; enable it if the output feels robotic/stuttery, disable it if it sounds "buttery"
- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on the model
- Encode silence: depends on the model and preference; might be best on, might be best off
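For reference, the settings above can be collected into a small Python dict. This is only a sketch for keeping notes per model: the key names here are illustrative and do not correspond to DDSP-SVC's actual configuration schema.

```python
# Illustrative only: these keys mirror the settings listed above for
# convenient per-model bookkeeping; they are NOT DDSP-SVC's real config keys.
realtime_settings = {
    "pitch": 12,                  # 10-15, depending on the model
    "segmentation_size": 0.70,
    "cross_fade_duration": 0.06,
    "historical_blocks": 6,
    "f0_extractor": "rmvpe",
    "phase_vocoder": False,       # enable if output feels robotic/stuttery
    "k_steps": 200,
    "speedup": 10,
    "diffusion_method": "ddim",   # or "pndm", depending on the model
    "encode_silence": False,      # model- and preference-dependent
}
```

Keeping one such dict per model makes it easy to jot down which pitch offset and diffusion method worked best for each voice.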