danieloneill committed

Commit de2f105
1 Parent(s): 1199f22

Update README.md

Files changed (1): README.md (+33 -0)
---
license: creativeml-openrail-m
language:
- en
pipeline_tag: audio-to-audio
tags:
- voice-to-voice
- ddsp-svc
---

These are *example* models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC).

All examples are based on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they generally hold up fairly well in a variety of other languages.

All models are sampled at 44.1 kHz.

- PrimReaper - Trained on YouTube content from the popular YouTuber "The Prim Reaper"
- Panam - Trained on dialogue audio extracted from the Cyberpunk 2077 character "Panam"
- V-F - Trained on dialogue audio extracted from the female "V" character in Cyberpunk 2077
- Nora - Trained on dialogue audio from the Fallout 4 character "Nora"

If you're using DDSP-SVC's gui.py, keep in mind that a pitch adjustment is probably required if your voice is deeper than the character's.

For realtime inference, my settings are generally as follows:

- Pitch: 10-15, depending on the model
- Segmentation size: 0.70
- Cross-fade duration: 0.06
- Historical blocks used: 6
- f0 extractor: rmvpe
- Phase vocoder: depends on the model and preference; enable it if the output feels robotic/stuttery, disable it if it sounds "buttery"
- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on the model
- Encode silence: depends on the model and preference; might be best on, might be best off
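For reference, the settings above can be collected into a small Python dict. This is only a sketch for keeping notes per model: the key names here are illustrative and do not correspond to DDSP-SVC's actual configuration schema.

```python
# Illustrative only: these keys mirror the settings listed above for
# convenient per-model bookkeeping; they are NOT DDSP-SVC's real config keys.
realtime_settings = {
    "pitch": 12,                  # 10-15, depending on the model
    "segmentation_size": 0.70,
    "cross_fade_duration": 0.06,
    "historical_blocks": 6,
    "f0_extractor": "rmvpe",
    "phase_vocoder": False,       # enable if output feels robotic/stuttery
    "k_steps": 200,
    "speedup": 10,
    "diffusion_method": "ddim",   # or "pndm", depending on the model
    "encode_silence": False,      # model- and preference-dependent
}
```

Keeping one such dict per model makes it easy to jot down which pitch offset and diffusion method worked best for each voice.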