# Herta Voice Changer

## Introduction

This AI model is based on **Soft Voice Changer Vision Transformers Singing Voice Conversion (SO-VITS-SVC)**; refer to the 4.0 branch of this [GitHub repository](https://github.com/svc-develop-team/so-vits-svc/tree/4.0). The model was inspired by [Herta](https://honkai-star-rail.fandom.com/wiki/Herta) from [Honkai Star Rail](https://hsr.hoyoverse.com/en-us/), and it can be used to convert the original voice in an audio file into this character's voice.

## How to Prepare Audio Files

Your audio files should be `shorter than 10 seconds`, have no `BGM`, and have a sampling rate of `44100 Hz`.

1. Create a new folder inside the `dataset_raw` folder (this folder name will be your `SpeakerID`).
2. Put your audio files into the folder you created above.

### Note:

1. Your audio files should be in `.wav` format.
2. If your audio files are longer than 10 seconds, trim them down with your preferred software or [audio slicer GUI](https://github.com/flutydeer/audio-slicer).
3. If your audio files have **BGM**, remove it with a program such as [Ultimate Vocal Remover](https://ultimatevocalremover.com/); the `3_HP-Vocal-UVR.pth` or `UVR-MDX-NET Main` model is recommended.
4. If your audio files have a sampling rate other than 44100 Hz, resample them with [Audacity](https://www.audacityteam.org/) or by running `python resample.py` in your `CMD` (see the sketch after this list).

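As a convenience, here is a minimal Python sketch that checks durations and resamples every `.wav` file in a speaker folder to 44100 Hz. It assumes `librosa` and `soundfile` are installed, and the folder name `dataset_raw/Herta` is hypothetical; substitute your own `SpeakerID` folder:

```python
import os

import librosa
import soundfile

speaker_dir = 'dataset_raw/Herta'  # hypothetical SpeakerID folder

for name in os.listdir(speaker_dir):
    if not name.endswith('.wav'):
        continue
    path = os.path.join(speaker_dir, name)
    # librosa resamples on load when sr is given explicitly.
    audio, sr = librosa.load(path, sr=44100, mono=True)
    duration = len(audio) / sr
    if duration > 10:
        print(f'{name}: {duration:.1f}s, consider slicing it below 10 s')
    soundfile.write(path, audio, 44100)
```
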
## How to Build Locally

1. Clone the repository from the 4.0 branch: `git clone https://github.com/svc-develop-team/so-vits-svc.git`
2. Put your `prepared audio` into the `dataset_raw` folder.
3. Open your **Command Line** and install the `so-vits-svc` library: `pip install -U so-vits-svc-fork`
4. Navigate to your project directory using the **Command Line**.
5. Run `svc pre-resample` in your prompt.
6. After completing the step above, run `svc pre-config`.
7. After completing the step above, run `svc pre-hubert`. **(This step may take a while.)**
8. After completing the step above, run `svc train -t`. **(This step will take a while, depending on your `GPU` and the number of `epochs` you want.)**

### How to Change Epoch Value Locally

An `epoch` is one full training pass over your dataset; the `epochs` value sets how many passes training runs before it finishes. **Example: if you set the epoch value to 10000, training will stop after 10000 epochs** `(the default epoch value is 10000)`. To change your `epoch value`:

1. Go to your project folder.
2. Find the folder named `config`.
3. Inside that folder, you should see `config.json`.
4. In `config.json`, there should be a section that looks like this:

```json
"train": {
  "log_interval": 200,
  "eval_interval": 800,
  "seed": 1234,
  "epochs": <PUT YOUR VALUE HERE>,
  "learning_rate": 0.0001,
  "betas": [0.8, 0.99]
}
```

This can be done after `svc pre-config` has already finished; you can edit the file by hand or script it as below.

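For example, a minimal Python sketch that updates the value programmatically, assuming the `config/config.json` path described above (the exact location may differ in your setup, and `2000` is just an illustrative value):

```python
import json

config_path = 'config/config.json'  # path as described above; adjust if needed

with open(config_path) as f:
    cfg = json.load(f)

cfg['train']['epochs'] = 2000  # illustrative value; pick your own

with open(config_path, 'w') as f:
    json.dump(cfg, f, indent=2)
```
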
### How to Perform Inference Locally

To perform inference locally, navigate to the project directory, create a Python file, and copy the following lines of code:

```python
import io

import librosa
import soundfile
# Import path assumes the so-vits-svc 4.0 repository layout.
from inference.infer_tool import Svc

your_audio_file = 'your_audio.wav'

# Load the input as 16 kHz mono and wrap it in an in-memory wav file.
audio, sr = librosa.load(your_audio_file, sr=16000, mono=True)
raw_path = io.BytesIO()
soundfile.write(raw_path, audio, 16000, format='wav')
raw_path.seek(0)

model = Svc('logs/44k/your_model.pth', 'logs/44k/config.json')

# Convert with no transposition (0 semitones) and automatic F0 prediction.
out_audio, out_sr = model.infer('<YOUR SPEAKER ID>', 0, raw_path, auto_predict_f0=True)
soundfile.write('out_audio.wav', out_audio.cpu().numpy(), 44100)
```

The output file will be written to your working directory as `out_audio.wav` (the name passed to the final `soundfile.write` call).

## How to Build in Google Colab

Refer to [My Google Colab](https://colab.research.google.com/drive/1V91RM-2xzSqbmTIlaEzWZovca8stErk0?authuser=3#scrollTo=hhJ2MG1i1vfl) or the [Official Google Colab](https://colab.research.google.com/github/34j/so-vits-svc-fork/blob/main/notebooks/so-vits-svc-fork-4.0.ipynb) for the steps.

### Google Drive Setup

1. Create an empty folder (this will be your project folder).
2. Inside the project folder, create a folder named `dataset_raw`.
3. Create another folder inside `dataset_raw` (this folder name will be your `SpeakerID`).
4. Upload your prepared audio files into the folder created in the previous step (or create the folders from a Colab cell, as sketched below).

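If you prefer to create the folder structure from a Colab cell after mounting your Drive (see the next section), here is a minimal sketch; `herta-vc` and `Herta` are hypothetical names for the project folder and `SpeakerID`:

```python
import os

# Hypothetical project folder and SpeakerID; replace with your own names.
project = '/content/drive/MyDrive/herta-vc'
os.makedirs(os.path.join(project, 'dataset_raw', 'Herta'), exist_ok=True)
```
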
### Google Colab Setup

1. Mount your Google Drive:
```python
from google.colab import drive
drive.mount('/content/drive')
```

2. Install dependencies:
```python
!python -m pip install -U pip setuptools wheel
%pip install -U ipython
%pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```

3. Install the `so-vits-svc` library:
`%pip install -U so-vits-svc-fork`

4. Resample your audio files:
`!svc pre-resample`

5. Pre-config:
`!svc pre-config`

6. Pre-hubert (this step may take a while):
`!svc pre-hubert`

7. Train your model (this step will take a while, depending on your Google Colab GPU and the number of epochs you want):
`!svc train -t`

### How to Change Epoch Value in Google Colab

An "epoch" is one full training pass over your dataset; the `epochs` value sets how many passes training runs before it completes. For example, if you set the epoch value to 10,000, training will stop after 10,000 epochs (the default epoch value is 10,000).

To change the epoch value:

1. Go to your project folder.
2. Find the folder named `config`.
3. Inside that folder, you should see `config.json`.
4. In `config.json`, there should be a section that looks like this:

```json
"train": {
  "log_interval": 200,
  "eval_interval": 800,
  "seed": 1234,
  "epochs": <PUT YOUR VALUE HERE>,
  "learning_rate": 0.0001,
  "betas": [0.8, 0.99]
}
```

This can be done after `svc pre-config` has already finished, exactly as in the local setup above.

### How to Perform Inference in Google Colab

After training your model, you can use it to convert any original voice to your model's voice by running the following command:

```shell
!svc infer drive/MyDrive/your_model_name/your_audio_file.wav --model-path drive/MyDrive/your_model_name/logs/44k/your_model.pth --config-path drive/MyDrive/your_model_name/logs/44k/your_config.json
```

The output file will be named `your_audio_file.out.wav`.

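To listen to the result in the notebook or save a local copy, a small sketch using standard Colab and IPython utilities (the path below is hypothetical and should match the command above):

```python
from IPython.display import Audio, display
from google.colab import files

# Hypothetical output path; adjust to match your folder and file names.
out_path = 'drive/MyDrive/your_model_name/your_audio_file.out.wav'

display(Audio(out_path))  # play the converted audio inline
files.download(out_path)  # download a copy to your machine
```
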
### Note:

1. Your Google Drive must have at least 5 GB of free space. If you don't have enough space, consider registering a new Google account.
2. Google Colab's free tier is sufficient, but the Pro version is recommended.
3. Set your Google Colab Hardware Accelerator to `GPU`.

## Credits

1. [zomehwh/sovits-models](https://huggingface.co/spaces/zomehwh/sovits-models) on Hugging Face Spaces
2. [svc-develop-team/so-vits-svc](https://github.com/svc-develop-team/so-vits-svc) on GitHub
3. [voicepaw/so-vits-svc-fork](https://github.com/voicepaw/so-vits-svc-fork) on GitHub