Delete README.md
README.md
DELETED
@@ -1,204 +0,0 @@
|
|
1 |
-
# AICoverGen
|
2 |
-
An autonomous pipeline to create covers with any RVC v2 trained AI voice from YouTube videos or a local audio file. It is aimed at developers who want to add singing functionality to their AI assistant/chatbot/vtuber, and at people who want to hear their favourite characters sing their favourite songs.
|
3 |
-
|
4 |
-
Showcase: https://www.youtube.com/watch?v=2qZuE4WM7CM
|
5 |
-
|
6 |
-
Setup Guide: https://www.youtube.com/watch?v=pdlhk4vVHQk
|
7 |
-
|
8 |
-
![](images/webui_generate.png?raw=true)
|
9 |
-
|
10 |
-
The WebUI is under constant development and testing, but you can try it out right now, both locally and on Colab!
|
11 |
-
|
12 |
-
## Changelog
|
13 |
-
|
14 |
-
- WebUI for easier conversions and downloading of voice models
|
15 |
-
- Support for cover generations from a local audio file
|
16 |
-
- Option to keep the intermediate files generated, e.g. isolated vocals/instrumentals
|
17 |
-
- Download suggested public voice models from table with search/tag filters
|
18 |
-
- Support for Pixeldrain download links for voice models
|
19 |
-
- Implement new rmvpe pitch extraction technique for faster and higher quality vocal conversions
|
20 |
-
- Volume control for AI main vocals, backup vocals and instrumentals
|
21 |
-
- Index Rate for Voice conversion
|
22 |
-
- Reverb Control for AI main vocals
|
23 |
-
- Local network sharing option for webui
|
24 |
-
- Extra RVC options - filter_radius, rms_mix_rate, protect
|
25 |
-
- Local file upload via file browser option
|
26 |
-
- Upload of locally trained RVC v2 models via WebUI
|
27 |
-
- Pitch detection method control, e.g. rmvpe/mangio-crepe
|
28 |
-
- Pitch change for vocals and instrumentals together. This has the same effect as changing the key of a song in karaoke.
|
29 |
-
- Audio output format option: wav or mp3.
|
30 |
-
|
31 |
-
## Update AICoverGen to latest version
|
32 |
-
|
33 |
-
Pull the latest changes and install any new requirements by opening a command line window in the `AICoverGen` directory and running the following commands.
|
34 |
-
|
35 |
-
```
|
36 |
-
git pull
|
37 |
-
pip install -r requirements.txt
|
38 |
-
```
|
39 |
-
|
40 |
-
For Colab users, simply click `Runtime` in the top navigation bar of the Colab notebook, then `Disconnect and delete runtime` in the dropdown menu.
|
41 |
-
Then follow the instructions in the notebook to run the webui.
|
42 |
-
|
43 |
-
## Colab notebook
|
44 |
-
|
45 |
-
If you do not have a powerful enough NVIDIA GPU, you can try AICoverGen out using Google Colab.
|
46 |
-
|
47 |
-
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/SociallyIneptWeeb/AICoverGen/blob/main/AICoverGen_colab.ipynb)
|
48 |
-
|
49 |
-
For those who want to run this locally, follow the setup guide below.
|
50 |
-
|
51 |
-
## Setup
|
52 |
-
|
53 |
-
### Install Git and Python
|
54 |
-
|
55 |
-
Follow the instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to install Git on your computer. Also follow this [guide](https://realpython.com/installing-python/) to install Python **VERSION 3.9** if you haven't already. Using other versions of Python may result in dependency conflicts.
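
Before installing the dependencies, it can help to confirm which interpreter is actually on your path. The snippet below is a minimal sketch of such a check; the helper name `is_supported` is hypothetical and is not part of AICoverGen.

```python
import sys

# Python release the setup guide asks for (major, minor).
REQUIRED = (3, 9)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {found} matches the recommended 3.9 release.")
    else:
        print(f"Python {found} found; this guide recommends 3.9.")
```

Run it with the same `python` command you intend to use for `pip install`, since several Python versions may be installed side by side.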
|
56 |
-
|
57 |
-
### Install ffmpeg
|
58 |
-
|
59 |
-
Follow the instructions [here](https://www.hostinger.com/tutorials/how-to-install-ffmpeg) to install ffmpeg on your computer.
|
60 |
-
|
61 |
-
### Install sox
|
62 |
-
|
63 |
-
Follow the instructions [here](https://www.tutorialexample.com/a-step-guide-to-install-sox-sound-exchange-on-windows-10-python-tutorial/) to install sox and add it to your Windows path environment.
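
Once Git, ffmpeg and sox are installed, a quick way to verify that they are reachable from the command line is to look them up on `PATH`. This is only a sketch: the helper `missing_tools` is a hypothetical name, and it assumes the executables are called `git`, `ffmpeg` and `sox` on your system.

```python
import shutil

# Command-line tools installed in the setup steps above (assumed executable names).
REQUIRED_TOOLS = ("git", "ffmpeg", "sox")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("git, ffmpeg and sox are all on PATH.")
```

If a tool is reported as missing right after installation, open a new command line window so that the updated `PATH` is picked up.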
|
64 |
-
|
65 |
-
### Clone AICoverGen repository
|
66 |
-
|
67 |
-
Open a command line window and run these commands to clone this entire repository and install the additional dependencies required.
|
68 |
-
|
69 |
-
```
|
70 |
-
git clone https://github.com/SociallyIneptWeeb/AICoverGen
|
71 |
-
cd AICoverGen
|
72 |
-
pip install -r requirements.txt
|
73 |
-
```
|
74 |
-
|
75 |
-
### Download required models
|
76 |
-
|
77 |
-
Run the following command to download the required MDXNET vocal separation models and hubert base model.
|
78 |
-
|
79 |
-
```
|
80 |
-
python src/download_models.py
|
81 |
-
```
|
82 |
-
|
83 |
-
|
84 |
-
## Usage with WebUI
|
85 |
-
|
86 |
-
To run the AICoverGen WebUI, run the following command.
|
87 |
-
|
88 |
-
```
|
89 |
-
python src/webui.py
|
90 |
-
```
|
91 |
-
|
92 |
-
| Flag | Description |
|------|-------------|
| `-h`, `--help` | Show this help message and exit. |
| `--share` | Create a public URL. This is useful for running the web UI on Google Colab. |
| `--listen` | Make the web UI reachable from your local network. |
| `--listen-host LISTEN_HOST` | The hostname that the server will use. |
| `--listen-port LISTEN_PORT` | The listening port that the server will use. |
|
99 |
-
|
100 |
-
Once the following output message `Running on local URL: http://127.0.0.1:7860` appears, you can click on the link to open a tab with the WebUI.
|
101 |
-
|
102 |
-
### Download RVC models via WebUI
|
103 |
-
|
104 |
-
![](images/webui_dl_model.png?raw=true)
|
105 |
-
|
106 |
-
Navigate to the `Download model` tab, and paste the download link to the RVC model and give it a unique name.
|
107 |
-
You may search the [AI Hub Discord](https://discord.gg/aihub), where already trained voice models are available for download. Refer to the examples to see what the download link should look like.
|
108 |
-
The downloaded zip file should contain the .pth model file and an optional .index file.
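
If you want to inspect an archive before pasting its link, the expected contents can be checked locally. The sketch below is not AICoverGen's own download code; the function names are hypothetical, and the "exactly one `.pth`" rule is an assumption based on the description above.

```python
import zipfile


def check_model_zip(zip_path):
    """Return (pth_files, index_files) found anywhere inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    pth_files = [n for n in names if n.lower().endswith(".pth")]
    index_files = [n for n in names if n.lower().endswith(".index")]
    return pth_files, index_files


def looks_like_voice_model(zip_path):
    """True when the archive holds exactly one .pth and at most one .index."""
    pth_files, index_files = check_model_zip(zip_path)
    return len(pth_files) == 1 and len(index_files) <= 1
```

For example, `looks_like_voice_model("John.zip")` would return `False` for an archive that contains no `.pth` file at all.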
|
109 |
-
|
110 |
-
Once the 2 input fields are filled in, simply click `Download`! Once the output message says `[NAME] Model successfully downloaded!`, you should be able to use it in the `Generate` tab after clicking the refresh models button!
|
111 |
-
|
112 |
-
### Upload RVC models via WebUI
|
113 |
-
|
114 |
-
![](images/webui_upload_model.png?raw=true)
|
115 |
-
|
116 |
-
This option is for people who have trained RVC v2 models locally and would like to use them for AI cover generation.
|
117 |
-
Navigate to the `Upload model` tab, and follow the instructions.
|
118 |
-
Once the output message says `[NAME] Model successfully uploaded!`, you should be able to use it in the `Generate` tab after clicking the refresh models button!
|
119 |
-
|
120 |
-
|
121 |
-
### Running the pipeline via WebUI
|
122 |
-
|
123 |
-
![](images/webui_generate.png?raw=true)
|
124 |
-
|
125 |
-
- From the Voice Models dropdown menu, select the voice model to use. Click `Update` if you added the files manually to the [rvc_models](rvc_models) directory to refresh the list.
|
126 |
-
- In the song input field, copy and paste the link to any song on YouTube or the full path to a local audio file.
|
127 |
-
- Pitch should be set to either -12, 0, or 12 depending on the original vocals and the RVC AI model. This ensures the voice is not *out of tune*.
|
128 |
-
- Other advanced options for Voice conversion and audio mixing can be viewed by clicking the accordion arrow to expand.
|
129 |
-
|
130 |
-
Once all Main Options are filled in, click `Generate` and the AI generated cover should appear in less than a few minutes, depending on your GPU.
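
The pitch values -12, 0 and 12 suggested above are whole octaves expressed in semitones. In 12-tone equal temperament a shift of 12 semitones doubles (or halves) every frequency, so the vocals stay in the key of the song. The sketch below only illustrates that arithmetic; it is not how AICoverGen performs the shift internally, and the function names are hypothetical.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def is_whole_octave(semitones):
    """True when the shift is a multiple of 12 semitones (a whole number of octaves)."""
    return semitones % 12 == 0


if __name__ == "__main__":
    for shift in (-12, 0, 12):
        # Ratios are 0.5, 1.0 and 2.0 respectively.
        print(shift, semitones_to_ratio(shift), is_whole_octave(shift))
```

A shift such as 7 semitones is not a whole octave, so the converted vocals would no longer match the key of the untouched instrumentals.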
|
131 |
-
|
132 |
-
## Usage with CLI
|
133 |
-
|
134 |
-
### Manual Download of RVC models
|
135 |
-
|
136 |
-
Unzip (if needed) and transfer the `.pth` and `.index` files to a new folder in the [rvc_models](rvc_models) directory. Each folder should only contain one `.pth` and one `.index` file.
|
137 |
-
|
138 |
-
The directory structure should look something like this:
|
139 |
-
```
|
140 |
-
├── rvc_models
|
141 |
-
│ ├── John
|
142 |
-
│ │ ├── JohnV2.pth
|
143 |
-
│ │ └── added_IVF2237_Flat_nprobe_1_v2.index
|
144 |
-
│ ├── May
|
145 |
-
│ │ ├── May.pth
|
146 |
-
│ │ └── added_IVF2237_Flat_nprobe_1_v2.index
|
147 |
-
│ ├── MODELS.txt
|
148 |
-
│ └── hubert_base.pt
|
149 |
-
├── mdxnet_models
|
150 |
-
├── song_output
|
151 |
-
└── src
|
152 |
-
```
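
The layout rule above (one `.pth` per folder, plus a single `.index` file, which the WebUI section describes as optional) can be checked with a few lines of Python. This is a minimal sketch, assuming it is given the path to one voice folder such as `rvc_models/John`; `check_model_dir` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in one voice-model folder (empty if none)."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return [f"{model_dir} is not a directory"]
    pth = sorted(p.name for p in model_dir.glob("*.pth"))
    index = sorted(p.name for p in model_dir.glob("*.index"))
    problems = []
    if len(pth) != 1:
        problems.append(f"expected one .pth file, found {len(pth)}")
    if len(index) > 1:
        problems.append(f"expected at most one .index file, found {len(index)}")
    return problems
```

Calling `check_model_dir("rvc_models/John")` from the `AICoverGen` directory returns an empty list when the folder matches the structure shown above.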
|
153 |
-
|
154 |
-
### Running the pipeline
|
155 |
-
|
156 |
-
To run the AI cover generation pipeline using the command line, run the following command.
|
157 |
-
|
158 |
-
```
|
159 |
-
python src/main.py [-h] -i SONG_INPUT -dir RVC_DIRNAME -p PITCH_CHANGE [-k | --keep-files | --no-keep-files] [-ir INDEX_RATE] [-fr FILTER_RADIUS] [-rms RMS_MIX_RATE] [-palgo PITCH_DETECTION_ALGO] [-hop CREPE_HOP_LENGTH] [-pro PROTECT] [-mv MAIN_VOL] [-bv BACKUP_VOL] [-iv INST_VOL] [-pall PITCH_CHANGE_ALL] [-rsize REVERB_SIZE] [-rwet REVERB_WETNESS] [-rdry REVERB_DRYNESS] [-rdamp REVERB_DAMPING] [-oformat OUTPUT_FORMAT]
|
160 |
-
```
|
161 |
-
|
162 |
-
| Flag | Description |
|------|-------------|
| `-h`, `--help` | Show this help message and exit. |
| `-i SONG_INPUT` | Link to a song on YouTube or path to a local audio file. Should be enclosed in double quotes for Windows and single quotes for Unix-like systems. |
| `-dir RVC_DIRNAME` | Name of the folder in the [rvc_models](rvc_models) directory containing your `.pth` and `.index` files for a specific voice. |
| `-p PITCH_CHANGE` | Change pitch of AI vocals in octaves. Set to 0 for no change. Generally, use 1 for male to female conversions and -1 for vice-versa. |
| `-k` | Optional. Can be added to keep all intermediate audio files generated, e.g. isolated AI vocals/instrumentals. Leave out to save space. |
| `-ir INDEX_RATE` | Optional. Default 0.5. Control how much of the AI's accent to leave in the vocals. 0 <= INDEX_RATE <= 1. |
| `-fr FILTER_RADIUS` | Optional. Default 3. If >=3: apply median filtering to the harvested pitch results. 0 <= FILTER_RADIUS <= 7. |
| `-rms RMS_MIX_RATE` | Optional. Default 0.25. Control how much to use the original vocal's loudness (0) or a fixed loudness (1). 0 <= RMS_MIX_RATE <= 1. |
| `-palgo PITCH_DETECTION_ALGO` | Optional. Default rmvpe. Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals). |
| `-hop CREPE_HOP_LENGTH` | Optional. Default 128. Controls how often pitch changes are checked, in milliseconds, when using the mangio-crepe algorithm specifically. Lower values lead to longer conversions and a higher risk of voice cracks, but better pitch accuracy. |
| `-pro PROTECT` | Optional. Default 0.33. Control how much of the original vocals' breath and voiceless consonants to leave in the AI vocals. Set 0.5 to disable. 0 <= PROTECT <= 0.5. |
| `-mv MAIN_VOL` | Optional. Default 0. Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels. |
| `-bv BACKUP_VOL` | Optional. Default 0. Control volume of backup AI vocals. |
| `-iv INST_VOL` | Optional. Default 0. Control volume of the background music/instrumentals. |
| `-pall PITCH_CHANGE_ALL` | Optional. Default 0. Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly. |
| `-rsize REVERB_SIZE` | Optional. Default 0.15. The larger the room, the longer the reverb time. 0 <= REVERB_SIZE <= 1. |
| `-rwet REVERB_WETNESS` | Optional. Default 0.2. Level of AI vocals with reverb. 0 <= REVERB_WETNESS <= 1. |
| `-rdry REVERB_DRYNESS` | Optional. Default 0.8. Level of AI vocals without reverb. 0 <= REVERB_DRYNESS <= 1. |
| `-rdamp REVERB_DAMPING` | Optional. Default 0.7. Absorption of high frequencies in the reverb. 0 <= REVERB_DAMPING <= 1. |
| `-oformat OUTPUT_FORMAT` | Optional. Default mp3. wav for best quality and large file size, mp3 for decent quality and small file size. |
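
When calling the pipeline from another script, the flags above can be assembled into an argument list instead of a single shell string. The following is a sketch only: `build_cover_command` is a hypothetical helper that covers a handful of the documented flags and re-checks the documented ranges before the command is run. It assumes `python` resolves to the interpreter used during setup.

```python
def build_cover_command(song_input, rvc_dirname, pitch_change, *, keep_files=False,
                        index_rate=None, protect=None, output_format=None):
    """Build the argument list for `python src/main.py` from a subset of its flags."""
    # Ranges below are the ones stated in the flag table.
    if index_rate is not None and not 0 <= index_rate <= 1:
        raise ValueError("INDEX_RATE must be between 0 and 1")
    if protect is not None and not 0 <= protect <= 0.5:
        raise ValueError("PROTECT must be between 0 and 0.5")
    if output_format is not None and output_format not in ("mp3", "wav"):
        raise ValueError("OUTPUT_FORMAT must be mp3 or wav")

    # Required flags: song input, model folder name, pitch change in octaves.
    cmd = ["python", "src/main.py", "-i", song_input,
           "-dir", rvc_dirname, "-p", str(pitch_change)]
    if keep_files:
        cmd.append("-k")
    if index_rate is not None:
        cmd += ["-ir", str(index_rate)]
    if protect is not None:
        cmd += ["-pro", str(protect)]
    if output_format is not None:
        cmd += ["-oformat", output_format]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the `AICoverGen` directory. Because the arguments are passed as a list rather than through a shell, the song input does not need the platform-specific quoting described for `-i`.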
|
184 |
-
|
185 |
-
|
186 |
-
## Terms of Use
|
187 |
-
|
188 |
-
The use of the converted voice for the following purposes is prohibited.
|
189 |
-
|
190 |
-
* Criticizing or attacking individuals.
|
191 |
-
|
192 |
-
* Advocating for or opposing specific political positions, religions, or ideologies.
|
193 |
-
|
194 |
-
* Publicly displaying strongly stimulating expressions without proper zoning.
|
195 |
-
|
196 |
-
* Selling of voice models and generated voice clips.
|
197 |
-
|
198 |
-
* Impersonation of the original owner of the voice with malicious intentions to harm/hurt others.
|
199 |
-
|
200 |
-
* Fraudulent purposes that lead to identity theft or fraudulent phone calls.
|
201 |
-
|
202 |
-
## Disclaimer
|
203 |
-
|
204 |
-
I am not liable for any direct, indirect, consequential, incidental, or special damages arising out of or in any way connected with the use/misuse or inability to use this software.