Update README.md
README.md CHANGED
@@ -1,56 +1,8 @@
-
-
-
-
-
-
-
-
-
-Feel free to play around with the base models!
-Chinese & English & Japanese: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer) Author: Me
-
-Chinese & Japanese: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai) Author: [SayaSS](https://github.com/SayaSS)
-
-
-### Currently Supported Tasks:
-- [x] Clone a character's voice from 10+ short audio clips
-- [x] Clone a character's voice from long audio(s) >= 3 minutes (each audio should contain a single speaker only)
-- [x] Clone a character's voice from video(s) >= 3 minutes (each video should contain a single speaker only)
-- [x] Clone a character's voice from BILIBILI video links (each video should contain a single speaker only)
-
-### Currently Supported Characters for TTS & VC:
-- [x] Any character you wish, as long as you have their voice!
-(Note that voice conversion can only be performed between two speakers that exist in the model.)
-
-
-
-## Fine-tuning
-It's recommended to perform fine-tuning on [Google Colab](https://colab.research.google.com/drive/1pn1xnFfdLK63gVXDwV4zCXfVeo8c-I-0?usp=sharing),
-because the original VITS has some dependencies that are difficult to configure.
-
-### How long does it take?
-1. Install dependencies (3 min).
-2. Choose a pretrained model to start from. The detailed differences between the models are described in the [Colab Notebook](https://colab.research.google.com/drive/1pn1xnFfdLK63gVXDwV4zCXfVeo8c-I-0?usp=sharing).
-3. Upload voice samples of the characters you wish to add; see [DATA_EN.MD](https://github.com/Plachtaa/VITS-fast-fine-tuning/blob/main/DATA_EN.MD) for detailed upload options.
-4. Start fine-tuning. It takes roughly 20 minutes to 2 hours, depending on the number of voices you uploaded.
-
-
-## Inference or Usage (currently supports Windows only)
-0. Remember to download your fine-tuned model!
-1. Download the latest release.
-2. Put your model & config file, named `G_latest.pth` and `finetune_speaker.json` respectively, into the folder `inference`.
-3. The file structure should be as follows:
-```
-inference
-├───inference.exe
-├───...
-├───finetune_speaker.json
-└───G_latest.pth
-```
-4. Run `inference.exe`; the browser should open automatically.
-
-## Use in MoeGoe
-0. Prepare the downloaded model & config file, named `G_latest.pth` and `moegoe_config.json`, respectively.
-1. Follow the instructions on the [MoeGoe](https://github.com/CjangCjengh/MoeGoe) page to install it, configure the paths, and run it.
-
+title: zhenhuan
+emoji: 🚀
+colorFrom: green
+colorTo: gray
+sdk: gradio
+sdk_version: 3.7
+app_file: app.py
+pinned: false
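
For reference, Hugging Face Spaces reads this configuration from a YAML metadata block enclosed in `---` delimiters at the very top of `README.md`. The delimiters do not show up in the hunk above, so the complete header is assumed to look like this:

```
---
title: zhenhuan
emoji: 🚀
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 3.7
app_file: app.py
pinned: false
---
```

`sdk: gradio` together with `app_file: app.py` tells Spaces to build the Space with the Gradio SDK (version 3.7) and launch `app.py` as its entry point.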
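Since the new configuration points Spaces at `app.py` with the Gradio SDK, the entry point is a Gradio app. Below is only a minimal sketch of such a file; the `tts` stub and its text-in, audio-out signature are illustrative assumptions, not this Space's actual code:

```
import numpy as np
import gradio as gr

def tts(text):
    # Hypothetical stub: the real Space would synthesize speech from `text`
    # with its fine-tuned VITS model. This returns one second of silence.
    sr = 22050
    return sr, np.zeros(sr, dtype=np.float32)

# Gradio 3.x: a simple text-in, audio-out interface;
# Spaces serves the app defined in app_file.
gr.Interface(fn=tts, inputs="text", outputs="audio").launch()
```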
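The removed instructions load a fine-tuned model from `G_latest.pth` together with `finetune_speaker.json`. Here is a rough sketch of how such a checkpoint is typically restored with the upstream VITS code. The module layout (`utils`, `models.SynthesizerTrn`, `text.symbols`) is assumed from the original VITS repository; VITS-fast-fine-tuning may differ in details such as the speaker map:

```
import utils                       # helper module from the VITS repository
from models import SynthesizerTrn  # VITS generator
from text.symbols import symbols   # text symbol inventory

# Read the fine-tuned hyperparameters and speaker settings.
hps = utils.get_hparams_from_file("finetune_speaker.json")

# Rebuild the generator with the same architecture used during fine-tuning.
net_g = SynthesizerTrn(
    len(symbols),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=hps.data.n_speakers,
    **hps.model)
net_g.eval()

# Restore the generator weights; no optimizer state is needed for inference.
utils.load_checkpoint("G_latest.pth", net_g, None)
```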