zixian committed on
Commit
93f15f6
1 Parent(s): 28fe43a

Update README.md

Files changed (1)
  1. README.md +8 -56
README.md CHANGED
@@ -1,56 +1,8 @@
- [For the Chinese documentation, click here](https://github.com/Plachtaa/VITS-fast-fine-tuning/blob/main/README_ZH.md)
- # VITS Fast Fine-tuning
- This repo guides you through adding your own character voices, or even your own voice, to an existing VITS TTS model
- so that it can perform the following tasks in less than 1 hour:
-
- 1. Many-to-many voice conversion between any characters you added & preset characters in the model.
- 2. English, Japanese & Chinese Text-to-Speech synthesis with the characters you added & preset characters.
-
-
- Feel free to play around with the base models!
- Chinese & English & Japanese: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer) Author: Me
-
- Chinese & Japanese: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai) Author: [SayaSS](https://github.com/SayaSS)
-
-
- ### Currently Supported Tasks:
- - [x] Clone character voices from 10+ short audio clips
- - [x] Clone character voices from long audio(s) >= 3 minutes (each audio should contain a single speaker only)
- - [x] Clone character voices from video(s) >= 3 minutes (each video should contain a single speaker only)
- - [x] Clone character voices from BILIBILI video links (each video should contain a single speaker only)
-
- ### Currently Supported Characters for TTS & VC:
- - [x] Any character you wish, as long as you have their voices!
- (Note that voice conversion can only be conducted between any two speakers in the model)
-
-
-
- ## Fine-tuning
- It's recommended to perform fine-tuning on [Google Colab](https://colab.research.google.com/drive/1pn1xnFfdLK63gVXDwV4zCXfVeo8c-I-0?usp=sharing)
- because the original VITS has some dependencies that are difficult to configure.
-
- ### How long does it take?
- 1. Install dependencies (3 min)
- 2. Choose a pretrained model to start from. The detailed differences between them are described in the [Colab Notebook](https://colab.research.google.com/drive/1pn1xnFfdLK63gVXDwV4zCXfVeo8c-I-0?usp=sharing)
- 3. Upload the voice samples of the characters you wish to add; see [DATA.MD](https://github.com/Plachtaa/VITS-fast-fine-tuning/blob/main/DATA_EN.MD) for detailed uploading options.
- 4. Start fine-tuning. This takes roughly 20 minutes to 2 hours, depending on the number of voices you uploaded.
-
-
- ## Inference or Usage (Currently supports Windows only)
- 0. Remember to download your fine-tuned model!
- 1. Download the latest release.
- 2. Put your model & config file, named `G_latest.pth` and `finetune_speaker.json` respectively, into the `inference` folder.
- 3. The file structure should be as follows:
- ```
- inference
- ├───inference.exe
- ├───...
- ├───finetune_speaker.json
- └───G_latest.pth
- ```
- 4. Run `inference.exe`; the browser should open automatically.
-
- ## Use in MoeGoe
- 0. Prepare the downloaded model & config file, named `G_latest.pth` and `moegoe_config.json`, respectively.
- 1. Follow the [MoeGoe](https://github.com/CjangCjengh/MoeGoe) page instructions to install it, configure paths, and use it.
-
 
+ title: zhenhuan
+ emoji: 🚀
+ colorFrom: green
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 3.7
+ app_file: app.py
+ pinned: false
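
The added block above is the Hugging Face Spaces configuration header, which Spaces reads from the YAML front matter (delimited by `---` lines) at the top of README.md; `sdk: gradio` and `app_file: app.py` tell the Space to launch a Gradio 3.7 app from `app.py`. As a rough illustration only, a minimal entry point compatible with this config might look like the sketch below; the Space's actual `app.py` is not shown in this diff, so the `tts` function and its interface are hypothetical placeholders.

```python
# app.py — hypothetical minimal sketch of the entry point named by
# `app_file: app.py` above; the Space's real app is not shown in this diff.
import gradio as gr

def tts(text: str) -> str:
    # Placeholder: a real Space would run VITS inference here and
    # return a path to a synthesized audio file instead of echoing text.
    return text

# Gradio 3.x: build a simple text-in/text-out interface.
demo = gr.Interface(fn=tts, inputs="text", outputs="text", title="zhenhuan")

if __name__ == "__main__":
    demo.launch()  # Spaces runs this automatically when the Space starts
```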