kevinwang676 committed on
Commit 38d07ec • 1 parent: fb894d9

Update README.md

Files changed (1)
  1. README.md +13 -90
README.md CHANGED
@@ -1,90 +1,13 @@
- # MB-iSTFT-VITS2
-
- ![Alt text](resources/image6.png)
-
- A... [vits2_pytorch](https://github.com/p0p4k/vits2_pytorch) and [MB-iSTFT-VITS](https://github.com/MasayaKawamura/MB-iSTFT-VITS) hybrid... Gods, an abomination! Who created this atrocity?
-
- This is an experimental build; performance is not guaranteed.
-
- According to [shigabeev](https://github.com/shigabeev)'s [experiment](https://github.com/FENRlR/MB-iSTFT-VITS2/issues/2), it can now dare claim the word SOTA for its performance (at least for Russian).
-
-
- ## Pre-requisites
- 1. Python >= 3.8
- 2. CUDA
- 3. [PyTorch](https://pytorch.org/get-started/previous-versions/#v1131) version 1.13.1 (+cu117)
- 4. Clone this repository.
- 5. Install Python requirements:
- ```
- pip install -r requirements.txt
- ```
-
- ~~You may need to install espeak first: `apt-get install espeak`~~
-
- If you want to proceed with the cleaned texts in [filelists](filelists), you need to install espeak:
- ```
- apt-get install espeak
- ```
- 6. Prepare datasets & configuration:
-
- ~~ex) Download and extract the LJ Speech dataset, then rename or create a link to the dataset folder: `ln -s /path/to/LJSpeech-1.1/wavs DUMMY1`~~
- 1. Prepare wav files (22050 Hz, mono, PCM-16).
- 2. Prepare text files, one for training<sup>[(ex)](filelists/ljs_audio_text_train_filelist.txt)</sup> and one for validation<sup>[(ex)](filelists/ljs_audio_text_val_filelist.txt)</sup>.
-
- - Single speaker<sup>[(ex)](filelists/ljs_audio_text_test_filelist.txt)</sup>
-
- ```
- wavfile_path|transcript
- ```
-
- - Multi speaker<sup>[(ex)](filelists/vctk_audio_sid_text_test_filelist.txt)</sup>
-
- ```
- wavfile_path|speaker_id|transcript
- ```
- 3. Run preprocessing with a [cleaner](text/cleaners.py) of your choice. You may change the [symbols](text/symbols.py) as well.
- - Single speaker
- ```
- python preprocess.py --text_index 1 --filelists PATH_TO_train.txt --text_cleaners CLEANER_NAME
- python preprocess.py --text_index 1 --filelists PATH_TO_val.txt --text_cleaners CLEANER_NAME
- ```
-
- - Multi speaker
- ```
- python preprocess.py --text_index 2 --filelists PATH_TO_train.txt --text_cleaners CLEANER_NAME
- python preprocess.py --text_index 2 --filelists PATH_TO_val.txt --text_cleaners CLEANER_NAME
- ```
- The resulting cleaned text will look like [this (single)](filelists/ljs_audio_text_test_filelist.txt.cleaned). <sup>[ex - multi](filelists/vctk_audio_sid_text_test_filelist.txt.cleaned)</sup>
-
- 7. Build Monotonic Alignment Search.
- ```sh
- # Cython-version Monotonic Alignment Search
- cd monotonic_align
- mkdir monotonic_align
- python setup.py build_ext --inplace
- ```
- 8. Edit the [configurations](configs) based on the files and cleaners you used.
-
- ## Setting the json file in [configs](configs)
- | Model | How to set up the json file in [configs](configs) | Sample json configuration |
- | :---: | :---: | :---: |
- | iSTFT-VITS2 | ```"istft_vits": true,```<br>```"upsample_rates": [8,8],``` | istft_vits2_base.json |
- | MB-iSTFT-VITS2 | ```"subbands": 4,```<br>```"mb_istft_vits": true,```<br>```"upsample_rates": [4,4],``` | mb_istft_vits2_base.json |
- | MS-iSTFT-VITS2 | ```"subbands": 4,```<br>```"ms_istft_vits": true,```<br>```"upsample_rates": [4,4],``` | ms_istft_vits2_base.json |
- | Mini-iSTFT-VITS2 | ```"istft_vits": true,```<br>```"upsample_rates": [8,8],```<br>```"hidden_channels": 96,```<br>```"n_layers": 3,``` | mini_istft_vits2_base.json |
- | Mini-MB-iSTFT-VITS2 | ```"subbands": 4,```<br>```"mb_istft_vits": true,```<br>```"upsample_rates": [4,4],```<br>```"hidden_channels": 96,```<br>```"n_layers": 3,```<br>```"upsample_initial_channel": 256,``` | mini_mb_istft_vits2_base.json |
-
- ## Training Example
- ```sh
- # train_ms.py for multi speaker
- python train.py -c configs/mb_istft_vits2_base.json -m models/test
- ```
-
- ## Credits
- - [jaywalnut310/vits](https://github.com/jaywalnut310/vits)
- - [p0p4k/vits2_pytorch](https://github.com/p0p4k/vits2_pytorch)
- - [MasayaKawamura/MB-iSTFT-VITS](https://github.com/MasayaKawamura/MB-iSTFT-VITS)
- - [ORI-Muchim/PolyLangVITS](https://github.com/ORI-Muchim/PolyLangVITS)
- - [tonnetonne814/MB-iSTFT-VITS-44100-Ja](https://github.com/tonnetonne814/MB-iSTFT-VITS-44100-Ja)
- - [misakiudon/MB-iSTFT-VITS-multilingual](https://github.com/misakiudon/MB-iSTFT-VITS-multilingual)
 
+ ---
+ title: VITS2 Chinese
+ emoji: 🌊
+ colorFrom: blue
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 3.36.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
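
The new README above is the Spaces metadata block: plain `key: value` pairs between two `---` fences at the top of the file. As a minimal sketch of how such a header can be read programmatically, the hypothetical helper below extracts those pairs using only the standard library; it handles simple one-line entries only, not full YAML, which is sufficient for the frontmatter shown in this commit.

```python
def parse_frontmatter(text: str) -> dict:
    """Extract simple 'key: value' pairs from a ----fenced README header.

    Returns {} when the text does not start with a frontmatter fence.
    Not a full YAML parser; illustration only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence ends the block
            break
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            meta[key.strip()] = value.strip()
    return meta


readme = """---
title: VITS2 Chinese
sdk: gradio
sdk_version: 3.36.1
app_file: app.py
---

Check out the configuration reference.
"""

print(parse_frontmatter(readme)["sdk"])  # gradio
```

The values come back as strings (e.g. `sdk_version` is `"3.36.1"`, not a number); a real consumer such as the Spaces runner would validate them against the documented configuration reference.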