# Fish Diffusion

------

An easy-to-understand TTS / SVS / SVC training framework.

> Check our [Wiki](https://fishaudio.github.io/fish-diffusion/) to get started!

[Chinese documentation (中文文档)](README.md)

## Summary

Fish Diffusion uses a diffusion model to solve different voice generation tasks. Compared with the original diffsvc repository, this repository has the following advantages:

+ Supports multiple speakers
+ The code structure is simpler and easier to understand, and all modules are decoupled
+ Supports the [44.1 kHz DiffSinger community vocoder](https://openvpi.github.io/vocoders/)
+ Supports multi-machine, multi-device training and half-precision training, which speeds up training and reduces memory use

## Preparing the environment

Run the following commands in a conda environment with Python 3.10.

```bash
# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# Install Poetry dependency management tool, skip if installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the project dependencies
poetry install
```

## Vocoder preparation

Fish Diffusion requires the [OpenVPI 44.1 kHz NSF-HiFiGAN](https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1) vocoder to generate audio.

### Automatic download

```bash
python tools/download_nsf_hifigan.py
```

If you download the model with the script, you can pass the `--agree-license` parameter to agree to the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.
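
For scripted setups, the download step can be wrapped so that it only runs when the vocoder is missing. The sketch below is a hypothetical helper, not part of the repository. It assumes the vocoder ends up in `checkpoints/nsf_hifigan` (the location used in the manual download steps), and it only uses the script path and flag named above.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the vocoder lives in checkpoints/nsf_hifigan, the location
# named in the manual download steps. Adjust if your layout differs.
VOCODER_DIR = Path("checkpoints") / "nsf_hifigan"


def build_download_command(agree_license: bool = False) -> list[str]:
    """Build the argument list for tools/download_nsf_hifigan.py."""
    command = [sys.executable, "tools/download_nsf_hifigan.py"]
    if agree_license:
        # Pass this only after reading and accepting CC BY-NC-SA 4.0.
        command.append("--agree-license")
    return command


def ensure_vocoder(vocoder_dir: Path = VOCODER_DIR, agree_license: bool = False) -> bool:
    """Run the download if the vocoder folder is missing.

    Returns True if the download script was run, False if the folder
    already existed and nothing was done.
    """
    if vocoder_dir.is_dir():
        return False
    subprocess.run(build_download_command(agree_license), check=True)
    return True
```

Calling `ensure_vocoder(agree_license=True)` from the repository root would skip the download when the folder is already present. Run directly, the command with the license flag is:
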
```bash
python tools/download_nsf_hifigan.py --agree-license
```

### Manual download

Download and unzip `nsf_hifigan_20221211.zip` from the [44.1 kHz vocoder](https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1) release page.

Copy the `nsf_hifigan` folder to the `checkpoints` directory (create it if it does not exist).

## Dataset preparation

You only need to put the dataset into the `dataset` directory using the following file structure:

```shell
dataset
├───train
│   ├───xxx1-xxx1.wav
│   ├───...
│   ├───Lxx-0xx8.wav
│   └───speaker0 (Subdirectory is also supported)
│       └───xxx1-xxx1.wav
└───valid
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
```

```bash
# Extract all data features, such as pitch, text features, mel features, etc.
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean
```

## Baseline training

> The project is under active development; please back up your config file.

```bash
# Single-machine training on one or multiple GPUs
python train.py --config configs/svc_hubert_soft.py

# Resume training
python train.py --config configs/svc_hubert_soft.py --resume [checkpoint]

# Fine-tune the pre-trained model
# Note: You should adjust the learning rate scheduler in the config file to warmup_cosine_finetune
python train.py --config configs/svc_hubert_soft.py --pretrained [checkpoint]
```

## Inference

```bash
# Inference from the shell; use --help to view more parameters
python inference.py --config [config] \
    --checkpoint [checkpoint] \
    --input [input audio] \
    --output [output audio]

# Gradio web inference; other parameters are used as Gradio default values
python inference/gradio_inference.py --config [config] \
    --checkpoint [checkpoint] \
    --gradio
```

## Convert a DiffSVC model to Fish Diffusion

```bash
python tools/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
    --input-path [DiffSVC ckpt] \
    --output-path [Fish Diffusion ckpt]
```

## Contributing

If you have any questions, please submit an issue or pull request.

You should run `tools/lint.sh` before submitting a pull request.

A live-reloading documentation preview can be started with:

```bash
sphinx-autobuild docs docs/_build/html
```

## Credits

+ [diff-svc original](https://github.com/prophesier/diff-svc)
+ [diff-svc optimized](https://github.com/innnky/diff-svc/)
+ [DiffSinger](https://github.com/openvpi/DiffSinger/)
+ [SpeechSplit](https://github.com/auspicious3000/SpeechSplit)

## Thanks to all contributors for their efforts