<div align="center">
<img alt="LOGO" src="https://cdn.jsdelivr.net/gh/fishaudio/fish-diffusion@main/images/logo_512x512.png" width="256" height="256" />

# Fish Diffusion

<div>
<a href="https://github.com/fishaudio/fish-diffusion/actions/workflows/ci.yml">
<img alt="Build Status" src="https://img.shields.io/github/actions/workflow/status/fishaudio/fish-diffusion/ci.yml?style=flat-square&logo=GitHub">
</a>
<a href="https://hub.docker.com/r/lengyue233/fish-diffusion">
<img alt="Docker Hub" src="https://img.shields.io/docker/cloud/build/lengyue233/fish-diffusion?style=flat-square&logo=Docker&logoColor=white">
</a>
<a href="https://discord.gg/wbYSRBrW2E">
<img alt="Discord" src="https://img.shields.io/discord/1044927142900809739?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square">
</a>
</div>
</div>

------
An easy-to-understand TTS / SVS / SVC training framework.
> Check our [Wiki](https://fishaudio.github.io/fish-diffusion/) to get started!

[Chinese documentation](README.md)
## Summary
Fish Diffusion uses diffusion models to solve a range of voice generation tasks. Compared with the original diff-svc repository, this repository offers the following advantages:
+ Supports multiple speakers
+ The code structure is simpler and easier to understand, and all modules are decoupled
+ Supports the [44.1 kHz DiffSinger community vocoder](https://openvpi.github.io/vocoders/)
+ Supports multi-machine, multi-device training and half-precision training, which speeds up training and reduces memory usage
## Preparing the environment
Run the following commands in a conda environment with Python 3.10:
```bash
# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
# Install Poetry dependency management tool, skip if installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -
# Install the project dependencies
poetry install
```
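
As a quick sanity check after installation, a minimal sketch like the one below can confirm that the interpreter matches the Python 3.10 requirement and that PyTorch can be located. The function name `check_environment` is hypothetical and not part of Fish Diffusion.

```python
import importlib.util
import sys


def check_environment(version_info=sys.version_info):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper for illustration, not part of Fish Diffusion.
    """
    problems = []
    # This README asks for Python 3.10.
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 expected, found {version_info[0]}.{version_info[1]}"
        )
    # find_spec only locates the package; it does not import it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch (torch) is not installed in this environment")
    return problems
```

Calling `check_environment()` inside the activated conda environment and printing the returned list is enough to spot a mismatched interpreter before training starts.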
## Vocoder preparation
Fish Diffusion requires the [OpenVPI 44.1 kHz NSF-HiFiGAN](https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1) vocoder to generate audio.
### Automatic download
```bash
python tools/download_nsf_hifigan.py
```
If you are using the script to download the model, you can use the `--agree-license` parameter to agree to the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.
```bash
python tools/download_nsf_hifigan.py --agree-license
```
### Manual download
Download and unzip `nsf_hifigan_20221211.zip` from the [44.1 kHz vocoder](https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1) release.

Copy the `nsf_hifigan` folder to the `checkpoints` directory (create it if it does not exist).
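
The manual steps above can also be scripted. The sketch below is a hypothetical helper (`install_vocoder` is not part of Fish Diffusion) and assumes the archive contains a folder named `nsf_hifigan` somewhere inside it.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_vocoder(zip_path, checkpoints_dir="checkpoints"):
    """Unpack the archive and copy its nsf_hifigan folder into checkpoints_dir.

    Hypothetical helper for illustration, not part of Fish Diffusion.
    """
    target = Path(checkpoints_dir) / "nsf_hifigan"
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Assumption: the archive holds a folder named nsf_hifigan at some depth.
        candidates = [p for p in Path(tmp).rglob("nsf_hifigan") if p.is_dir()]
        if not candidates:
            raise FileNotFoundError(f"no nsf_hifigan folder found in {zip_path}")
        # Create checkpoints/ if it does not exist, then copy the folder into it.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(candidates[0], target, dirs_exist_ok=True)
    return target
```

For example, `install_vocoder("nsf_hifigan_20221211.zip")` run from the repository root would place the files under `checkpoints/nsf_hifigan`.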
## Dataset preparation
You only need to put the dataset into the `dataset` directory, using the following file structure:
```shell
dataset
├───train
│   ├───xxx1-xxx1.wav
│   ├───...
│   ├───Lxx-0xx8.wav
│   └───speaker0 (Subdirectory is also supported)
│       └───xxx1-xxx1.wav
└───valid
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
```
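
Before extracting features, it can help to confirm that the layout above is in place. The following is a minimal sketch under that assumption; `count_wavs` is a hypothetical helper, not a Fish Diffusion tool.

```python
from pathlib import Path


def count_wavs(dataset_dir="dataset"):
    """Count .wav files under train/ and valid/, including subdirectories.

    Hypothetical helper for illustration, not part of Fish Diffusion.
    """
    counts = {}
    for split in ("train", "valid"):
        split_dir = Path(dataset_dir) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        # rglob descends into subdirectories such as train/speaker0.
        counts[split] = sum(1 for _ in split_dir.rglob("*.wav"))
    return counts
```

A result with zero files in either split usually means the audio was placed in the wrong folder or uses a different file extension.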
```bash
# Extract all data features, such as pitch, text features, mel features, etc.
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean
```
## Baseline training
> The project is under active development, so please back up your config file.
```bash
# Single machine, single GPU or multi-GPU training
python train.py --config configs/svc_hubert_soft.py
# Resume training
python train.py --config configs/svc_hubert_soft.py --resume [checkpoint]
# Fine-tune the pre-trained model
# Note: set the learning rate scheduler in the config file to warmup_cosine_finetune
python train.py --config configs/svc_hubert_soft.py --pretrained [checkpoint]
```
## Inference
```bash
# Inference using shell, you can use --help to view more parameters
python inference.py --config [config] \
    --checkpoint [checkpoint] \
    --input [input audio] \
    --output [output audio]
# Gradio Web Inference, other parameters will be used as gradio default parameters
python inference/gradio_inference.py --config [config] \
    --checkpoint [checkpoint] \
    --gradio
```
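
To convert a whole folder of recordings, the shell command above can be wrapped in a small driver. This is a sketch only: `build_commands` and `run_all` are hypothetical helpers, and they use only the `--config`, `--checkpoint`, `--input`, and `--output` flags shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(config, checkpoint, input_dir, output_dir):
    """Build one inference.py command per .wav file found directly in input_dir.

    Hypothetical helper for illustration, not part of Fish Diffusion.
    """
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        commands.append([
            sys.executable, "inference.py",
            "--config", str(config),
            "--checkpoint", str(checkpoint),
            "--input", str(wav),
            # Each output keeps the file name of its input.
            "--output", str(Path(output_dir) / wav.name),
        ])
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them makes it easy to print and review the commands first, then pass them to `run_all` from the repository root.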
## Convert a DiffSVC model to Fish Diffusion
```bash
python tools/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
    --input-path [DiffSVC ckpt] \
    --output-path [Fish Diffusion ckpt]
```
## Contributing
If you have any questions, please submit an issue or pull request.

Run `tools/lint.sh` before submitting a pull request.

A live preview of the documentation can be generated with:
```bash
sphinx-autobuild docs docs/_build/html
```
## Credits
+ [diff-svc original](https://github.com/prophesier/diff-svc)
+ [diff-svc optimized](https://github.com/innnky/diff-svc/)
+ [DiffSinger](https://github.com/openvpi/DiffSinger/)
+ [SpeechSplit](https://github.com/auspicious3000/SpeechSplit)
## Thanks to all contributors for their efforts
<a href="https://github.com/fishaudio/fish-diffusion/graphs/contributors" target="_blank">
<img src="https://contrib.rocks/image?repo=fishaudio/fish-diffusion" />
</a>