
Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching


Yuepeng Jiang, Huakang Chen, Ziqian Ning, Jixun Yao, Zerui Han, Di Wu, Meng Meng, Jian Luan, Zhonghua Fu, Lei Xie†

DiffRhythm 2 (Chinese: 谛韵, Dì Yùn) is a next-generation open-source music generation framework that advances the original DiffRhythm with a semi-autoregressive diffusion architecture. It can generate full-length songs with precise lyric alignment and coherent musical structure. The name inherits the essence of DiffRhythm: “Diff” reflects its diffusion-based generative backbone, while “Rhythm” emphasizes its dedication to musicality and temporal flow. The Chinese name 谛韵 (Dì Yùn) continues this dual symbolism: “谛” (attentive listening) represents perceptual awareness, and “韵” (melodic charm) captures the expressive beauty of music.

📢 News and Updates

📋 TODOs

  • Support Colab.
  • Gradio support.
  • Song extension.
  • Instrumental music generation.
  • Release code and weights.
  • Release paper to Arxiv.

🔨 Inference

Follow the steps below to clone the repository and install the environment.

# clone and enter the repository
git clone https://github.com/ASLP-lab/DiffRhythm2.git
cd DiffRhythm2

# install the environment
## espeak-ng
# For Debian-like distributions (e.g. Ubuntu, Mint, etc.)
sudo apt-get install espeak-ng
# For RedHat-like distributions (e.g. CentOS, Fedora, etc.)
sudo yum install espeak-ng
# For macOS
brew install espeak-ng
# For Windows
# Please visit https://github.com/espeak-ng/espeak-ng/releases to download the .msi installer

## install requirements
pip install -r requirements.txt
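
Before running inference, it can help to confirm that the tools installed above are actually on your PATH. The snippet below is a minimal sketch, not part of the repository: `check_cmd` is a hypothetical helper name introduced here for illustration, and it only checks that a command exists, not that its version is suitable.

```shell
# Hypothetical helper (not part of DiffRhythm2): report whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# git is needed for cloning; espeak-ng is the system dependency installed above.
check_cmd git || echo "Install git before cloning the repository."
check_cmd espeak-ng || echo "Install espeak-ng with your package manager before running inference."
```

If either line reports `missing`, repeat the matching installation step for your platform before continuing.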

On Linux you can now simply use the inference script:

# For inference using a reference WAV file
bash inference.sh

Weights will be automatically downloaded from Hugging Face upon the first run.
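
If you want to control where the downloaded weights are stored, one option is to set the Hugging Face cache location before the first run. This is a sketch under an assumption: it presumes the download goes through the `huggingface_hub` library, which reads the `HF_HOME` environment variable. The README does not state which download mechanism the script uses, so verify this against the repository before relying on it.

```shell
# Assumption: the weights are fetched via huggingface_hub, which honors HF_HOME.
# Keep an existing HF_HOME if one is set; otherwise use the library's default location.
export HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"
mkdir -p "$HF_HOME"
echo "Hugging Face cache: $HF_HOME"

# Then run inference as usual:
# bash inference.sh
```

Pointing `HF_HOME` at a disk with enough free space avoids a failed download partway through the first run.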

Example lyrics files and reference audio can be found in the example directory.

📜 License & Disclaimer

DiffRhythm 2 (code and weights) is released under the Apache License 2.0. This open-source license allows you to freely use, modify, and distribute the model, as long as you include the appropriate copyright notice and disclaimer.

We do not make any profit from this model. Our goal is to provide a high-quality base model for music generation, fostering innovation in AI music and contributing to the advancement of human creativity. We hope that DiffRhythm 2 will serve as a foundation for further research and development in the field of AI-generated music.

DiffRhythm 2 enables the creation of original music across diverse genres, supporting applications in artistic creation, education, and entertainment. While designed for positive use cases, potential risks include unintentional copyright infringement through stylistic similarities, inappropriate blending of cultural musical elements, and misuse for generating harmful content. To ensure responsible deployment, users must implement verification mechanisms to confirm musical originality, disclose AI involvement in generated works, and obtain permissions when adapting protected styles.

Citation

@article{diffrhythm2,
  title={DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching},
  author={Jiang, Yuepeng and Chen, Huakang and Ning, Ziqian and Yao, Jixun and Han, Zerui and Wu, Di and Meng, Meng and Luan, Jian and Fu, Zhonghua and Xie, Lei},
  journal={arXiv preprint arXiv:2510.22950},
  year={2025}
}