Diffusers documentation

Dance Diffusion

You are viewing v0.9.0 version. A newer version v0.27.2 is available.

Dance Diffusion by Zach Evans.

Dance Diffusion is the first in a suite of generative audio tools for producers and musicians to be released by Harmonai. For more info or to get involved in the development of these tools, please visit https://harmonai.org and fill out the form on the front page.

The original codebase of this implementation can be found here.

Available Pipelines:

Pipeline | Tasks | Colab
pipeline_dance_diffusion.py | Unconditional Audio Generation | -


class diffusers.DanceDiffusionPipeline

( unet, scheduler )


  • unet (UNet1DModel) — U-Net architecture to denoise the encoded audio.
  • scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded audio. Can be IPNDMScheduler.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)


__call__

( batch_size: int = 1, num_inference_steps: int = 100, generator: typing.Optional[torch._C.Generator] = None, audio_length_in_s: typing.Optional[float] = None, return_dict: bool = True ) → AudioPipelineOutput or tuple


  • batch_size (int, optional, defaults to 1) — The number of audio samples to generate.
  • num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to higher-quality audio at the expense of slower inference.
  • generator (torch.Generator, optional) — A torch generator to make generation deterministic.
  • audio_length_in_s (float, optional, defaults to self.unet.config.sample_size/self.unet.config.sample_rate) — The length of the generated audio sample in seconds. Note that the length of the pipeline output in samples, i.e. sample_size, will be audio_length_in_s * self.unet.config.sample_rate.
  • return_dict (bool, optional, defaults to True) — Whether or not to return an AudioPipelineOutput instead of a plain tuple.
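The relationship between audio_length_in_s, sample_size and sample_rate described above can be sketched as plain arithmetic. This is a minimal illustration, not the pipeline's own code: the helper names are hypothetical, the values 65536 and 16000 are assumed example config values rather than ones read from a real checkpoint, and rounding to an integer is a choice made here for the sketch.

```python
def default_audio_length_in_s(sample_size: int, sample_rate: int) -> float:
    """Default duration in seconds: sample_size / sample_rate."""
    return sample_size / sample_rate


def samples_for_duration(audio_length_in_s: float, sample_rate: int) -> int:
    """Output length in samples: audio_length_in_s * sample_rate, rounded."""
    return round(audio_length_in_s * sample_rate)


# Illustrative config values (assumptions, not taken from a real model).
SAMPLE_SIZE = 65536
SAMPLE_RATE = 16000

# Duration used when audio_length_in_s is left as None.
default_length = default_audio_length_in_s(SAMPLE_SIZE, SAMPLE_RATE)

# Number of samples a 2-second request corresponds to.
two_second_samples = samples_for_duration(2.0, SAMPLE_RATE)
```

With these assumed values, the default duration is a little over four seconds, and a shorter audio_length_in_s yields a proportionally shorter output.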


Returns: AudioPipelineOutput or tuple

An AudioPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element holds the generated audio samples.
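A short sketch of calling the pipeline and handling both return forms may help here. The helper names extract_audios and run_dance_diffusion are hypothetical, the sketch assumes that AudioPipelineOutput exposes the generated audio under an audios attribute, and the checkpoint identifier is left as a parameter because this page does not name one. The torch and diffusers imports are placed inside the function so that the return-handling helper can be tried without either library installed.

```python
from types import SimpleNamespace


def extract_audios(output):
    """Return the generated audio from either return form of the pipeline."""
    if isinstance(output, tuple):
        # return_dict=False: the first tuple element holds the audio.
        return output[0]
    # return_dict=True: assumed to expose the audio as `.audios`.
    return output.audios


def run_dance_diffusion(model_id, seed=0, audio_length_in_s=None, return_dict=True):
    """Load a Dance Diffusion checkpoint and generate one audio sample."""
    # Lazy imports: only needed when the pipeline is actually run.
    import torch
    from diffusers import DanceDiffusionPipeline

    pipe = DanceDiffusionPipeline.from_pretrained(model_id)
    # A seeded generator makes the generation deterministic.
    generator = torch.Generator().manual_seed(seed)
    output = pipe(
        batch_size=1,
        num_inference_steps=100,
        generator=generator,
        audio_length_in_s=audio_length_in_s,
        return_dict=return_dict,
    )
    return extract_audios(output)


# Stand-ins for the two return forms, so the helper can be checked offline.
fake_dict_output = SimpleNamespace(audios=["clip"])
fake_tuple_output = (["clip"],)
```

Calling run_dance_diffusion with the identifier of a Dance Diffusion checkpoint downloads the weights and runs the full denoising loop, so it is defined here but not invoked.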