Musika Audio Autoencoder

Pretrained universal autoencoder for Musika, a system for fast infinite waveform music generation. The model was introduced in the Musika paper.

Model description

The Musika autoencoder consists of two hierarchical stages that are trained separately. It is trained to encode and reconstruct general 44.1 kHz waveform music, and it achieves an overall time compression ratio of 4096x. For example, roughly 23.8 seconds of 44.1 kHz audio (1,048,576 samples) is encoded into a sequence of 256 vectors, each of dimension 64.
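The arithmetic behind the compression ratio can be checked with a short sketch. It does not use the Musika code; the helper names `duration_for` and `latent_shape` are hypothetical, and the floor division is an assumption about how a clip that is not a multiple of 4096 samples would be handled.

```python
SAMPLE_RATE = 44_100   # Hz, from the model description
COMPRESSION = 4096     # overall time compression ratio
LATENT_DIM = 64        # dimension of each latent vector


def duration_for(n_vectors):
    """Seconds of audio covered by a sequence of n_vectors latent vectors."""
    return n_vectors * COMPRESSION / SAMPLE_RATE


def latent_shape(duration_s):
    """Approximate (sequence length, vector dimension) for a clip.

    Assumption: trailing samples that do not fill a whole latent step are
    dropped (floor division). The real system may pad instead.
    """
    n_samples = round(duration_s * SAMPLE_RATE)
    return n_samples // COMPRESSION, LATENT_DIM


if __name__ == "__main__":
    seconds = duration_for(256)
    print(f"256 latent vectors cover about {seconds:.2f} s of audio")
    print("latent shape:", latent_shape(seconds))
```

Under these assumptions, 256 latent vectors correspond to 256 * 4096 = 1,048,576 samples, which is slightly under 24 seconds at 44.1 kHz.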

How to use

This autoencoder is downloaded automatically and used the first time the Musika system is run. Try Musika here!

Training data

The autoencoder was trained on both the SXSW dataset (a diverse music dataset) and the VCTK dataset (a speech dataset) so that it learns general-purpose audio representations.