The Musika autoencoder consists of two hierarchical stages that are trained separately. It is trained to encode and reconstruct general 44.1 kHz waveform music, and it achieves a final time compression ratio of 4096x. For example, roughly 23 seconds of 44.1 kHz audio are encoded into a sequence of 256 vectors of dimension 64.
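The arithmetic behind these figures can be checked with a short sketch. It is not part of the Musika codebase: the function names are hypothetical, and flooring partial frames is an assumption, since the actual system may pad or crop its input differently.

```python
# Illustrative arithmetic only; these helpers are hypothetical and not Musika APIs.
SAMPLE_RATE = 44_100   # Hz, as stated above
COMPRESSION = 4096     # time compression ratio, as stated above
LATENT_DIM = 64        # dimension of each latent vector, as stated above


def duration_for_vectors(n_vectors: int) -> float:
    """Seconds of audio spanned by a sequence of n_vectors latent vectors."""
    return n_vectors * COMPRESSION / SAMPLE_RATE


def latent_shape(duration_s: float) -> tuple[int, int]:
    """Latent sequence shape for a clip of duration_s seconds.

    Assumption: samples that do not fill a whole frame are dropped (floor).
    """
    n_samples = int(round(duration_s * SAMPLE_RATE))
    return n_samples // COMPRESSION, LATENT_DIM


if __name__ == "__main__":
    print(f"{duration_for_vectors(256):.2f} s")
    print(latent_shape(duration_for_vectors(256)))
```

A sequence of 256 vectors spans 256 × 4096 = 1,048,576 samples, which is a little under 24 seconds at 44.1 kHz and matches the rounded figure quoted above.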
This autoencoder is downloaded automatically and used the first time the system is run. Try Musika here!
The autoencoder was trained on both the SXSW dataset (diverse music) and the VCTK dataset (speech) so that it produces general representations.