---
title: Music Descriptor
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.29.0
app_file: app.py
pinned: true
license: cc-by-nc-4.0
---

# Demo Introduction

This is an example of using the MERT-v1-95M model as a backbone to perform multiple music understanding tasks on top of its universal representation.

The tasks include EMO, GS, MTGInstrument, MTGGenre, MTGTop50, MTGMood, NSynthI, NSynthP, VocalSetS, and VocalSetT. More models can be found on the map organization page.

## Known Issues

### Audio Format Support

Theoretically, any audio format supported by torchaudio.load() can be used in the demo. These include, but are not limited to, WAV, AMB, MP3, and FLAC.

### Audio Input Length

Due to the hardware limitations of the machine hosting this demo (2 CPUs and 16 GB of RAM), only the first 4 seconds of the audio are used!
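The truncation itself is simple: keep only the first `4 * sample_rate` samples of the waveform. The sketch below uses NumPy in place of a torch tensor; the 24 kHz sample rate is an assumption for illustration, not something this README specifies.

```python
import numpy as np

SAMPLE_RATE = 24000  # assumed sample rate; the demo's actual rate may differ
MAX_SECONDS = 4      # limit stated in this README

def truncate(waveform: np.ndarray, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Keep only the first MAX_SECONDS seconds of a (channels, samples) array."""
    return waveform[..., : MAX_SECONDS * sr]

# A 10-second mono clip is cut down to 4 seconds (96000 samples at 24 kHz).
ten_seconds = np.zeros((1, 10 * SAMPLE_RATE), dtype=np.float32)
clipped = truncate(ten_seconds)
```

Shorter inputs pass through unchanged, since slicing never pads.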

This issue is expected to be resolved in the future by applying for community-supported GPU resources or by adopting other audio encoding strategies.

For now, if you want to run the demo directly on longer audio, you can clone this Space and deploy it with a GPU. The code will automatically use the GPU for inference if one is detected via torch.cuda.is_available().
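The device selection described above can be sketched in a few lines; the `model`/`inputs` usage in the comment is hypothetical, since this README does not show the demo's actual code.

```python
import torch

# Use the GPU when torch.cuda.is_available() detects one, CPU otherwise,
# mirroring the behavior described above for a cloned Space.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical usage: move the model and inputs to the selected device, e.g.
#   model = model.to(device)
#   inputs = {k: v.to(device) for k, v in inputs.items()}
print(f"Running inference on: {device}")
```

On a CPU-only host this prints `Running inference on: cpu`; on a GPU deployment it picks `cuda` with no code changes.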