What is the best layer for each task based on your experience?

by elloza - opened Aug 9, 2023

Discussion

elloza

Aug 9, 2023

Which one is best for music genre classification?

Is there something for this model like what the Music-Descriptor space provides?

Thank you very much, and congratulations on the work!

yizhilll

Multimodal Art Projection org Sep 6, 2023

edited Sep 6, 2023

Very good question!

For the optimal layer of the 95M model on each task, refer to the Music-Descriptor space.

Normally, the optimal layer for the large model can be inferred from the base model. For example, if the best layer for the 95M model is a middle layer (5~7), the best layer for the 330M model might be 10~12.
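
One way to read this heuristic is as scaling by relative depth. The sketch below is only an illustration: `scale_layer_range` is a hypothetical helper, and the depths of 12 and 24 transformer layers for the 95M and 330M checkpoints are assumptions to check against the model configs. Plain proportional scaling gives a slightly wider window than the 10~12 estimate above, which sits at its lower end.

```python
def scale_layer_range(base_lo, base_hi, base_depth=12, large_depth=24):
    """Map a layer range from a shallower model onto a deeper one by relative depth.

    base_depth and large_depth are assumed layer counts, not values from this thread.
    """
    ratio = large_depth / base_depth
    return round(base_lo * ratio), round(base_hi * ratio)


# Middle layers 5~7 of the base model map to a candidate window in the large model.
print(scale_layer_range(5, 7))  # (10, 14)
```

The result is best treated as a starting window for a per-layer search rather than a final answer.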

yizhilll changed discussion status to closed Sep 6, 2023

a43992899

Multimodal Art Projection org Sep 15, 2023

Generally speaking, lower layers contain more low-level acoustic info, like singer identity, instrument timbre, and pitch.

Middle layers are better for mid- to high-level tasks, as they encode info like chords, genre, key, and emotion.

The layers close to the output may overfit to the pre-training objective and thus be sub-optimal for downstream tasks.

But I suggest you test it on your own task and see which layer works best.
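
A minimal sketch of such a per-layer test, assuming you have already extracted one time-averaged feature vector per clip from every layer (for example by running the model with `output_hidden_states=True` and averaging over time). All names here (`features_by_layer`, `probe_accuracy`, `best_layer`) are hypothetical, and the nearest-centroid classifier is a cheap stand-in for whatever probe you actually train, not the evaluation used by the authors.

```python
from statistics import fmean


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit one centroid per label on the training clips, score on the test clips."""
    by_label = {}
    for x, y in zip(train_x, train_y):
        by_label.setdefault(y, []).append(x)
    centroids = {y: centroid(xs) for y, xs in by_label.items()}
    correct = 0
    for x, y in zip(test_x, test_y):
        pred = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        correct += pred == y
    return correct / len(test_y)


def best_layer(features_by_layer, labels, train_idx, test_idx):
    """Return (index of the best layer, list of per-layer accuracies).

    features_by_layer[l][i] is the feature vector of clip i taken from layer l.
    """
    scores = []
    for feats in features_by_layer:
        scores.append(probe_accuracy(
            [feats[i] for i in train_idx], [labels[i] for i in train_idx],
            [feats[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

With real data, use a proper validation split (or cross-validation) and compare the full accuracy list, since neighbouring layers often score similarly.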
