What is the best layer for each task based on your experience?
Which one is best for music genre classification?
Is there something for this model similar to what is shown in the Music-Descriptor space?
Thank you very much and congratulations on the work!
Very good question!
For the optimal layers of the 95M model on each task, please refer to the Music-Descriptor space.
Normally, the optimal layer for the large model can also be inferred from the base model: if the best layer for the 95M model is a middle layer (5~7), the best layer for the 330M model might be around 10~12.
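As a rough illustration of that rule of thumb, here is a minimal sketch that rescales a layer range by relative depth. The function name `scale_layer_range` is hypothetical, and the default depths (12 transformer layers for the 95M model, 24 for the 330M model) are assumptions you should check against the model configs. Straight proportional scaling of 5~7 gives 10~14, a slightly wider window than the 10~12 mentioned above, so treat the result as a starting range for a search rather than an answer.

```python
def scale_layer_range(base_lo, base_hi, base_depth=12, large_depth=24):
    """Map a layer range from a shallower model onto a deeper one.

    base_depth and large_depth are ASSUMED layer counts; check them
    against the actual model configs before relying on the result.
    """
    ratio = large_depth / base_depth
    # Keep the same relative position in the stack, rounded to whole layers.
    return round(base_lo * ratio), round(base_hi * ratio)


# Candidate window in the deeper model for a base-model optimum at layers 5~7.
print(scale_layer_range(5, 7))  # (10, 14)
```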
Generally speaking, lower layers contain more low-level acoustic info, like singer identity, instrument timbre, and pitch.
Middle layers are better for mid- to high-level tasks, since they encode info like chords, genre, key, and emotion.
Layers close to the output may overfit to the pre-training objective and thus be sub-optimal for downstream tasks.
But I suggest you test this on your own task and see which layer works best.
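One way to run such a test is a layer-wise probe: fit the same small classifier on each layer's features and compare held-out accuracy. The sketch below is only an illustration, not the evaluation used for the model. It assumes you have already extracted one time-averaged feature vector per clip for every layer (for example, by requesting all hidden states from the model and averaging over time). The names `nearest_centroid_accuracy` and `rank_layers` are hypothetical, the probe is a deliberately simple nearest-centroid classifier, and the numbers in the usage part are made-up toy values, not real results.

```python
from statistics import fmean


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Held-out accuracy of a nearest-centroid probe on one layer's features."""
    # Group training vectors by label, then average each group into a centroid.
    by_label = {}
    for vec, label in zip(train_x, train_y):
        by_label.setdefault(label, []).append(vec)
    centroids = {
        label: [fmean(dim) for dim in zip(*vecs)]
        for label, vecs in by_label.items()
    }

    def predict(vec):
        # Choose the class whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )

    correct = sum(predict(vec) == label for vec, label in zip(test_x, test_y))
    return correct / len(test_y)


def rank_layers(train_feats, train_y, test_feats, test_y):
    """Return (layer_index, accuracy) pairs, best layer first.

    train_feats[k] and test_feats[k] hold layer k's features, one vector per clip.
    """
    scores = [
        (k, nearest_centroid_accuracy(train_feats[k], train_y, test_feats[k], test_y))
        for k in range(len(train_feats))
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy usage with placeholder 1-D features for two "layers" (NOT real model outputs).
train_y = ["rock", "jazz", "rock", "jazz"]
test_y = ["rock", "jazz"]
train_feats = [
    [[0.0], [0.0], [1.0], [1.0]],  # layer 0: classes overlap completely
    [[0.0], [1.0], [0.1], [0.9]],  # layer 1: classes are well separated
]
test_feats = [
    [[0.0], [1.0]],
    [[0.2], [0.8]],
]
ranking = rank_layers(train_feats, train_y, test_feats, test_y)
print(ranking)  # [(1, 1.0), (0, 0.5)]
```

For a real task you would probably swap the nearest-centroid probe for a linear or small MLP classifier and use a proper validation split, but the loop over layers stays the same.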