              Sound                               Music                               Speech
Model   Size  ESC-50   US8K     FSD50K   VIVAE    FMA      MTT      IRMAS    MS-DB    RAVDESS  A-MNIST  SLURP    EMOVO
W2V2    S     45.73    55.48    19.39    31.47    50.54    37.56    35.14    66.06    55.32    86.38    14.37    31.80
WavLM   S     49.88    61.84    17.63    36.31    48.71    34.93    32.62    54.18    67.94    99.50    30.98    43.08
WavLM+  S     58.73    64.07    21.57    36.17    56.17    38.24    35.76    57.51    52.20    99.63    28.06    36.73
HuBERT  S     58.90    67.28    24.53    40.48    54.63    38.78    36.65    58.46    65.28    99.58    33.75    40.48
D2V     S     23.63    45.63    10.06    30.19    40.58    27.60    25.87    50.74    48.03    99.06    43.57    27.27
W2V2-A  S     49.48    62.34    21.44    34.90    59.25    36.13    34.07    68.74    51.50    75.13    11.01    31.01
W2V2    M     13.13    42.70    5.80     22.01    41.71    20.95    19.91    50.23    11.57    45.74    7.33     19.27
XLS-R   M     51.28    69.96    23.71    36.28    56.96    38.28    38.42    66.71    31.48    98.88    12.74    20.35
WavLM   M     67.20    70.92    32.21    42.51    61.13    41.29    42.53    68.00    71.76    99.75    42.34    45.29
HuBERT  M     63.98    70.00    29.51    40.95    54.79    38.36    36.81    64.08    72.57    99.95    45.26    43.76
D2V     M     25.35    49.15    10.82    30.57    43.46    28.52    27.08    44.20    45.14    99.15    28.60    23.07
XLS-R   L     66.95    75.90    31.61    40.41    62.79    41.99    43.57    69.79    55.44    99.86    25.14    34.58
HuBERT  L     63.40    69.66    29.32    42.72    56.25    37.76    37.30    64.71    75.69    99.95    47.81    47.17