
distilhubert-finetuned-gtzan

This model is a fine-tuned version of ntu-spml/distilhubert on the GTZAN dataset. At its best epoch (epoch 18), it achieves the following results on the evaluation set:

  • Loss: 0.7305
  • Accuracy: 0.9

Model description

DistilHuBERT is a distilled version of HuBERT, pretrained on audio sampled at 16 kHz.
Its architecture is an encoder-only transformer, the kind of model with which Connectionist Temporal Classification (CTC) is used.
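
As a rough illustration of how the model might be used for inference, here is a minimal sketch built on the Transformers `audio-classification` pipeline. The helper names `clip_num_samples` and `classify_genre` are hypothetical, and the sketch assumes `transformers` and ffmpeg are installed and that the checkpoint can be downloaded from the Hub.

```python
MODEL_ID = "WasuratS/distilhubert-finetuned-gtzan"  # repo name from this model card page
SAMPLING_RATE_HZ = 16_000  # sample rate of the pretraining audio, per the description


def clip_num_samples(seconds: float, sampling_rate: int = SAMPLING_RATE_HZ) -> int:
    """Number of waveform samples in a clip of the given duration."""
    return int(round(seconds * sampling_rate))


def classify_genre(audio_path: str, top_k: int = 3):
    """Return the top_k predicted genres for an audio file (hypothetical helper)."""
    # Imported lazily so the arithmetic helper works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    # Given a file path, the pipeline decodes the audio (this needs ffmpeg)
    # and resamples it to the rate the model's feature extractor expects.
    return classifier(audio_path, top_k=top_k)


# A 30-second clip at 16 kHz is 480,000 samples long.
samples_per_clip = clip_num_samples(30.0)
```

Calling `classify_genre("song.wav")` on a local file (the file name is a placeholder) should return a list of dictionaries with `label` and `score` keys.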

Training and evaluation data

The training and evaluation data both come from GTZAN, a popular dataset of 999 songs for music genre classification.
Each song is a 30-second clip from one of 10 genres of music, spanning disco to metal.
The training set contains 899 songs and the evaluation set contains the remaining 100.
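
The 899/100 split is consistent with holding out 10% of the 999 songs and rounding the evaluation share up. The sketch below shows that arithmetic and one way such a split could be reproduced with the Datasets library. The Hub ID `marsyas/gtzan`, the `"all"` configuration, the 10% fraction, and the use of seed 42 for the split are assumptions; this card does not state how the split was made.

```python
import math


def split_sizes(n_samples: int, eval_fraction: float) -> tuple[int, int]:
    """Return (train, eval) sizes, rounding the evaluation share up."""
    n_eval = math.ceil(n_samples * eval_fraction)
    return n_samples - n_eval, n_eval


def load_gtzan_split(seed: int = 42, eval_fraction: float = 0.1):
    """Load GTZAN and split it into train/test parts (hypothetical helper)."""
    # Imported lazily so split_sizes works without the datasets package.
    from datasets import load_dataset

    # Assumed Hub ID and configuration; newer Datasets releases may need
    # extra arguments to load script-based datasets.
    gtzan = load_dataset("marsyas/gtzan", "all")
    return gtzan["train"].train_test_split(
        seed=seed, shuffle=True, test_size=eval_fraction
    )


train_size, eval_size = split_sizes(999, 0.1)  # 899 training songs, 100 evaluation songs
```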

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 35
  • mixed_precision_training: Native AMP
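
To make the schedule concrete, the following sketch works out the step counts and the learning rate implied by the values above. It assumes 899 training examples, 4 examples per optimizer step (which matches the 225 steps per epoch in the results table), and that the warmup length is the warmup ratio times the total steps, rounded up. The function `lr_at` is a hypothetical helper, not part of any library.

```python
import math

LEARNING_RATE = 5e-5
BATCH_SIZE = 4          # examples per optimizer step (assumed effective batch size)
NUM_EPOCHS = 35
WARMUP_RATIO = 0.1
TRAIN_EXAMPLES = 899

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # 225
total_steps = steps_per_epoch * NUM_EPOCHS                # 7875
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)      # 788 (rounding up is an assumption)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate climbs from 0 to 5e-05 over the first 788 steps (about 3.5 epochs) and then falls linearly to 0 at step 7875.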

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 2.1728 | 1.0 | 225 | 2.0896 | 0.42 |
| 1.4211 | 2.0 | 450 | 1.4951 | 0.55 |
| 1.2155 | 3.0 | 675 | 1.0669 | 0.72 |
| 1.0175 | 4.0 | 900 | 0.8862 | 0.69 |
| 0.3516 | 5.0 | 1125 | 0.6265 | 0.83 |
| 0.6135 | 6.0 | 1350 | 0.6485 | 0.78 |
| 0.0807 | 7.0 | 1575 | 0.6567 | 0.78 |
| 0.0303 | 8.0 | 1800 | 0.7615 | 0.83 |
| 0.2663 | 9.0 | 2025 | 0.6612 | 0.86 |
| 0.0026 | 10.0 | 2250 | 0.8354 | 0.85 |
| 0.0337 | 11.0 | 2475 | 0.6768 | 0.87 |
| 0.0013 | 12.0 | 2700 | 0.7718 | 0.87 |
| 0.001 | 13.0 | 2925 | 0.7570 | 0.88 |
| 0.0008 | 14.0 | 3150 | 0.8170 | 0.89 |
| 0.0006 | 15.0 | 3375 | 0.7920 | 0.89 |
| 0.0005 | 16.0 | 3600 | 0.9859 | 0.83 |
| 0.0004 | 17.0 | 3825 | 0.8190 | 0.9 |
| 0.0003 | 18.0 | 4050 | 0.7305 | 0.9 |
| 0.0003 | 19.0 | 4275 | 0.8025 | 0.88 |
| 0.0002 | 20.0 | 4500 | 0.8208 | 0.87 |
| 0.0003 | 21.0 | 4725 | 0.7358 | 0.88 |
| 0.0002 | 22.0 | 4950 | 0.8681 | 0.87 |
| 0.0002 | 23.0 | 5175 | 0.7831 | 0.9 |
| 0.0003 | 24.0 | 5400 | 0.8583 | 0.88 |
| 0.0002 | 25.0 | 5625 | 0.8138 | 0.88 |
| 0.0002 | 26.0 | 5850 | 0.7871 | 0.89 |
| 0.0002 | 27.0 | 6075 | 0.8893 | 0.88 |
| 0.0002 | 28.0 | 6300 | 0.8284 | 0.89 |
| 0.0001 | 29.0 | 6525 | 0.8388 | 0.89 |
| 0.0001 | 30.0 | 6750 | 0.8305 | 0.9 |
| 0.0001 | 31.0 | 6975 | 0.8377 | 0.88 |
| 0.0153 | 32.0 | 7200 | 0.8496 | 0.88 |
| 0.0001 | 33.0 | 7425 | 0.8381 | 0.88 |
| 0.0001 | 34.0 | 7650 | 0.8440 | 0.88 |
| 0.0001 | 35.0 | 7875 | 0.8458 | 0.88 |
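
Four epochs reach the top accuracy of 0.9, so the "best epoch" depends on a tie-break. The sketch below picks the row with the highest accuracy and, among ties, the lowest validation loss. This selection rule is an assumption (the card does not say how the best epoch was chosen), though it does land on the epoch whose loss and accuracy are reported at the top of this card. Only a subset of rows from the results is included to keep the example short.

```python
# (epoch, validation_loss, accuracy) rows copied from the training results.
results = [
    (5, 0.6265, 0.83),
    (14, 0.8170, 0.89),
    (17, 0.8190, 0.90),
    (18, 0.7305, 0.90),
    (23, 0.7831, 0.90),
    (30, 0.8305, 0.90),
    (35, 0.8458, 0.88),
]


def best_epoch(rows):
    """Pick the row with the highest accuracy, breaking ties by lowest loss."""
    return max(rows, key=lambda row: (row[2], -row[1]))


best = best_epoch(results)  # (18, 0.7305, 0.9)
```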

Framework versions

  • Transformers 4.29.2
  • PyTorch 1.13.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Dataset used to train WasuratS/distilhubert-finetuned-gtzan