ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1

This model is a fine-tuned version of MIT/ast-finetuned-audioset-10-10-0.4593 on a subset of the ashraq/esc50 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7391
  • Accuracy: 0.9286
  • Precision: 0.9449
  • Recall: 0.9286
  • F1: 0.9244
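
The reported Recall equals the reported Accuracy, which is what support-weighted averaging over classes always produces. The card does not state the averaging mode, so this is an inference. The sketch below shows how such weighted metrics could be computed in plain Python; the function name `weighted_metrics` and the toy labels are illustrative and not taken from the training code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (hypothetical helper)."""
    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # each class is weighted by its share of samples
        totals["precision"] += weight * p
        totals["recall"] += weight * r
        totals["f1"] += weight * f

    totals["accuracy"] = sum(correct.values()) / n
    return totals


# Toy labels for illustration only, not data from this model.
metrics = weighted_metrics(
    ["dog", "dog", "rain", "rain"],
    ["dog", "rain", "rain", "rain"],
)
print(metrics)
```

With weighted averaging, recall reduces to the fraction of correctly classified samples, so it matches accuracy by construction, while precision and F1 can differ from it.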

Training and evaluation data

Training and evaluation data were augmented with the audiomentations library (GitHub: iver56/audiomentations). The following augmentation methods were applied, chosen on the basis of previous experiments (Elliott et al.: Tiny transformers for audio classification at the edge):

Gain

  • each audio sample is amplified/attenuated by a random factor between 0.5 and 1.5 with a 0.3 probability

Noise

  • a random amount of Gaussian noise with a relative amplitude between 0.001 and 0.015 is added to each audio sample with a 0.5 probability

Speed adjust

  • the duration of each audio sample is scaled by a random factor between 0.5 and 1.5 with a 0.3 probability

Pitch shift

  • pitch of each audio sample is shifted by a random amount of semitones selected from the closed interval [-4,4] with a 0.3 probability

Time masking

  • a random fraction of the length of each audio sample, in the range (0, 0.02], is erased with a 0.3 probability
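
To make the probabilities and ranges above concrete, here is a minimal pure-Python sketch of three of the listed augmentations (gain, noise, time masking) applied to a waveform stored as a list of floats. It is not the audiomentations implementation and not the author's training code: the function names are hypothetical, and speed adjust and pitch shift are left out because they need resampling, which is beyond a short standard-library example.

```python
import random


def apply_gain(samples, rng, p=0.3, lo=0.5, hi=1.5):
    """With probability p, scale the whole clip by one factor drawn from [lo, hi]."""
    if rng.random() >= p:
        return list(samples)
    factor = rng.uniform(lo, hi)
    return [s * factor for s in samples]


def add_gaussian_noise(samples, rng, p=0.5, lo=0.001, hi=0.015):
    """With probability p, add Gaussian noise whose amplitude is drawn from [lo, hi]."""
    if rng.random() >= p:
        return list(samples)
    amplitude = rng.uniform(lo, hi)
    return [s + amplitude * rng.gauss(0.0, 1.0) for s in samples]


def time_mask(samples, rng, p=0.3, max_fraction=0.02):
    """With probability p, zero out one contiguous span covering (0, max_fraction] of the clip."""
    if not samples or rng.random() >= p:
        return list(samples)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    fraction = max_fraction * (1.0 - rng.random())
    width = max(1, int(len(samples) * fraction))
    start = rng.randrange(0, len(samples) - width + 1)
    out = list(samples)
    out[start:start + width] = [0.0] * width
    return out


def augment(samples, rng):
    """Chain the three sketches; each one decides independently whether to fire."""
    out = apply_gain(samples, rng)
    out = add_gaussian_noise(out, rng)
    return time_mask(out, rng)


rng = random.Random(42)              # seeded for repeatable illustration
clip = [0.1] * 16000                 # placeholder waveform, not real audio
augmented = augment(clip, rng)
print(len(augmented) == len(clip))
```

Each transform draws its own random decision, so a given clip may receive any combination of the augmentations, including none.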

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
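
As a rough illustration of how these settings interact, the sketch below derives the effective batch size and a linear warmup/decay learning-rate curve in plain Python. It assumes a single training device, takes the total of 280 optimizer steps from the final Step value in the training results, and rounds the warmup length up; the helper name `linear_lr` is hypothetical, and the exact schedule used by the training framework may differ in detail.

```python
import math

learning_rate = 2e-06
train_batch_size = 2
gradient_accumulation_steps = 4
warmup_ratio = 0.1
total_steps = 280  # assumption: last Step value reported in the training results

# Gradients from 4 micro-batches of 2 are accumulated before each optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: warmup length is the warmup ratio times total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Linear ramp from 0 to the peak rate, then linear decay back to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(effective_batch_size, warmup_steps)
for step in (0, 14, 28, 154, 280):
    print(step, linear_lr(step))
```

Under these assumptions the warmup spans 28 steps, which coincides with the first epoch in the training results.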

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 9.9002 | 1.0 | 28 | 8.5662 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5.7235 | 2.0 | 56 | 4.3990 | 0.0357 | 0.0238 | 0.0357 | 0.0286 |
| 2.4076 | 3.0 | 84 | 2.2972 | 0.4643 | 0.7405 | 0.4643 | 0.4684 |
| 1.4448 | 4.0 | 112 | 1.3975 | 0.7143 | 0.7340 | 0.7143 | 0.6863 |
| 0.8373 | 5.0 | 140 | 1.0468 | 0.8571 | 0.8524 | 0.8571 | 0.8448 |
| 0.7239 | 6.0 | 168 | 0.8518 | 0.8929 | 0.9164 | 0.8929 | 0.8766 |
| 0.6504 | 7.0 | 196 | 0.7391 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
| 0.535 | 8.0 | 224 | 0.6682 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
| 0.4237 | 9.0 | 252 | 0.6443 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
| 0.3709 | 10.0 | 280 | 0.6304 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |

Test results

| Parameter | Value |
|---|---|
| test_loss | 0.5829914808273315 |
| test_accuracy | 0.9285714285714286 |
| test_precision | 0.9446428571428571 |
| test_recall | 0.9285714285714286 |
| test_f1 | 0.930292723149866 |
| test_runtime (s) | 4.1488 |
| test_samples_per_second | 6.749 |
| test_steps_per_second | 3.374 |
| epoch | 10.0 |
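
The throughput figures allow a quick consistency check on the size of the test set, which the card does not state directly. The arithmetic below is a back-of-the-envelope sketch that uses only the reported values; the variable names are illustrative.

```python
test_runtime = 4.1488            # seconds
samples_per_second = 6.749
steps_per_second = 3.374
test_accuracy = 0.9285714285714286

# Runtime times throughput recovers the approximate counts.
n_samples = round(test_runtime * samples_per_second)
n_steps = round(test_runtime * steps_per_second)
batch_size = n_samples // n_steps

# The accuracy then corresponds to a whole number of correct predictions.
n_correct = round(test_accuracy * n_samples)

print(n_samples, n_steps, batch_size, n_correct)
```

These values suggest a test set of about 28 clips evaluated in 14 batches of 2, matching eval_batch_size, with 26 of the 28 clips classified correctly. With so few samples, each misclassification moves accuracy by roughly 3.6 percentage points, so the reported metrics should be read with that granularity in mind.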

Framework versions

  • Transformers 4.27.4
  • Pytorch 2.0.0
  • Datasets 2.10.1
  • Tokenizers 0.13.2