xpariz10/ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1

Audio Classification

audio-spectrogram-transformer

Generated from Trainer

Inference Endpoints

xpariz10 committed on Apr 3, 2023

Commit 1a45862 • 1 Parent(s): 460f417

Update README.md

Files changed (1)

README.md +10 -5

README.md CHANGED

@@ -26,17 +26,22 @@ It achieves the following results on the evaluation set:
 Training and evaluation data were augmented with audiomentations [GitHub: iver56/audiomentations](https://github.com/iver56/audiomentations) library and the following augmentation methods have been performed based on previous experiments [Elliott et al.: Tiny transformers for audio classification at the edge](https://arxiv.org/pdf/2103.12157.pdf):
-Gain
 - each audio sample is amplified/attenuated by a random factor between 0.5 and 1.5 with a 0.3 probability
-Noise
 - a random amount of Gaussian noise with a relative amplitude between 0.001 and 0.015 is added to each audio sample with a 0.5 probability
-Speed adjust
 - duration of each audio sample is extended by a random amount between 0.5 and 1.5 with a 0.3 probability
-Pitch shift
 - pitch of each audio sample is shifted by a random amount of semitones selected from the closed interval [-4,4] with a 0.3 probability
-Time masking
 - a random fraction of length of each audio sample in the range of (0,0.02] is erased with a 0.3 probability
 ### Training hyperparameters
 The following hyperparameters were used during training:

 Training and evaluation data were augmented with audiomentations [GitHub: iver56/audiomentations](https://github.com/iver56/audiomentations) library and the following augmentation methods have been performed based on previous experiments [Elliott et al.: Tiny transformers for audio classification at the edge](https://arxiv.org/pdf/2103.12157.pdf):
+#Gain
 - each audio sample is amplified/attenuated by a random factor between 0.5 and 1.5 with a 0.3 probability
+#Noise
 - a random amount of Gaussian noise with a relative amplitude between 0.001 and 0.015 is added to each audio sample with a 0.5 probability
+#Speed adjust
 - duration of each audio sample is extended by a random amount between 0.5 and 1.5 with a 0.3 probability
+#Pitch shift
 - pitch of each audio sample is shifted by a random amount of semitones selected from the closed interval [-4,4] with a 0.3 probability
+#Time masking
 - a random fraction of length of each audio sample in the range of (0,0.02] is erased with a 0.3 probability
 ### Training hyperparameters
 The following hyperparameters were used during training:
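
Note on the augmentation list in this hunk: the README states a probability and a parameter range for each method but does not show the pipeline itself. The sketch below is not the author's code and does not use audiomentations. It is a standard-library illustration of how "apply with probability p, with a parameter drawn from a range" could work for gain, noise, and time masking. The names `AUGMENTATIONS`, `factor_to_db`, and `augment` are hypothetical, and treating the noise amplitude as the standard deviation of the added Gaussian noise is an assumption. Speed adjust and pitch shift require resampling and are listed in the table only.

```python
import math
import random

# (probability, low, high) per method, copied from the README text.
# The dictionary layout itself is hypothetical.
AUGMENTATIONS = {
    "gain": (0.3, 0.5, 1.5),           # linear amplitude factor
    "noise": (0.5, 0.001, 0.015),      # noise amplitude
    "speed_adjust": (0.3, 0.5, 1.5),   # duration factor (not implemented here)
    "pitch_shift": (0.3, -4.0, 4.0),   # semitones (not implemented here)
    "time_masking": (0.3, 0.0, 0.02),  # fraction of the sample length
}


def factor_to_db(factor):
    """Convert a linear amplitude factor to decibels."""
    return 20.0 * math.log10(factor)


def augment(samples, rng):
    """Return an augmented copy of `samples` (a sequence of floats)."""
    out = list(samples)

    # Gain: scale every sample by one random factor.
    p, low, high = AUGMENTATIONS["gain"]
    if rng.random() < p:
        factor = rng.uniform(low, high)
        out = [x * factor for x in out]

    # Noise: add zero-mean Gaussian noise.
    # Assumption: the amplitude is used as the standard deviation.
    p, low, high = AUGMENTATIONS["noise"]
    if rng.random() < p:
        amplitude = rng.uniform(low, high)
        out = [x + rng.gauss(0.0, amplitude) for x in out]

    # Time masking: zero out one contiguous span of random width.
    p, low, high = AUGMENTATIONS["time_masking"]
    if rng.random() < p:
        width = int(len(out) * rng.uniform(low, high))
        start = rng.randint(0, len(out) - width)
        out[start:start + width] = [0.0] * width

    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    augmented = augment(tone, rng)
    print(len(tone), len(augmented))
```

The `factor_to_db` helper is included because the audiomentations `Gain` transform takes its range in decibels, so a linear factor range of 0.5 to 1.5 would correspond to roughly -6.02 dB to +3.52 dB if the README values are linear amplitude factors.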