xpariz10/ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1

@@ -12,9 +12,6 @@ model-index:
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1
 This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on a subset of [ashraq/esc50](https://huggingface.co/datasets/ashraq/esc50) dataset.
@@ -25,19 +22,20 @@ It achieves the following results on the evaluation set:
 - Recall: 0.9286
 - F1: 0.9244
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
-## Training procedure
+Training and evaluation data were augmented with the [audiomentations](https://github.com/iver56/audiomentations) library. The following augmentation methods were applied, based on previous experiments by [Elliott et al.: Tiny transformers for audio classification at the edge](https://arxiv.org/pdf/2103.12157.pdf):
+Gain
+- each audio sample is amplified or attenuated by a random factor between 0.5 and 1.5 with a probability of 0.3
+Noise
+- Gaussian noise with a random relative amplitude between 0.001 and 0.015 is added to each audio sample with a probability of 0.5
+Speed adjust
+- the duration of each audio sample is scaled (stretched or compressed) by a random factor between 0.5 and 1.5 with a probability of 0.3
+Pitch shift
+- the pitch of each audio sample is shifted by a random number of semitones drawn from the closed interval [-4, 4] with a probability of 0.3
+Time masking
+- a random fraction of each audio sample's length, drawn from the range (0, 0.02], is erased with a probability of 0.3
 ### Training hyperparameters
@@ -68,6 +66,18 @@ The following hyperparameters were used during training:
 | 0.4237        | 9.0   | 252  | 0.6443          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
 | 0.3709        | 10.0  | 280  | 0.6304          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
+### Test results
+|          Metric          |        Value       |
+|:------------------------:|:------------------:|
+| test_loss                | 0.5829914808273315 |
+| test_accuracy            | 0.9285714285714286 |
+| test_precision           | 0.9446428571428571 |
+| test_recall              | 0.9285714285714286 |
+| test_f1                  | 0.930292723149866  |
+| test_runtime (s)         | 4.1488             |
+| test_samples_per_second  | 6.749              |
+| test_steps_per_second    | 3.374              |
+| epoch                    | 10.0               |
 ### Framework versions
@@ -75,4 +85,3 @@ The following hyperparameters were used during training:
 - Pytorch 2.0.0
 - Datasets 2.10.1
 - Tokenizers 0.13.2
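The augmentation list in the model card maps onto simple waveform operations. Below is a hypothetical NumPy re-implementation of three of them (gain, Gaussian noise, time masking). It is not the audiomentations code the author used. The function names, the use of `numpy.random.Generator`, and the reading of "relative amplitude" as the standard deviation of noise added to a waveform in [-1, 1] are all assumptions. Speed adjust and pitch shift are left out because they need resampling or time-stretching, which is better handled by a dedicated library.

```python
import numpy as np

# Hypothetical sketch: NOT the audiomentations implementation.
# Ranges and probabilities follow the model card; everything else is assumed.


def random_gain(x, rng, low=0.5, high=1.5, p=0.3):
    """With probability p, scale the waveform by a linear factor drawn from [low, high)."""
    if rng.random() < p:
        return x * rng.uniform(low, high)
    return x


def add_gaussian_noise(x, rng, min_amp=0.001, max_amp=0.015, p=0.5):
    """With probability p, add zero-mean Gaussian noise whose std is drawn from [min_amp, max_amp)."""
    if rng.random() < p:
        amp = rng.uniform(min_amp, max_amp)
        return x + amp * rng.standard_normal(x.shape)
    return x


def time_mask(x, rng, max_fraction=0.02, p=0.3):
    """With probability p, zero out one contiguous span of at least one sample
    and at most about max_fraction of the signal length."""
    if rng.random() < p:
        n = x.shape[0]
        width = max(1, int(rng.uniform(0.0, max_fraction) * n))
        start = int(rng.integers(0, n - width + 1))  # upper bound is exclusive
        x = x.copy()  # leave the caller's array untouched
        x[start:start + width] = 0.0
    return x


def augment(x, rng):
    """Apply the transforms in sequence, each with its own independent probability."""
    for transform in (random_gain, add_gaussian_noise, time_mask):
        x = transform(x, rng)
    return x


# Usage: one second of a 440 Hz tone at 16 kHz, standing in for a real ESC-50 clip.
rng = np.random.default_rng(0)
clip = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
augmented = augment(clip, rng)
```

For reference, a linear gain factor between 0.5 and 1.5 corresponds to roughly -6.0 dB to +3.5 dB (20·log10 of the factor). That is the range to use if you reproduce this step with a gain transform parameterised in decibels.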
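The throughput figures in the test results table are enough to recover the size of the test split. The check below copies the values from that table; the variable names are illustrative. Runtime × samples per second gives about 28 samples, and runtime × steps per second gives about 14 steps. That works out to 2 samples per evaluation step, which is consistent with an evaluation batch size of 2 but does not confirm it. An accuracy of 0.92857 then corresponds to 26 of 28 test clips classified correctly. The step counter in the training log (252 at epoch 9, 280 at epoch 10) likewise implies 28 optimisation steps per epoch.

```python
# Back-of-the-envelope check of the reported test and training figures.
test_runtime_s = 4.1488
samples_per_second = 6.749
steps_per_second = 3.374
test_accuracy = 0.9285714285714286

n_samples = round(test_runtime_s * samples_per_second)  # 28.0003 -> 28
n_steps = round(test_runtime_s * steps_per_second)      # 13.998 -> 14
samples_per_step = n_samples / n_steps                  # 28 / 14 = 2.0
n_correct = round(test_accuracy * n_samples)            # 26 correct predictions

# Training log: global step 252 at epoch 9.0 and 280 at epoch 10.0.
steps_per_epoch = 280 - 252

print(n_samples, n_steps, samples_per_step, n_correct, steps_per_epoch)
# prints: 28 14 2.0 26 28
```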