	---
	license: bsd-3-clause
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- precision
	- recall
	- f1
	base_model: MIT/ast-finetuned-audioset-10-10-0.4593
	model-index:
	- name: ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1
	  results: []
	---

	# ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1

	This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on a subset of the [ashraq/esc50](https://huggingface.co/datasets/ashraq/esc50) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7391
	- Accuracy: 0.9286
	- Precision: 0.9449
	- Recall: 0.9286
	- F1: 0.9244

	## Training and evaluation data

	Training and evaluation data were augmented with the audiomentations library ([GitHub: iver56/audiomentations](https://github.com/iver56/audiomentations)). The following augmentation methods were applied, chosen on the basis of previous experiments ([Elliott et al.: Tiny transformers for audio classification at the edge](https://arxiv.org/pdf/2103.12157.pdf)):

	Gain
	- each audio sample is amplified/attenuated by a random factor between 0.5 and 1.5 with a 0.3 probability

	Noise
	- a random amount of Gaussian noise with a relative amplitude between 0.001 and 0.015 is added to each audio sample with a 0.5 probability

	Speed adjust
	- the duration of each audio sample is scaled by a random factor between 0.5 and 1.5 with a 0.3 probability

	Pitch shift
	- pitch of each audio sample is shifted by a random amount of semitones selected from the closed interval [-4,4] with a 0.3 probability

	Time masking
	- a random fraction of the length of each audio sample, in the range (0,0.02], is erased with a 0.3 probability
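
The gain, noise, and time-masking steps above can be illustrated with a minimal pure-Python sketch. This is not the audiomentations implementation: the function names, the order of application, and the treatment of the noise amplitude as a Gaussian standard deviation are assumptions made for illustration. Speed adjust and pitch shift are left out because they need resampling.

```python
import random


def apply_gain(samples, rng, low=0.5, high=1.5, p=0.3):
    """With probability p, scale the whole waveform by one random factor."""
    if rng.random() >= p:
        return list(samples)
    factor = rng.uniform(low, high)
    return [s * factor for s in samples]


def add_gaussian_noise(samples, rng, min_amp=0.001, max_amp=0.015, p=0.5):
    """With probability p, add zero-mean Gaussian noise.

    Assumption: the amplitude is used as the noise standard deviation.
    """
    if rng.random() >= p:
        return list(samples)
    amp = rng.uniform(min_amp, max_amp)
    return [s + rng.gauss(0.0, amp) for s in samples]


def time_mask(samples, rng, max_fraction=0.02, p=0.3):
    """With probability p, zero out one contiguous span of the waveform."""
    if rng.random() >= p or not samples:
        return list(samples)
    fraction = rng.uniform(0.0, max_fraction)
    # At least one sample is masked so the fraction stays above zero.
    width = max(1, int(len(samples) * fraction))
    start = rng.randint(0, len(samples) - width)
    out = list(samples)
    out[start:start + width] = [0.0] * width
    return out


def augment(samples, rng):
    """Chain the three steps; the order here is an assumption."""
    out = apply_gain(samples, rng)
    out = add_gaussian_noise(out, rng)
    return time_mask(out, rng)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, and setting `p=1.0` on a single function forces that augmentation to fire when inspecting its effect.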


	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-06
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 10
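
To show how these settings interact, here is a small sketch of the effective batch size and of a linear schedule with warmup. The total of 280 optimizer steps is taken from the last row of the training results table; the function name is hypothetical, and rounding the warmup length up with `math.ceil` is an assumption about how the warmup ratio is turned into a step count.

```python
import math

train_batch_size = 2
gradient_accumulation_steps = 4
# Gradients from 4 batches of 2 are accumulated before each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8


def linear_warmup_decay_lr(step, base_lr=2e-06, total_steps=280, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for a linear warmup/decay schedule."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 28 steps here
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        scale = step / max(1, warmup_steps)
    else:
        # Decay linearly from base_lr down to 0 at the final step.
        scale = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * scale
```

Under these assumptions the warmup covers the first 28 steps, which matches one epoch in the results table, so the learning rate peaks at 2e-06 at the end of epoch 1 and falls to zero at step 280.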

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
	| 9.9002        | 1.0   | 28   | 8.5662          | 0.0      | 0.0       | 0.0    | 0.0    |
	| 5.7235        | 2.0   | 56   | 4.3990          | 0.0357   | 0.0238    | 0.0357 | 0.0286 |
	| 2.4076        | 3.0   | 84   | 2.2972          | 0.4643   | 0.7405    | 0.4643 | 0.4684 |
	| 1.4448        | 4.0   | 112  | 1.3975          | 0.7143   | 0.7340    | 0.7143 | 0.6863 |
	| 0.8373        | 5.0   | 140  | 1.0468          | 0.8571   | 0.8524    | 0.8571 | 0.8448 |
	| 0.7239        | 6.0   | 168  | 0.8518          | 0.8929   | 0.9164    | 0.8929 | 0.8766 |
	| 0.6504        | 7.0   | 196  | 0.7391          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
	| 0.535         | 8.0   | 224  | 0.6682          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
	| 0.4237        | 9.0   | 252  | 0.6443          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
	| 0.3709        | 10.0  | 280  | 0.6304          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
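
Recall equals accuracy in every row of the table, which is what support-weighted averaging over classes produces. The card does not state the averaging mode, so the sketch below is an assumption about how such numbers can arise, not a record of the evaluation code; the function name and the labels in the usage example are illustrative.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return accuracy, precision, recall, f1


# Illustrative labels only, not taken from the evaluation set.
y_true = ["dog", "dog", "rain", "rain", "clock"]
y_pred = ["dog", "rain", "rain", "rain", "clock"]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
```

With weighted averaging, each class recall is multiplied by its share of the samples, so the sum reduces to correct predictions divided by total samples, which is the accuracy.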

	### Test results

	| Parameter               | Value              |
	|:-----------------------:|:------------------:|
	| test_loss               | 0.5829914808273315 |
	| test_accuracy           | 0.9285714285714286 |
	| test_precision          | 0.9446428571428571 |
	| test_recall             | 0.9285714285714286 |
	| test_f1                 | 0.930292723149866  |
	| test_runtime (s)        | 4.1488             |
	| test_samples_per_second | 6.749              |
	| test_steps_per_second   | 3.374              |
	| epoch                   | 10.0               |
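
The throughput figures allow a rough consistency check on the size of the test set. The card does not state the sample count, so the values derived below are inferences from the reported numbers rather than documented facts.

```python
# Values copied from the test results table.
test_runtime_s = 4.1488
test_samples_per_second = 6.749
test_steps_per_second = 3.374
test_accuracy = 0.9285714285714286

# Runtime multiplied by throughput approximates the number of samples and steps.
n_samples = round(test_runtime_s * test_samples_per_second)
n_steps = round(test_runtime_s * test_steps_per_second)
n_correct = round(test_accuracy * n_samples)
samples_per_step = n_samples / n_steps

print(n_samples, n_steps, n_correct, samples_per_step)
```

This suggests about 28 test samples processed in 14 steps, which agrees with the eval batch size of 2, and an accuracy of 26/28. With a test set this small, a single misclassified sample moves accuracy by roughly 3.6 percentage points.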

	### Framework versions

	- Transformers 4.27.4
	- Pytorch 2.0.0
	- Datasets 2.10.1
	- Tokenizers 0.13.2