Note: The examples provided may not work on Safari, tablets, and iOS devices. Try an alternate approach.

Dataset

UrbanSound8K

Audio files

The audio files are converted to mel spectrograms, which generally perform better than other visual transformations of such audio.
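
The conversion step could be sketched roughly as follows. This is a minimal illustration, not the code used for this project: it assumes librosa and matplotlib are installed, that clips follow the UrbanSound8K naming convention fsID-classID-occurrenceID-sliceID.wav, and that images should land in one folder per class ID. The function names, the 22050 Hz sample rate, the 128 mel bands, and the colour map are all illustrative choices.

```python
from pathlib import Path


def target_png(wav_path, out_dir):
    """Map fold1/101415-3-0-2.wav to out_dir/3/101415-3-0-2.png.

    Assumes names of the form fsID-classID-occurrenceID-sliceID.wav,
    so the second dash-separated field is the class ID.
    """
    wav_path = Path(wav_path)
    class_id = wav_path.stem.split("-")[1]
    return Path(out_dir) / class_id / (wav_path.stem + ".png")


def wav_to_mel_png(wav_path, out_dir, n_mels=128):
    """Render one clip as a decibel-scaled mel spectrogram image."""
    # Third-party imports stay local so the path helper works without them.
    import librosa
    import matplotlib.pyplot as plt
    import numpy as np

    # Resample every clip to a common rate (illustrative value).
    y, sr = librosa.load(wav_path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)  # power -> decibels

    png_path = target_png(wav_path, out_dir)
    png_path.parent.mkdir(parents=True, exist_ok=True)
    # origin="lower" puts low frequencies at the bottom of the image.
    plt.imsave(png_path, mel_db, cmap="magma", origin="lower")
    return png_path
```

Writing each image into a folder named after its class ID is one convenient layout, because folder names can then serve as labels when the images are loaded for training.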

Training

Fast.ai was used to train this classifier with a ResNet34 vision learner. With minimal lines of code and three epochs, it reaches about 95% accuracy, holding out 20% of the entire dataset of 8732 labelled sound excerpts, covering the 10 classes shown above, for validation.

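| epoch | train_loss | valid_loss | accuracy | time  |
|-------|------------|------------|----------|-------|
| 0     | 1.462791   | 0.710250   | 0.775487 | 01:12 |

| epoch | train_loss | valid_loss | accuracy | time  |
|-------|------------|------------|----------|-------|
| 0     | 0.600056   | 0.309964   | 0.892325 | 00:40 |
| 1     | 0.260431   | 0.200901   | 0.945017 | 00:39 |
| 2     | 0.090158   | 0.164748   | 0.950745 | 00:40 |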
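
A training run of this kind might look like the sketch below. It is an assumption-laden illustration rather than the original notebook: it presumes the spectrogram images sit in one folder per class, and the function names, image size, and seed are hypothetical. The logs above (one epoch, then three) are consistent with fastai's `fine_tune(3)`, which trains one epoch with the pretrained body frozen before the unfrozen epochs, but the source does not state which call was used.

```python
def validation_split(n_items, valid_pct=0.2):
    """Return (train, valid) counts for a random hold-out split."""
    n_valid = int(n_items * valid_pct)
    return n_items - n_valid, n_valid


def train_classifier(spectrogram_dir, epochs=3):
    """Fine-tune a pretrained ResNet34 on spectrogram images (sketch)."""
    # Imported here so the arithmetic helper runs without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy, resnet34, vision_learner,
    )

    # Labels come from the parent folder names; 20% is held out at random.
    dls = ImageDataLoaders.from_folder(
        spectrogram_dir,
        valid_pct=0.2,
        seed=42,                 # illustrative seed for a repeatable split
        item_tfms=Resize(224),   # illustrative input size
    )
    learn = vision_learner(dls, resnet34, metrics=accuracy)
    learn.fine_tune(epochs)      # 1 frozen epoch, then `epochs` unfrozen
    return learn


# With 8732 excerpts and a 20% hold-out, roughly 1746 clips are used
# for validation and 6986 for training.
train_count, valid_count = validation_split(8732)
```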

Classical Approaches

Classical approaches on this dataset as of 2019

State of the Art Approaches

The state-of-the-art methods for audio classification approach this problem as an image classification task. For such image classification problems from audio samples, three common transformation approaches (see https://scottmduda.medium.com/urban-environmental-audio-classification-using-mel-spectrograms-706ee6f8dcc1) are:

- Linear Spectrograms
- Log Spectrograms
- Mel Spectrograms
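
To make the difference between the three concrete, the toy sketch below computes one column of each from a single short frame using only the Python standard library. It is an illustration under stated assumptions, not a production pipeline: real code would use an FFT library over many overlapping windowed frames, and the frame length, sample rate, band count, and the HTK-style mel formula are choices made here for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Linear column: naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def to_decibels(magnitudes, floor=1e-10):
    """Log column: 20 * log10(magnitude), floored to avoid log(0)."""
    return [20.0 * math.log10(max(m, floor)) for m in magnitudes]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bands(magnitudes, sample_rate, n_mels):
    """Mel column: triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2.0
    n_bins = len(magnitudes)
    top = hz_to_mel(nyquist)
    # n_mels + 2 edge frequencies, equally spaced in mel, expressed in Hz.
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bands = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        energy = 0.0
        for k, mag in enumerate(magnitudes):
            freq = k * nyquist / (n_bins - 1)
            if lo < freq <= mid:      # rising slope of the triangle
                energy += mag * (freq - lo) / (mid - lo)
            elif mid < freq < hi:     # falling slope of the triangle
                energy += mag * (hi - freq) / (hi - mid)
        bands.append(energy)
    return bands


# A 64-sample frame of a 1000 Hz tone sampled at 8000 Hz (illustrative).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]

linear = magnitude_spectrum(frame)          # 33 evenly spaced frequency bins
log_db = to_decibels(linear)                # same bins, decibel amplitudes
mel = mel_bands(linear, SAMPLE_RATE, 10)    # 10 perceptually spaced bands
```

The linear and log columns share the same evenly spaced frequency bins and differ only in amplitude scaling, while the mel column pools those bins into fewer bands that are narrow at low frequencies and wide at high ones.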

Credits

Thanks to Kurian Benoy and countless others who generously make their code public.