Note: The examples provided may not work on Safari, tablets, or iOS devices. If they fail, try an alternate approach.
Dataset
Audio files
Files are converted to mel spectrograms, which in general perform better than other visual transformations of such audio files.
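The conversion step can be sketched roughly as follows. This is a minimal illustration, not the code used for this Space: it assumes librosa, NumPy, and matplotlib are installed, and the function name, the `n_mels=128` default, and the colormap are all hypothetical choices.

```python
def audio_to_melspectrogram(wav_path, png_path, n_mels=128):
    """Save a mel spectrogram image for one audio file (illustrative sketch)."""
    # Third-party imports are kept inside the function so the sketch can be
    # defined even where these packages are not installed.
    import numpy as np
    import librosa
    import matplotlib
    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt

    # Load the waveform at its native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)

    # Power spectrogram projected onto n_mels mel-spaced frequency bands.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)

    # Convert power to decibels relative to the loudest bin.
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Write the 2-D array as an image, low frequencies at the bottom.
    plt.imsave(png_path, mel_db, origin="lower", cmap="magma")
```

Calling `audio_to_melspectrogram("dog_bark.wav", "dog_bark.png")` (hypothetical file names) would write one image per clip, which an image classifier can then consume.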
Training
Fast.ai was used to train this classifier with a ResNet34 vision learner. With three epochs and minimal lines of code, it approaches 95% accuracy, holding out 20% of the entire dataset of 8732 labelled sound excerpts across the 10 classes shown above for validation.
epoch  train_loss  valid_loss  accuracy  time
0      1.462791    0.710250    0.775487  01:12

epoch  train_loss  valid_loss  accuracy  time
0      0.600056    0.309964    0.892325  00:40
1      0.260431    0.200901    0.945017  00:39
2      0.090158    0.164748    0.950745  00:40
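The log above has the shape that fastai's `fine_tune` typically prints: one epoch with the pretrained body frozen, followed by the requested number of unfrozen epochs. A minimal sketch of such a training run is shown below. It is an assumption about how the run might look, not the author's actual script: the function name, the folder layout (one subfolder of spectrogram images per class), the `Resize(224)` transform, and the seed are all hypothetical.

```python
def train_classifier(image_dir, epochs=3, valid_pct=0.2, seed=42):
    """Fine-tune a ResNet34 on spectrogram images (illustrative sketch)."""
    # Imported lazily so the sketch can be defined without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy, resnet34, vision_learner,
    )

    # Assumed layout: image_dir/<class_name>/<image>.png
    # valid_pct=0.2 holds out a random 20% of the images for validation.
    dls = ImageDataLoaders.from_folder(
        image_dir, valid_pct=valid_pct, seed=seed, item_tfms=Resize(224)
    )

    # Pretrained ResNet34 with a new classification head.
    learn = vision_learner(dls, resnet34, metrics=accuracy)

    # By default: one frozen epoch, then `epochs` unfrozen epochs.
    learn.fine_tune(epochs)
    return learn
```

With 8732 excerpts and a 20% hold-out, roughly 1746 images would land in the validation set and the rest in training.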
Classical Approaches
Classical approaches on this dataset as of 2019
State of the Art Approaches
The state-of-the-art methods for audio classification approach this problem as an image classification task. For such image classification problems from audio samples, three common transformation approaches (https://scottmduda.medium.com/urban-environmental-audio-classification-using-mel-spectrograms-706ee6f8dcc1) are:
- Linear Spectrograms
- Log Spectrograms
- Mel Spectrograms
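To make the difference between these scales concrete, here is a small standard-library sketch. It assumes the commonly used HTK-style mel formula and takes "log" to mean decibel-scaled power; both are assumptions, since the source does not say which variants it uses. All function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_to_db(power: float, ref: float = 1.0, floor: float = 1e-10) -> float:
    """Convert a power value to decibels, clamping tiny values to avoid log(0)."""
    return 10.0 * math.log10(max(power, floor) / ref)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Edges in Hz of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Bands are narrow at low frequencies and wide at high frequencies.
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(f"{edge:8.1f} Hz")
```

A linear spectrogram keeps evenly spaced frequency bins and raw magnitudes, a log spectrogram compresses the magnitudes (as `power_to_db` does here), and a mel spectrogram additionally regroups the bins into bands like those returned by `mel_band_edges`, which gives low frequencies finer resolution than high ones.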
Credits
Thanks to Kurian Benoy and the countless others who generously leave their code public.