Dataset: https://urbansounddataset.weebly.com/urbansound8k.html

Classical approaches on this dataset as of 2019: https://www.researchgate.net/publication/335862311_Evaluation_of_Classical_Machine_Learning_Techniques_towards_Urban_Sound_Recognition_on_Embedded_Systems

Fast.ai was used to train this classifier: a ResNet34 vision learner trained for 3 epochs. The audio files were converted to mel spectrograms, which in general perform better than other visual transformations of such audio files.

Training log, initial epoch:

| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 1.462791 | 0.710250 | 0.775487 | 01:12 |

Training log, the 3 epochs that followed:

| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 0.600056 | 0.309964 | 0.892325 | 00:40 |
| 1 | 0.260431 | 0.200901 | 0.945017 | 00:39 |
| 2 | 0.090158 | 0.164748 | 0.950745 | 00:40 |
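
To make the mel spectrogram step more concrete, here is a minimal, dependency-free sketch of how a log-mel spectrogram can be computed. It is not the code used for this classifier: this README does not say which tool produced the spectrograms, and in practice a library such as librosa or torchaudio would normally do this work. The function names, the HTK-style mel formula, the tiny `n_fft`, `hop`, and `n_mels` values, and the synthetic 1 kHz tone are all assumptions chosen to keep the example small.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (HTK-style formula, one common convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points: each filter spans three consecutive points.
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power of the one-sided DFT of a Hann-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log10(0) for empty bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel])
    return frames


# Synthetic stand-in for an audio clip: 0.1 s of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
spec = log_mel_spectrogram(tone, SAMPLE_RATE)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

Each frame becomes one column of the spectrogram image. Rendering those columns as pixels, typically with far more mel bands and a larger FFT size than in this sketch, gives the kind of picture an image model such as ResNet34 can be trained on.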