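Dataset for this classifier; see also classical approaches on this dataset as of 2019. The classifier was trained as a ResNet34 vision learner for 3 epochs. The audio files were first converted to Mel spectrograms, which generally perform better as image (visual) representations of audio files.

Training log (time in mm:ss). Training ran in two stages: one initial epoch, then three further epochs. This pattern matches fastai's fine_tune(3), which trains one epoch with the pretrained layers frozen before unfreezing.

Stage 1:
epoch  train_loss  valid_loss  accuracy  time
0      1.462791    0.710250    0.775487  01:12

Stage 2:
epoch  train_loss  valid_loss  accuracy  time
0      0.600056    0.309964    0.892325  00:40
1      0.260431    0.200901    0.945017  00:39
2      0.090158    0.164748    0.950745  00:40

Final validation accuracy: 0.950745 (about 95.1%).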
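The description does not say which tool produced the spectrograms or which settings were used. The sketch below, written with NumPy only, illustrates the conversion step under stated assumptions. It builds a log-power Mel spectrogram and turns it into a 3-channel 8-bit image, the kind of input an ImageNet-pretrained ResNet34 expects. The HTK mel formula, the parameter values (n_fft=1024, hop_length=256, n_mels=64, top_db=80) and all function names are illustrative assumptions, not this model's actual pipeline.

```python
import numpy as np

# Assumption: HTK-style mel scale, m = 2595 * log10(1 + f / 700).
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # centre frequency of each rfft bin
    # n_mels + 2 edge points, evenly spaced on the mel scale, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at center
    return fb

# Hypothetical default settings, chosen only for illustration.
def log_mel_spectrogram(y, sr, n_fft=1024, hop_length=256, n_mels=64):
    """Return a (n_mels, n_frames) log-power mel spectrogram in dB."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[t * hop_length : t * hop_length + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_bins)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel_power.T + 1e-10)               # (n_mels, n_frames)

def spectrogram_to_image(spec_db, top_db=80.0):
    """Scale a dB spectrogram to a uint8 (H, W, 3) array with low frequencies at the bottom."""
    spec_db = np.maximum(spec_db, spec_db.max() - top_db)    # clip the dynamic range
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / max(hi - lo, 1e-10)             # map to [0, 1]
    gray = np.round(scaled * 255).astype(np.uint8)[::-1]      # flip: row 0 = highest band
    return np.stack([gray] * 3, axis=-1)                      # repeat into 3 channels

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 s, 1 kHz test tone
    spec = log_mel_spectrogram(tone, sr)
    img = spectrogram_to_image(spec)
    print(spec.shape, img.shape, img.dtype)
```

In practice a library such as librosa (`librosa.feature.melspectrogram` followed by `librosa.power_to_db`) would normally do this. Note that librosa defaults to the Slaney mel scale rather than HTK, so its values differ slightly from this sketch. Repeating the single channel three times is one simple way to match a network pretrained on RGB images.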