> Note: The examples provided may not work on Safari, tablets, or iOS devices. Try an alternate approach.

## Dataset 

- [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html)

## Audio files 

Audio files are converted to mel spectrograms, which generally perform better than other visual representations of such audio files.

## Training 

The classifier was trained with Fast.ai using a ResNet34 vision learner. With minimal lines of code and three epochs, it approaches 95% accuracy on a validation set of 20% of the entire dataset of 8732 labelled sound excerpts across 10 classes.

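| epoch | train_loss | valid_loss | accuracy | time  |
|-------|------------|------------|----------|-------|
| 0     | 1.462791   | 0.710250   | 0.775487 | 01:12 |

| epoch | train_loss | valid_loss | accuracy | time  |
|-------|------------|------------|----------|-------|
| 0     | 0.600056   | 0.309964   | 0.892325 | 00:40 |
| 1     | 0.260431   | 0.200901   | 0.945017 | 00:39 |
| 2     | 0.090158   | 0.164748   | 0.950745 | 00:40 |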
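
To make the 20% validation figure concrete, here is a minimal sketch of a random hold-out split over 8732 clips. It is an illustration only: the exact splitting method used for training is not stated here, and the helper `random_split` and the seed value are hypothetical names chosen for this example.

```python
import random

TOTAL_CLIPS = 8732   # dataset size stated in the Training section
VALID_PCT = 0.2      # validation share stated in the Training section


def random_split(n_items, valid_pct, seed=42):
    """Return (train_idx, valid_idx) for a shuffled hold-out split.

    Hypothetical helper: the seed and the shuffling strategy are
    assumptions, not the settings used to train the classifier.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_valid = int(n_items * valid_pct)  # truncates toward zero
    return indices[n_valid:], indices[:n_valid]


train_idx, valid_idx = random_split(TOTAL_CLIPS, VALID_PCT)
print(len(train_idx), "training clips,", len(valid_idx), "validation clips")

# Express the final-epoch accuracy from the table as a clip count.
correct = round(0.950745 * len(valid_idx))
print(correct, "of", len(valid_idx), "validation clips classified correctly")
```

Under these assumptions the validation set holds 1746 clips, so an accuracy of 0.950745 corresponds to roughly 1660 correctly classified clips.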

## Classical Approaches

[Classical approaches on this dataset as of 2019](https://www.researchgate.net/publication/335862311_Evaluation_of_Classical_Machine_Learning_Techniques_towards_Urban_Sound_Recognition_on_Embedded_Systems)

## State of the Art Approaches 

The state-of-the-art methods for audio classification approach this problem as an image classification task. For such image classification problems from audio samples, three [common](https://scottmduda.medium.com/urban-environmental-audio-classification-using-mel-spectrograms-706ee6f8dcc1) transformation approaches are:

- Linear Spectrograms
- Log Spectrograms
- [Mel Spectrograms](https://towardsdatascience.com/audio-deep-learning-made-simple-part-2-why-mel-spectrograms-perform-better-aad889a93505)
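
What sets a mel spectrogram apart from a linear one is its frequency axis, which is warped so that low frequencies get finer resolution than high ones. The sketch below illustrates that warping with the commonly used HTK-style formula `mel = 2595 * log10(1 + f / 700)`. This is an assumption for illustration: audio libraries may default to a different mel variant, and the band count and frequency range here are arbitrary example values, not the settings used for this classifier.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    """Return n_bands + 1 band edges in Hz, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / n_bands
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 1)]


# Example values only: 8 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(n_bands=8, f_min=0.0, f_max=8000.0)
for low, high in zip(edges, edges[1:]):
    print(f"{low:8.1f} Hz to {high:8.1f} Hz  (width {high - low:7.1f} Hz)")
```

The printed band widths grow with frequency, whereas a linear spectrogram would divide the same range into equal-width bands.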


## Credits 

Thanks to [Kurian Benoy](https://kurianbenoy.com/) and countless others who generously leave their code public.