gputrain committed on
Commit
1990861
1 Parent(s): 2d0c540

update doc

Files changed (3)
  1. app.py +1 -1
  2. article.md +28 -4
  3. melspectrogram.PNG +0 -0
app.py CHANGED
@@ -50,7 +50,7 @@ with open("article.md") as f:
 
  interface_options = {
      "title": "Urban Sound 8K Classification",
-     "description": "Fast AI example of using a pre=trained ResNet34 vision model for an audio classification task on the [Urban Sounds](https://urbansounddataset.weebly.com/urbansound8k.html) dataset. ",
+     "description": "Fast AI example of using a pre-trained Resnet34 vision model for an audio classification task on the [Urban Sounds](https://urbansounddataset.weebly.com/urbansound8k.html) dataset. ",
      #"article": article,
      "interpretation": "default",
      "layout": "horizontal",
article.md CHANGED
@@ -1,15 +1,39 @@
+ > Note: The examples provided may not work on Safari, tablets and iOS devices. Try an alternate approach.
 
- Dataset for this - https://urbansounddataset.weebly.com/urbansound8k.html
+ ## Dataset
 
- Classical approaches on this dataset as of 2019 - https://www.researchgate.net/publication/335862311_Evaluation_of_Classical_Machine_Learning_Techniques_towards_Urban_Sound_Recognition_on_Embedded_Systems
+ - [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html)
 
+ ## Audio files
 
+ Files are converted to mel spectrograms, which in general perform better as visual transformations of such audio files.
 
- #Fast.ai was used to train this classifier with a Resnet34 vision learner with 3 epochs. Audio files converted to Mel Spectrograms that perform better in general for visual transformations of such audio files.
+ ## Training
+
+ Fast.ai was used to train this classifier with a Resnet34 vision learner for three epochs. With minimal lines of code, it approaches 95% accuracy on a validation set of 20% of the entire dataset of 8732 labelled sound excerpts across the 10 classes shown above.
 
  epoch train_loss valid_loss accuracy time
  0 1.462791 0.710250 0.775487 01:12
+
  epoch train_loss valid_loss accuracy time
  0 0.600056 0.309964 0.892325 00:40
  1 0.260431 0.200901 0.945017 00:39
- 2 0.090158 0.164748 0.950745 00:40
+ 2 0.090158 0.164748 0.950745 00:40
+
+ ## Classical Approaches
+
+ [Classical approaches on this dataset as of 2019](https://www.researchgate.net/publication/335862311_Evaluation_of_Classical_Machine_Learning_Techniques_towards_Urban_Sound_Recognition_on_Embedded_Systems)
+
+ ## State of the Art Approaches
+
+ The state-of-the-art methods for audio classification approach this problem as an image classification task. For such image classification problems from audio samples, three [common](https://scottmduda.medium.com/urban-environmental-audio-classification-using-mel-spectrograms-706ee6f8dcc1)
+ transformation approaches are:
+
+ - Linear Spectrograms
+ - Log Spectrograms
+ - [Mel Spectrograms](https://towardsdatascience.com/audio-deep-learning-made-simple-part-2-why-mel-spectrograms-perform-better-aad889a93505)
+
+
+ ## Credits
+
+ Thanks to [Kurian Benoy](https://kurianbenoy.com/) and countless others who generously make their code public.
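
The figures quoted in the Training section of article.md can be cross-checked with a little arithmetic. The sketch below assumes the validation set size is the truncated product of 8732 and 20% (a common way to compute a percentage split; the commit does not show the actual split code) and asks how many correctly classified clips each logged accuracy would imply. The names `TOTAL_CLIPS`, `VALID_PCT`, and `implied_correct` are illustrative only.

```python
# Figures taken from the article text and the training log in the diff.
TOTAL_CLIPS = 8732
VALID_PCT = 0.20

# Assumption: the validation size is the truncated product, as a random
# percentage split would typically compute it.
n_valid = int(TOTAL_CLIPS * VALID_PCT)
n_train = TOTAL_CLIPS - n_valid

# Accuracy per epoch as logged: the single epoch of the first table,
# then the three epochs of the second table.
logged_accuracy = [0.775487, 0.892325, 0.945017, 0.950745]


def implied_correct(accuracy: float, n: int) -> int:
    """Nearest whole number of correct clips for a given accuracy."""
    return round(accuracy * n)


correct_counts = [implied_correct(a, n_valid) for a in logged_accuracy]

print(f"train clips: {n_train}, validation clips: {n_valid}")
for acc, k in zip(logged_accuracy, correct_counts):
    # Recomputing k / n_valid shows whether the logged value is consistent
    # with a whole number of correct predictions.
    print(f"logged {acc:.6f} -> {k} of {n_valid} correct "
          f"(recomputed {k / n_valid:.6f})")
```

Under that assumption, each logged accuracy matches a whole number of correct clips to six decimal places, which is consistent with a validation set of 1746 excerpts.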
melspectrogram.PNG ADDED
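
The commit does not include the code that produced the spectrogram image, so the following is only a minimal NumPy sketch of how the three transformations named in article.md relate to one another: a linear (power) spectrogram, its mel-filtered version, and the log (decibel) scaling of that result. The HTK-style mel formula, the parameter values (`n_fft`, `hop`, `n_mels`), and all function names are assumptions chosen for illustration. Audio libraries use their own defaults and may give different numbers.

```python
import numpy as np


def hz_to_mel(hz):
    """HTK-style mel scale (one of several conventions in use)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the mel scale give each filter's
    # left edge, centre, and right edge.
    hz_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, centre, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    """Return a (n_mels, n_frames) array in decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # linear spectrogram
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T    # mel spectrogram
    return 10.0 * np.log10(mel_power + 1e-10).T                # log scaling


# Synthetic one-second 440 Hz tone standing in for an audio clip.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_mel_spectrogram(tone, sr)
print(spec.shape)  # (64, 42)
```

The resulting array can be saved as an image and passed to an image classifier, which is the general approach the article describes.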