> Note: The provided examples don't work in Safari, which matters if you are on a Mac. Please try a different browser.

During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can be used to build a state-of-the-art audio classifier with the same image classification approach. I was curious how I could train a music classifier, as I had never worked on audio data problems before.


[You can find how I trained this music genre classifier using fast.ai in this blog post](https://kurianbenoy.com/ml-blog/fastai/fastaicourse/2022/05/01/AudioCNNDemo.html).

## Dataset

1. [The competition data](https://www.kaggle.com/competitions/kaggle-pog-series-s01e02/data)
2. [Image data generated by converting the audio to mel spectrogram images](https://www.kaggle.com/datasets/dienhoa/music-genre-spectrogram-pogchamps)
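
To make the second dataset more concrete, here is a minimal sketch of how one audio file could be turned into a mel spectrogram image. It assumes `librosa`, `numpy`, and `matplotlib` are installed; the function name `audio_to_mel_image`, the `n_mels=128` default, and the colormap are illustrative choices, not the settings used to build the Kaggle dataset.

```python
from pathlib import Path


def audio_to_mel_image(audio_path, image_path, n_mels=128):
    """Convert one audio file to a mel spectrogram image (illustrative sketch)."""
    # Third-party imports live inside the function so this module can be
    # imported even where librosa/matplotlib are not installed.
    import librosa
    import matplotlib.pyplot as plt
    import numpy as np

    # Load the waveform; librosa resamples to 22,050 Hz by default.
    y, sr = librosa.load(audio_path)

    # Power mel spectrogram with shape (n_mels, frames).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)

    # Convert power to decibels relative to the loudest bin.
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Save the array as an image, with low frequencies at the bottom.
    image_path = Path(image_path)
    image_path.parent.mkdir(parents=True, exist_ok=True)
    plt.imsave(image_path, mel_db, cmap="magma", origin="lower")
    return image_path
```

A call such as `audio_to_mel_image("song.ogg", "spectrograms/song.png")` (hypothetical paths) would write a single image that an image classifier can then consume.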


## Training

Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.
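
A training run of this kind might look roughly like the sketch below. It assumes the spectrogram images are arranged in one folder per genre; the function name, the 20% validation split, the 224-pixel resize, and the use of `fine_tune` are assumptions for illustration, and the exact settings behind the results reported below may differ.

```python
def train_genre_classifier(image_dir, epochs=10):
    """Fine-tune a ResNet50 on spectrogram images (illustrative sketch)."""
    # Imported lazily so the function can be defined without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        error_rate,
        resnet50,
        vision_learner,
    )

    # Assumed layout: image_dir/<genre_name>/<spectrogram>.png
    dls = ImageDataLoaders.from_folder(
        image_dir,
        valid_pct=0.2,          # hold out 20% of the images for validation
        seed=42,                # make the split reproducible
        item_tfms=Resize(224),  # resize every image to 224x224
    )

    # ResNet50 pretrained on ImageNet, tracking error rate as in the table.
    learn = vision_learner(dls, resnet50, metrics=error_rate)

    # fine_tune first trains the new head for one epoch with the body frozen,
    # then unfreezes and trains the whole network for `epochs` epochs.
    learn.fine_tune(epochs)
    return learn
```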

| epoch | train_loss | valid_loss | error_rate | time  |
|-------|------------|------------|------------|-------|
| 0     | 2.312176   | 1.843815   | 0.558654   | 02:07 |
| 1     | 2.102361   | 1.719162   | 0.539061   | 02:08 |
| 2     | 1.867139   | 1.623988   | 0.527003   | 02:08 |
| 3     | 1.710557   | 1.527913   | 0.507661   | 02:07 |
| 4     | 1.629478   | 1.456836   | 0.479779   | 02:05 |
| 5     | 1.519305   | 1.433036   | 0.474253   | 02:05 |
| 6     | 1.457465   | 1.379757   | 0.464456   | 02:05 |
| 7     | 1.396283   | 1.369344   | 0.457925   | 02:05 |
| 8     | 1.359388   | 1.367973   | 0.453655   | 02:05 |
| 9     | 1.364363   | 1.368887   | 0.456167   | 02:04 |
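
As a reading aid, the error rates in the table above can be converted to accuracy (1 minus the error rate). The short sketch below copies the rows from the table and reports the best and final epochs; the names `LOG` and `summarize` are illustrative.

```python
# (epoch, train_loss, valid_loss, error_rate) rows copied from the table above.
LOG = [
    (0, 2.312176, 1.843815, 0.558654),
    (1, 2.102361, 1.719162, 0.539061),
    (2, 1.867139, 1.623988, 0.527003),
    (3, 1.710557, 1.527913, 0.507661),
    (4, 1.629478, 1.456836, 0.479779),
    (5, 1.519305, 1.433036, 0.474253),
    (6, 1.457465, 1.379757, 0.464456),
    (7, 1.396283, 1.369344, 0.457925),
    (8, 1.359388, 1.367973, 0.453655),
    (9, 1.364363, 1.368887, 0.456167),
]


def summarize(log):
    """Return the best epoch by error rate plus best and final accuracy."""
    best = min(log, key=lambda row: row[3])  # lowest error rate
    final = log[-1]
    return {
        "best_epoch": best[0],
        "best_accuracy": 1 - best[3],
        "final_accuracy": 1 - final[3],
    }


summary = summarize(LOG)
print(f"best epoch: {summary['best_epoch']}")
print(f"best accuracy: {summary['best_accuracy']:.1%}")
print(f"final accuracy: {summary['final_accuracy']:.1%}")
```

The lowest error rate is reached at epoch 8 (about 54.6% accuracy), and the final epoch is marginally worse, which suggests the validation metrics had largely plateaued by the end of the run.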


## Examples

The example images in the demo come from the validation split of the Kaggle competition data, which was not used during training.

## Credits

Thanks to [Dien Hoa Truong](https://twitter.com/DienhoaT) for providing the [inference code](https://www.kaggle.com/code/dienhoa/inference-submission-music-genre) for an end-to-end pipeline that takes audio, converts it to mel spectrograms, and then runs prediction.

Thanks to [@suvash](https://twitter.com/suvash) for helping me get started with Hugging Face
Spaces and for his [excellent Space](https://huggingface.co/spaces/suvash/food-101-resnet50), which was a reference for this work.

Thanks to [@strickvl](https://twitter.com/strickvl) for reporting the issue in the Safari browser
and for trying this Space out.