> Note: The examples provided don't work on Safari, in case you are accessing this on a Mac. Please try a different browser.

During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, the same simple image classification model can be used to build a state-of-the-art audio classifier. I was curious how I could train a music classifier, as I had never worked on audio data problems before.

[You can find how I trained this music genre classifier using fast.ai in this blog post](https://kurianbenoy.com/ml-blog/fastai/fastbook/2022/05/01/AudioCNNDemo.html).

## Dataset

1. [The competition data](https://www.kaggle.com/competitions/kaggle-pog-series-s01e02/data)
2. [Image data generated by converting the audio to mel spectrograms saved as images](https://www.kaggle.com/datasets/dienhoa/music-genre-spectrogram-pogchamps)

## Training

Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.

| epoch | train_loss | valid_loss | error_rate | time  |
|-------|------------|------------|------------|-------|
| 0     | 2.312176   | 1.843815   | 0.558654   | 02:07 |
| 1     | 2.102361   | 1.719162   | 0.539061   | 02:08 |
| 2     | 1.867139   | 1.623988   | 0.527003   | 02:08 |
| 3     | 1.710557   | 1.527913   | 0.507661   | 02:07 |
| 4     | 1.629478   | 1.456836   | 0.479779   | 02:05 |
| 5     | 1.519305   | 1.433036   | 0.474253   | 02:05 |
| 6     | 1.457465   | 1.379757   | 0.464456   | 02:05 |
| 7     | 1.396283   | 1.369344   | 0.457925   | 02:05 |
| 8     | 1.359388   | 1.367973   | 0.453655   | 02:05 |
| 9     | 1.364363   | 1.368887   | 0.456167   | 02:04 |
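
To make the log easier to read, here is a minimal sketch that turns the `error_rate` column into accuracy. It relies on fastai's `error_rate` metric being `1 - accuracy`; the variable names are only illustrative, and the values are copied by hand from the table rather than read from the original training run.

```python
# error_rate values copied from the training log above, indexed by epoch.
error_rates = [
    0.558654, 0.539061, 0.527003, 0.507661, 0.479779,
    0.474253, 0.464456, 0.457925, 0.453655, 0.456167,
]

# The epoch with the lowest validation error rate.
best_epoch = min(range(len(error_rates)), key=error_rates.__getitem__)

# error_rate is 1 - accuracy, so accuracy is recovered by subtraction.
best_accuracy = 1.0 - error_rates[best_epoch]
final_accuracy = 1.0 - error_rates[-1]

print(f"best epoch: {best_epoch}, accuracy: {best_accuracy:.1%}")
print(f"final epoch accuracy: {final_accuracy:.1%}")
```

Read this way, the model classifies a little over half of the validation images correctly, and the error rate has largely flattened out over the last three epochs.
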
## Examples

The example images provided in the demo come from the validation split of the Kaggle competition data, which was not used during training.

## Credits

Thanks to [Dien Hoa Truong](https://twitter.com/DienhoaT) for providing the [inference code](https://www.kaggle.com/code/dienhoa/inference-submission-music-genre) used to build the end-to-end pipeline, from taking audio input to converting it to mel spectrograms and then making predictions.

Thanks to [@suvash](https://twitter.com/suvash) for helping me get started with Hugging Face Spaces and for his [excellent space](https://huggingface.co/spaces/suvash/food-101-resnet50), which was a reference for this work.

Thanks to [@strickvl](https://twitter.com/strickvl) for reporting the issue in the Safari browser and for trying this space out.