---
datasets:
- marsyas/gtzan
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
- audio
---

# Description

This model is a version of <b>distilhubert</b> fine-tuned on the <b>GTZAN</b> dataset for music genre classification.

## Development
- Kaggle Notebook: [Audio Data: Music Genre Classification](https://www.kaggle.com/code/lusfernandotorres/audio-data-music-genre-classification)


## Training Parameters
```python
evaluation_strategy = 'epoch',
save_strategy = 'epoch',
load_best_model_at_end = True,
metric_for_best_model = 'accuracy',
learning_rate = 5e-5,
seed = 42,
per_device_train_batch_size = 8,
per_device_eval_batch_size = 8,
gradient_accumulation_steps = 1,
num_train_epochs = 15,
warmup_ratio = 0.1,
fp16 = True,
save_total_limit = 2,
report_to = 'none'
```
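As a rough sanity check on how these settings interact, the sketch below derives the step counts they imply. The `global_step` value is taken from the training output reported in this card; the single-device assumption and the variable names are illustrative and not taken from the notebook.

```python
# Values copied from the training parameters and the reported TrainOutput.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_train_epochs = 15
warmup_ratio = 0.1
global_step = 1500  # total optimizer steps reported after training

# Assumption: a single device, so the effective batch equals the per-device batch.
num_devices = 1
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Optimizer steps taken in each epoch.
steps_per_epoch = global_step // num_train_epochs

# Upper bound on training examples per epoch (the last batch may be partial).
max_train_examples = steps_per_epoch * effective_batch_size

# Number of optimizer steps spent in learning-rate warmup.
warmup_steps = round(global_step * warmup_ratio)

print(effective_batch_size, steps_per_epoch, max_train_examples, warmup_steps)
```

Under these assumptions, each epoch covers at most 800 training examples in 100 steps, and warmup spans the first 150 of the 1500 steps.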

## Training and Validation Results

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 2.050576        | 0.395000 |
| 2     | No log        | 1.387915        | 0.565000 |
| 3     | No log        | 1.141497        | 0.665000 |
| 4     | No log        | 1.052763        | 0.675000 |
| 5     | 1.354600      | 0.846402        | 0.745000 |
| 6     | 1.354600      | 0.858698        | 0.750000 |
| 7     | 1.354600      | 0.864531        | 0.730000 |
| 8     | 1.354600      | 0.765039        | 0.775000 |
| 9     | 1.354600      | 0.790847        | 0.785000 |
| 10    | 0.250100      | 0.873926        | 0.785000 |
| 11    | 0.250100      | 0.928275        | 0.770000 |
| 12    | 0.250100      | 0.851429        | 0.780000 |
| 13    | 0.250100      | 0.922214        | 0.770000 |
| 14    | 0.250100      | 0.916481        | 0.780000 |
| 15    | 0.028000      | 0.946075        | 0.770000 |

```python
TrainOutput(global_step=1500, training_loss=0.5442592652638754,
metrics={'train_runtime': 12274.2966, 'train_samples_per_second': 0.976,
'train_steps_per_second': 0.122, 'total_flos': 8.177513845536e+17, 'train_loss': 0.5442592652638754, 'epoch': 15.0})
```
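
Because training used `load_best_model_at_end = True` with `metric_for_best_model = 'accuracy'`, the model kept at the end should be the checkpoint with the highest validation accuracy rather than the last epoch. The sketch below reproduces that selection over the reported results in plain Python, without the Trainer itself. Breaking ties in favor of the earliest epoch is an assumption of this sketch.

```python
# (epoch, validation_loss, accuracy) copied from the reported results.
results = [
    (1, 2.050576, 0.395),
    (2, 1.387915, 0.565),
    (3, 1.141497, 0.665),
    (4, 1.052763, 0.675),
    (5, 0.846402, 0.745),
    (6, 0.858698, 0.750),
    (7, 0.864531, 0.730),
    (8, 0.765039, 0.775),
    (9, 0.790847, 0.785),
    (10, 0.873926, 0.785),
    (11, 0.928275, 0.770),
    (12, 0.851429, 0.780),
    (13, 0.922214, 0.770),
    (14, 0.916481, 0.780),
    (15, 0.946075, 0.770),
]

# max() returns the first maximal item, so ties go to the earliest epoch.
best_by_accuracy = max(results, key=lambda row: row[2])

# For comparison: the epoch with the lowest validation loss.
best_by_loss = min(results, key=lambda row: row[1])

print(f"best accuracy: {best_by_accuracy[2]:.3f} at epoch {best_by_accuracy[0]}")
print(f"lowest validation loss: {best_by_loss[1]:.6f} at epoch {best_by_loss[0]}")
```

Accuracy peaks at 0.785 in epochs 9 and 10, while validation loss is lowest at epoch 8 and drifts upward afterwards as training loss keeps falling, which may indicate overfitting in the later epochs.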

## Reference
This model is based on the original <b>HuBERT</b> architecture, as detailed in:

Hsu et al. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. [arXiv:2106.07447](https://arxiv.org/pdf/2106.07447.pdf)