luisotorres committed on
Commit
77a9118
1 Parent(s): ee669e4

Create README.md

  1. README.md +65 -0
README.md ADDED
---
datasets:
- marsyas/gtzan
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
- audio
---

# Description

This model is a version of <b>distilhubert</b> fine-tuned on the <b>gtzan</b> dataset for music genre classification.

## Development
- Kaggle Notebook: [Audio Data: Music Genre Classification](https://www.kaggle.com/code/lusfernandotorres/audio-data-music-genre-classification)

## Training Parameters
```python
# Keyword arguments for transformers.TrainingArguments (excerpt)
evaluation_strategy = 'epoch',
save_strategy = 'epoch',
load_best_model_at_end = True,
metric_for_best_model = 'accuracy',
learning_rate = 5e-5,
seed = 42,
per_device_train_batch_size = 8,
per_device_eval_batch_size = 8,
gradient_accumulation_steps = 1,
num_train_epochs = 15,
warmup_ratio = 0.1,
fp16 = True,
save_total_limit = 2,
report_to = 'none'
```

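The `learning_rate` and `warmup_ratio` values imply a warmup phase whose length depends on the total number of optimizer steps. The sketch below is illustrative only: it assumes the linear warmup and linear decay schedule that `Trainer` falls back to when no `lr_scheduler_type` is given (none is listed here), and it takes the total of 1500 steps from the `global_step` reported in the training output. The function name `linear_lr` and the constants are hypothetical.

```python
import math

LEARNING_RATE = 5e-5   # learning_rate from the training parameters
WARMUP_RATIO = 0.1     # warmup_ratio from the training parameters
TOTAL_STEPS = 1500     # global_step reported in the training output

# Number of warmup steps implied by the ratio.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup then linear decay to zero."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: ramp from the peak back down to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 75, 150, 825, 1500):
        print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the peak rate of 5e-5 is reached at step 150, partway through the second epoch, and the rate then falls linearly to zero by step 1500.
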
## Training and Validation Results

```text
Epoch  Training Loss  Validation Loss  Accuracy
1      No log         2.050576         0.395000
2      No log         1.387915         0.565000
3      No log         1.141497         0.665000
4      No log         1.052763         0.675000
5      1.354600       0.846402         0.745000
6      1.354600       0.858698         0.750000
7      1.354600       0.864531         0.730000
8      1.354600       0.765039         0.775000
9      1.354600       0.790847         0.785000
10     0.250100       0.873926         0.785000
11     0.250100       0.928275         0.770000
12     0.250100       0.851429         0.780000
13     0.250100       0.922214         0.770000
14     0.250100       0.916481         0.780000
15     0.028000       0.946075         0.770000
TrainOutput(global_step=1500, training_loss=0.5442592652638754,
metrics={'train_runtime': 12274.2966, 'train_samples_per_second': 0.976,
'train_steps_per_second': 0.122, 'total_flos': 8.177513845536e+17, 'train_loss': 0.5442592652638754, 'epoch': 15.0})
```

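The log can be read programmatically to see which epoch `load_best_model_at_end` with `metric_for_best_model = 'accuracy'` would likely select. The sketch below hard-codes the validation loss and accuracy columns from the log; it assumes that ties go to the earliest epoch, as happens when a later score must strictly exceed the best so far. The names `RESULTS`, `best_epoch_by_accuracy`, and `lowest_loss_epoch` are hypothetical.

```python
# (epoch, validation_loss, accuracy) rows copied from the training log.
RESULTS = [
    (1, 2.050576, 0.395), (2, 1.387915, 0.565), (3, 1.141497, 0.665),
    (4, 1.052763, 0.675), (5, 0.846402, 0.745), (6, 0.858698, 0.750),
    (7, 0.864531, 0.730), (8, 0.765039, 0.775), (9, 0.790847, 0.785),
    (10, 0.873926, 0.785), (11, 0.928275, 0.770), (12, 0.851429, 0.780),
    (13, 0.922214, 0.770), (14, 0.916481, 0.780), (15, 0.946075, 0.770),
]

GLOBAL_STEP = 1500  # from TrainOutput
NUM_EPOCHS = 15     # num_train_epochs
STEPS_PER_EPOCH = GLOBAL_STEP // NUM_EPOCHS


def best_epoch_by_accuracy(rows):
    """Row with the highest accuracy; max() keeps the first maximum, so the earliest epoch wins ties."""
    return max(rows, key=lambda row: row[2])


def lowest_loss_epoch(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss, acc = best_epoch_by_accuracy(RESULTS)
    print(f"best accuracy {acc:.3f} at epoch {epoch} (validation loss {loss:.6f})")
    epoch, loss, acc = lowest_loss_epoch(RESULTS)
    print(f"lowest validation loss {loss:.6f} at epoch {epoch} (accuracy {acc:.3f})")
    print(f"steps per epoch: {STEPS_PER_EPOCH}")
```

Accuracy peaks at 0.785 in epochs 9 and 10, while validation loss is lowest in epoch 8 and drifts upward afterwards as training loss falls to 0.028, a pattern consistent with overfitting in the later epochs. At 100 steps per epoch, the first training-loss value appears at epoch 5, which matches a logging interval of 500 steps.
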
## Reference
This model is based on the original <b>HuBERT</b> architecture, as detailed in:

Hsu et al. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. [arXiv:2106.07447](https://arxiv.org/pdf/2106.07447.pdf)