
wav2vec2-base-one-shot-hip-hop-drums-clf

This model is a fine-tuned version of facebook/wav2vec2-base on yojul/one-shot-hip-hop-drums. It achieves the following results on the evaluation set:

  • Loss: 0.2463
  • Accuracy: 0.9243

Model description

This model is a classifier for one-shot drum samples, trained on roughly 17k hip-hop drum samples. It classifies samples into 7 classes: Kicks, Snares, Cymbals, Open-hats, Hi-hats, 808s, Claps.
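
A minimal inference sketch, assuming the transformers audio-classification pipeline; the file name `kick_01.wav` is a placeholder:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hub.
classifier = pipeline(
    "audio-classification",
    model="yojul/wav2vec2-base-one-shot-hip-hop-drums-clf",
)

# Classify a local one-shot sample (placeholder file name).
predictions = classifier("kick_01.wav")
print(predictions)  # list of {"label": ..., "score": ...} over the 7 classes
```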

Intended uses & limitations

It can be used to automatically sort large numbers of drum samples when no prior metadata is available. The model can take any audio file as input, but note that it was trained on audio downsampled to 16 kHz.
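
A sketch of manual preprocessing that resamples the input to 16 kHz before running the model, assuming librosa for loading; `some_sample.wav` is a placeholder file name:

```python
import torch
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "yojul/wav2vec2-base-one-shot-hip-hop-drums-clf"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)

# librosa resamples to the requested 16 kHz rate on load (placeholder file name).
audio, sr = librosa.load("some_sample.wav", sr=16_000, mono=True)

inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```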

Training and evaluation data

The model was fine-tuned and evaluated on the yojul/one-shot-hip-hop-drums dataset, which contains roughly 17k one-shot hip-hop drum samples spread across the 7 classes listed above.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
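
A sketch of the corresponding `TrainingArguments`, reconstructed from the list above; dataset loading, feature extraction, and metric computation are omitted, the `output_dir` name is a placeholder, and the per-epoch evaluation/save strategy is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-one-shot-hip-hop-drums-clf",  # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # effective train batch size of 128
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
    evaluation_strategy="epoch",     # assumption: evaluate once per epoch
    save_strategy="epoch",           # assumption
)
```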

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.8432        | 1.0   | 123  | 0.7449          | 0.8523   |
| 0.4692        | 2.0   | 246  | 0.4199          | 0.8894   |
| 0.3478        | 3.0   | 369  | 0.3122          | 0.9148   |
| 0.3054        | 4.0   | 492  | 0.2771          | 0.9156   |
| 0.2522        | 5.0   | 615  | 0.2676          | 0.9217   |
| 0.2221        | 6.0   | 738  | 0.2495          | 0.9217   |
| 0.2256        | 7.0   | 861  | 0.2588          | 0.9184   |
| 0.1949        | 8.0   | 984  | 0.2525          | 0.9232   |
| 0.1837        | 9.0   | 1107 | 0.2505          | 0.9237   |
| 0.1644        | 10.0  | 1230 | 0.2463          | 0.9243   |

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1