DistilWav2Vec2 Adult/Child Speech Classifier 37M

DistilWav2Vec2 Adult/Child Speech Classifier is an audio classification model based on the wav2vec 2.0 architecture. It is a distilled version of wav2vec2-adult-child-cls, trained on a private adult/child speech classification dataset.

This model was trained with Hugging Face's Transformers library on PyTorch. All training was done on a Tesla P100 GPU provided by Kaggle, and training metrics were logged via TensorBoard.

Model

Model	#params	Arch.	Training/Validation data (audio)
`distil-wav2vec2-adult-child-cls-37m`	37M	wav2vec 2.0	Adult/Child Speech Classification Dataset
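
A minimal inference sketch, assuming the model is published on the Hugging Face Hub as `bookbot/distil-wav2vec2-adult-child-cls-37m` and loads with the Transformers `audio-classification` pipeline. The helper names `top_prediction` and `classify` are hypothetical, and the label strings the model returns are not documented in this card, so the code does not depend on them.

```python
from typing import Dict, List

# Assumed Hub repository ID; verify it before use.
MODEL_ID = "bookbot/distil-wav2vec2-adult-child-cls-37m"


def top_prediction(predictions: List[Dict]) -> Dict:
    """Return the highest-scoring entry from a list of {"label", "score"} dicts."""
    return max(predictions, key=lambda p: p["score"])


def classify(audio_path: str) -> Dict:
    """Classify one audio file; the model is downloaded on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    # Decoding a file path relies on ffmpeg being installed.
    return top_prediction(classifier(audio_path))
```

Calling `classify("sample.wav")` on a local recording (the file name is a placeholder) should return a single dict holding the predicted label and its score.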

Evaluation Results

The model achieves the following results on evaluation:

Dataset	Loss	Accuracy	F1
Adult/Child Speech Classification	0.1431	95.89%	0.9624
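
For readers who want to reproduce the two reported metrics on their own labelled data, the sketch below computes accuracy and a binary F1 score from scratch. This card does not state which F1 averaging or which positive class was used, so the `positive` argument and the function name `accuracy_and_f1` are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Compute accuracy and binary F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels, not taken from the private dataset.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
```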

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
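
The relationship between these values can be checked with a few lines of arithmetic. The sketch below derives the effective batch size and the warmup length, and models the learning rate under the assumption that `linear` means linear warmup followed by linear decay to zero. The 96 steps per epoch are read from the training results table, and the function name `lr_at` is hypothetical.

```python
LEARNING_RATE = 3e-5
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5
STEPS_PER_EPOCH = 96  # optimizer steps per epoch, from the training results table

# Each optimizer step accumulates gradients over 4 batches of 32 examples.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 128

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS  # 480
warmup_steps = round(total_steps * WARMUP_RATIO)  # 48


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed warmup + linear decay)."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to the peak learning rate.
        return LEARNING_RATE * step / warmup_steps
    # Decay linearly from the peak to 0 at the final step.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - warmup_steps)
```

Under these assumptions the learning rate peaks at 3e-05 at step 48, about halfway through the first epoch, and reaches zero at step 480.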

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.2586	1.0	96	0.2257	0.9298	0.9363
0.1917	2.0	192	0.1743	0.9460	0.9500
0.1568	3.0	288	0.1701	0.9511	0.9545
0.0965	4.0	384	0.1501	0.9548	0.9584
0.1179	5.0	480	0.1431	0.9589	0.9624

Disclaimer

Keep in mind that biases present in the pre-training datasets may carry over into the results of this model.

Authors

DistilWav2Vec2 Adult/Child Speech Classifier was trained and evaluated by Ananto Joyoadikusumo. All computation and development were done on Kaggle.

Framework versions

Transformers 4.16.2
PyTorch 1.10.2+cu102
Datasets 1.18.3
Tokenizers 0.10.3
