library_name: tf-keras
tags:
- SpeakerRecognition
- Fast Fourier Transform (FFT)
- Convnet
- speech-recordings
- SpeechClassification
Model description
This model classifies speakers from the frequency-domain representation of their speech recordings, obtained with the Fast Fourier Transform (FFT). It is a 1D convolutional network with residual connections, built for audio classification.
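To make "1D convolutional network with residual connections" concrete, here is a minimal Keras sketch of that kind of architecture. It is an illustration under assumptions, not the saved model: the helper names `residual_block` and `build_model`, the filter counts, the number of blocks, the pooling head and the example `num_classes=5` are all hypothetical choices. The input is assumed to be an FFT magnitude spectrum of shape `(frequency_bins, 1)`.

```python
from tensorflow import keras


def residual_block(x, filters, conv_num=3, activation="relu"):
    """One residual 1D-conv block (hypothetical layout): Conv1D stack plus a 1x1 shortcut."""
    # 1x1 convolution so the shortcut has the same channel count as the main path
    shortcut = keras.layers.Conv1D(filters, 1, padding="same")(x)
    for _ in range(conv_num - 1):
        x = keras.layers.Conv1D(filters, 3, padding="same")(x)
        x = keras.layers.Activation(activation)(x)
    x = keras.layers.Conv1D(filters, 3, padding="same")(x)
    # Residual connection: add the shortcut back before the final activation
    x = keras.layers.Add()([x, shortcut])
    x = keras.layers.Activation(activation)(x)
    # Halve the frequency axis after each block
    return keras.layers.MaxPool1D(pool_size=2, strides=2)(x)


def build_model(input_shape, num_classes):
    """Stack a few residual blocks and finish with a softmax over speakers."""
    inputs = keras.Input(shape=input_shape)
    x = residual_block(inputs, 16, conv_num=2)
    x = residual_block(x, 32, conv_num=2)
    x = residual_block(x, 64, conv_num=3)
    x = keras.layers.GlobalAveragePooling1D()(x)
    x = keras.layers.Dense(128, activation="relu")(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs=inputs, outputs=outputs)


# Example: 8000 FFT bins (half of a 1-second clip at 16000 Hz), 5 speakers (assumed)
model = build_model(input_shape=(8000, 1), num_classes=5)
```

The 1x1 shortcut convolution is what lets the `Add` layer combine tensors whose channel counts would otherwise differ.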
This repo contains the model from the Speaker Recognition notebook.
Full credit goes to Fadi Badine.
Dataset Used
This model uses a speaker recognition dataset from Kaggle.
Intended uses & limitations
This should be run with TensorFlow 2.3 or higher, or tf-nightly.
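If you want to fail early on an older installation, a small version check like the one below is one option. It is a sketch: the helper name `meets_min_tf_version` is hypothetical, and it only compares the leading major and minor numbers, so pre-release strings with a suffix still parse.

```python
import re


def meets_min_tf_version(version: str, minimum=(2, 3)) -> bool:
    """Return True if a TensorFlow version string is at least `minimum` (major, minor)."""
    # Read only the leading "major.minor" digits; any suffix such as "-dev..." is ignored
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Compare numerically as tuples, so "2.10" correctly counts as newer than "2.3"
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)
```

In practice you would call it as `meets_min_tf_version(tf.__version__)` after `import tensorflow as tf`.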
Also, the noise samples in the dataset need to be resampled to a sampling rate of 16000 Hz before they can be used with this model. To do this, you need to have ffmpeg installed.
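One way to do the resampling is to call ffmpeg once per noise file, using its `-ar` option to set the output sample rate. The sketch below assumes the noise clips are `.wav` files under a single directory and writes the resampled copies to a separate output directory, so ffmpeg never reads and writes the same file. The function names and directory layout are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

TARGET_SAMPLE_RATE = 16000  # Hz, the rate the model expects


def build_resample_cmd(src, dst, rate=TARGET_SAMPLE_RATE):
    """Build an ffmpeg command that resamples `src` to `rate` Hz and writes `dst`."""
    # -y overwrites dst if it already exists; -ar sets the output audio sample rate
    return ["ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
            "-i", str(src), "-ar", str(rate), str(dst)]


def resample_noise_dir(noise_dir, out_dir, rate=TARGET_SAMPLE_RATE):
    """Resample every .wav under noise_dir into out_dir, mirroring subfolders."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    noise_dir, out_dir = Path(noise_dir), Path(out_dir)
    for src in sorted(noise_dir.rglob("*.wav")):
        dst = out_dir / src.relative_to(noise_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(build_resample_cmd(src, dst, rate), check=True)
```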
Training and evaluation data
During dataset preparation, the speech samples and background noise samples were sorted into two folders, audio and noise. The noise samples were then resampled to 16000 Hz, and background noise was added to the speech samples to augment the data. Finally, the FFT of these samples was fed to the model for training and evaluation.
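The augmentation and FFT steps can be sketched in NumPy as follows. This is an illustration, not the notebook's exact code: the scaling rule (noise scaled to a fraction `scale` of the speech clip's peak amplitude) and `scale=0.5` are assumptions, and `add_noise` / `audio_to_fft` are hypothetical names. Only the first half of the FFT is kept because, for a real-valued signal, the second half mirrors the first.

```python
import numpy as np


def add_noise(audio, noise, scale=0.5):
    """Mix background noise into a speech clip, scaled relative to the speech peak."""
    noise_peak = np.max(np.abs(noise))
    if noise_peak == 0:
        return audio.copy()  # silent noise clip: nothing to add
    # Rescale the noise so its peak is `scale` times the speech peak
    prop = np.max(np.abs(audio)) / noise_peak
    return audio + noise * prop * scale


def audio_to_fft(audio):
    """Magnitude spectrum of a real 1-D signal, keeping the first half of the bins."""
    spectrum = np.fft.fft(audio)
    return np.abs(spectrum[: len(audio) // 2])


# Example: a 1-second, 440 Hz tone at 16000 Hz plus uniform noise
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).uniform(-1, 1, sr)
mixed = add_noise(speech, noise)
features = audio_to_fft(mixed)  # shape (8000,), one bin per Hz here
```

With 16000 samples at 16000 Hz each FFT bin spans 1 Hz, so a 1-second clip yields 8000 input bins for the model.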
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
optimizer | learning_rate | decay | beta_1 | beta_2 | epsilon | amsgrad | training_precision |
---|---|---|---|---|---|---|---|
Adam | 0.0010000000474974513 | 0.0 | 0.8999999761581421 | 0.9990000128746033 | 1e-07 | False | float32 |
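The long decimals in the table are float32 representations of 0.001, 0.9 and 0.999. A sketch of recreating the same Adam settings in Keras is shown below. A decay of 0.0 means no learning-rate decay, which is the default, so it is not passed.

```python
from tensorflow import keras

# Adam hyperparameters from the table above, written as their intended values
optimizer = keras.optimizers.Adam(
    learning_rate=1e-3,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
    amsgrad=False,
)
```

You could then pass it to a (hypothetical) `model` with `model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy", metrics=["accuracy"])`; the loss here is an assumption that fits integer speaker labels.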