metadata

library_name: tf-keras
tags:
  - SpeakerRecognition
  - Fast Fourier Transform (FFT)
  - Convnet
  - speech-recordings
  - SpeechClassification

Model description

This model classifies speakers from the frequency-domain representation of speech recordings, obtained via the Fast Fourier Transform (FFT). It is a 1D convolutional network with residual connections for audio classification.
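To make the "frequency-domain representation" concrete, here is a minimal NumPy sketch of turning a waveform into FFT magnitudes. It is an illustration only: the helper name `audio_to_fft_magnitude` is hypothetical, the synthetic 440 Hz tone stands in for a real speech clip, and the notebook's actual preprocessing may differ in detail.

```python
import numpy as np

SAMPLING_RATE = 16000  # Hz, the sampling rate mentioned in this model card


def audio_to_fft_magnitude(audio):
    """Return the magnitude of the positive-frequency half of the FFT.

    Hypothetical helper for illustration; not taken from the model's code.
    """
    audio = np.asarray(audio, dtype=np.float32)
    spectrum = np.fft.fft(audio)  # FFT along the last axis
    # For real-valued input the spectrum is conjugate-symmetric,
    # so the first half carries all the information.
    return np.abs(spectrum[..., : audio.shape[-1] // 2])


# One second of a synthetic 440 Hz tone stands in for a speech recording.
t = np.arange(SAMPLING_RATE) / SAMPLING_RATE
tone = np.sin(2 * np.pi * 440 * t)

features = audio_to_fft_magnitude(tone)
# For a 1-second clip each FFT bin is 1 Hz wide, so the peak bin is the tone frequency.
peak_bin = int(np.argmax(features))
```

A one-second clip at 16000 Hz yields 8000 magnitude values, and a vector of this kind is what a 1D convolutional classifier would consume.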

This repo contains the model for the notebook Speaker Recognition.

Full credits go to Fadi Badine

Dataset Used

This model uses a speaker recognition dataset from Kaggle.

Intended uses & limitations

This model should be run with TensorFlow 2.3 or higher, or tf-nightly. The noise samples in the dataset must be resampled to a sampling rate of 16000 Hz before they are used with this model. To do this, you will need to have ffmpeg installed.
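As a rough sketch of the resampling step, the snippet below builds an ffmpeg command that rewrites a file at 16000 Hz. The function names and file paths are hypothetical; only the ffmpeg options `-y` (overwrite output), `-i` (input file), and `-ar` (audio sampling rate) are standard.

```python
import shutil
import subprocess


def build_resample_command(src, dst, sampling_rate=16000):
    """Build an ffmpeg command that resamples `src` into `dst`."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sampling_rate), str(dst)]


def resample(src, dst, sampling_rate=16000):
    """Run ffmpeg on one file; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_resample_command(src, dst, sampling_rate), check=True)


# Hypothetical paths, used only to show the resulting command.
cmd = build_resample_command("noise/example.wav", "noise/example_16k.wav")
```

Calling `resample(...)` on each noise file would perform the conversion. Writing to a separate output path, as here, avoids having ffmpeg read and write the same file.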

Training and evaluation data

During dataset preparation, the speech samples and background noise samples were sorted into two folders, audio and noise. The noise samples were then resampled to 16000 Hz, and the background noise was added to the speech samples to augment the data. Finally, the FFT of these samples was fed to the model for training and evaluation.
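The noise augmentation described above could look roughly like the following NumPy sketch. The scaling rule (match the noise peak to the speech peak, then mix in a random fraction of it) is an assumption chosen for illustration, and `add_noise` and its `scale` parameter are hypothetical names, not the notebook's confirmed implementation.

```python
import numpy as np


def add_noise(audio, noise, scale=0.5, rng=None):
    """Mix a noise clip into a speech clip of the same length.

    Assumed scheme: rescale the noise so its peak matches the speech peak,
    then add a random proportion of it drawn from [0, scale).
    """
    rng = np.random.default_rng() if rng is None else rng
    audio = np.asarray(audio, dtype=np.float32)
    noise = np.asarray(noise, dtype=np.float32)
    if audio.shape != noise.shape:
        raise ValueError("audio and noise must have the same shape")
    # Guard against an all-zero noise clip to avoid dividing by zero.
    noise_peak = max(float(np.max(np.abs(noise))), 1e-12)
    peak_ratio = float(np.max(np.abs(audio))) / noise_peak
    proportion = float(rng.uniform(0.0, scale))
    return audio + noise * np.float32(peak_ratio * proportion)


# Synthetic stand-ins for one second of speech and of background noise.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000).astype(np.float32)
background = rng.standard_normal(16000).astype(np.float32)

noisy = add_noise(speech, background, scale=0.5, rng=rng)
```

With this scheme the added noise never exceeds `scale` times the speech peak, so the speaker's voice stays dominant in the augmented clip.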

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

name	learning_rate	decay	beta_1	beta_2	epsilon	amsgrad	training_precision
Adam	0.0010000000474974513	0.0	0.8999999761581421	0.9990000128746033	1e-07	False	float32
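The long decimals in this table appear to be the float32 representations of the usual Adam defaults (learning rate 0.001, beta_1 0.9, beta_2 0.999), which is consistent with the float32 training precision listed. The standard-library sketch below checks that reading; the helper name `as_float32` is hypothetical.

```python
import struct


def as_float32(value):
    """Round a Python float to the nearest float32 and return it as a float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Assumed nominal values; the table only lists the stored float32 numbers.
nominal = {"learning_rate": 0.001, "beta_1": 0.9, "beta_2": 0.999}
stored = {name: as_float32(value) for name, value in nominal.items()}

for name, value in stored.items():
    print(name, value)
```

The printed values match the table entries, so the hyperparameters can be read simply as 0.001, 0.9, and 0.999.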

Model Plot

View Model Plot

Model by: Kavya Bisht