metadata
license: apache-2.0

UK & Ireland Accent Classification Model

This model classifies the accent of a speaker from the UK or Ireland as one of the following:

  • Irish English
  • Midlands English
  • Northern English
  • Scottish English
  • Southern English
  • Welsh English

The model uses transfer learning: the pre-trained Yamnet model serves as a feature extractor, and a dense classifier is trained on the features it produces.

Yamnet Model

Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology. It is available on TensorFlow Hub. Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.
As output, the model returns a 3-tuple:

  • scores of shape (N, 521) representing the scores of the 521 classes
  • embeddings of shape (N, 1024)
  • log_mel_spectrogram representing the log-mel spectrogram of the entire audio clip

We use the embeddings, which are the features extracted from the audio samples, as the input to our dense model.
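The embedding extraction described above can be sketched as follows. This is a minimal illustration and not the author's actual code: the helper names (`load_yamnet`, `to_mono_float32`, `extract_embeddings`) are hypothetical, the audio is assumed to be 16-bit PCM already sampled at 16 kHz, and the TensorFlow Hub handle is the public Yamnet one.

```python
import numpy as np

YAMNET_URL = "https://tfhub.dev/google/yamnet/1"  # public TensorFlow Hub handle
SAMPLE_RATE = 16_000  # Yamnet expects 16 kHz mono audio


def load_yamnet():
    """Load Yamnet from TensorFlow Hub (needs tensorflow_hub and network access)."""
    import tensorflow_hub as hub  # imported lazily: only needed for the real model
    return hub.load(YAMNET_URL)


def to_mono_float32(pcm: np.ndarray) -> np.ndarray:
    """Convert int16 PCM, shaped (samples,) or (samples, channels),
    to a 1-D float32 waveform in [-1, 1]. Assumes the audio is already at 16 kHz."""
    wav = pcm.astype(np.float32) / 32768.0
    if wav.ndim == 2:
        wav = wav.mean(axis=1)  # average the channels down to mono
    return wav


def extract_embeddings(yamnet, waveform: np.ndarray) -> np.ndarray:
    """Run Yamnet on a waveform and keep only the (N, 1024) embeddings."""
    scores, embeddings, log_mel_spectrogram = yamnet(waveform)
    return np.asarray(embeddings)
```

With TensorFlow Hub installed, `extract_embeddings(load_yamnet(), to_mono_float32(pcm))` would return one 1024-dimensional vector per Yamnet frame; the scores and spectrogram are discarded because only the embeddings feed the classifier.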

Dense Model

The dense model that we used consists of:

  • An input layer that takes the embedding output of the Yamnet model
  • 4 Dense hidden layers and 4 Dropout layers
  • An output dense layer
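A rough Keras sketch of a classifier with this shape is shown below. The layer sizes, dropout rate, activations, and the frame-averaging step in `predict_accent` are all assumptions made for illustration; the model card does not state them, and the names used here are hypothetical.

```python
import numpy as np

ACCENTS = ["Irish", "Midlands", "Northern", "Scottish", "Southern", "Welsh"]
HIDDEN_UNITS = (512, 256, 128, 64)  # hypothetical sizes: the card only says "4 Dense hidden layers"
DROPOUT_RATE = 0.3                  # hypothetical: the card does not give a rate


def build_classifier(embedding_dim: int = 1024):
    """Build a dense classifier over Yamnet embeddings:
    4 x (Dense + Dropout) followed by a softmax output layer."""
    import tensorflow as tf  # imported lazily: only needed when building the model

    layers = [tf.keras.Input(shape=(embedding_dim,))]
    for units in HIDDEN_UNITS:
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
        layers.append(tf.keras.layers.Dropout(DROPOUT_RATE))
    layers.append(tf.keras.layers.Dense(len(ACCENTS), activation="softmax"))
    return tf.keras.Sequential(layers)


def predict_accent(frame_probs: np.ndarray) -> str:
    """Turn per-frame class probabilities of shape (N, 6) into one label
    by averaging over frames (an assumed aggregation strategy)."""
    mean_probs = np.asarray(frame_probs).mean(axis=0)
    return ACCENTS[int(np.argmax(mean_probs))]
```

Because Yamnet emits one embedding per frame, a clip yields N predictions; averaging them is one simple way to get a single accent label per clip.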

Dataset

The dataset used is the Open-source Multi-speaker Corpora of the English Accents in the British Isles, which contains a total of 17,877 audio files.

Dataset Info

@inproceedings{demirsahin-etal-2020-open,
title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
month = may,
year = {2020},
pages = {6532--6541},
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
ISBN = {979-10-95546-34-4},
}

Demo

A demo is available on HuggingFace Spaces ...