---
license: apache-2.0
---
## UK & Ireland Accent Classification Model

This model classifies UK & Ireland accents using feature extraction from [Yamnet](https://tfhub.dev/google/yamnet/1).

### Yamnet Model
Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology. It is available on TensorFlow Hub.
Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.   
As output, the model returns a 3-tuple:   
- Scores of shape `(N, 521)` representing the scores of the 521 classes.
- Embeddings of shape `(N, 1024)`.
- The log-mel spectrogram of the entire waveform.

We use the embeddings, the features Yamnet extracts from the audio samples, as the input to our dense model.

For more detailed information about Yamnet, please refer to its [TensorFlow Hub](https://tfhub.dev/google/yamnet/1) page.

### Dense Model
The dense model consists of:
- An input layer that takes the 1024-dimensional embedding output of the Yamnet classifier.
- 4 dense hidden layers and 4 dropout layers.
- An output dense layer.

<details>
<summary>View Model Plot</summary>

![Model Image](./model.png)

</details>
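The exact hidden-layer widths and dropout rate aren't listed above; a minimal Keras sketch of this shape of architecture, with assumed sizes (not the values used to train the published model) and six output classes matching the dataset's six regions, could look like:

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 6  # Southern England, Midlands, Northern England, Wales, Scotland, Ireland

# Hidden widths and dropout rate below are illustrative assumptions.
model = keras.Sequential(
    [
        layers.Input(shape=(1024,)),  # Yamnet embedding for one audio frame
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ]
)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```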

---
## Results
The model achieved the following results:

Results    | Training  | Validation 
-----------|-----------|------------
Accuracy   | 55%       | 51%
AUC        | 0.9090    | 0.8911 
d-prime    | 1.887     | 1.743 
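The d-prime values are consistent with the standard conversion from AUC used in audio-event evaluation, `d' = sqrt(2) * Phi^-1(AUC)`, where `Phi^-1` is the inverse of the standard normal CDF. A quick check with SciPy:

```python
from math import sqrt
from scipy.stats import norm

def d_prime(auc: float) -> float:
    """Convert an AUC score to d-prime via the inverse normal CDF."""
    return sqrt(2.0) * norm.ppf(auc)

print(round(d_prime(0.9090), 3))  # ~1.887 (training)
print(round(d_prime(0.8911), 3))  # ~1.743 (validation)
```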

And the confusion matrix for the validation set is:
![Validation Confusion Matrix](./confusion_matrix.png)

---
## Dataset

The dataset used is the
[Crowdsourced high-quality UK and Ireland English Dialect speech data set](https://openslr.org/83/)
which consists of a total of 17,877 high-quality WAV audio files.

This dataset includes over 31 hours of recordings from 120 volunteers who self-identify as
native speakers of Southern England, Midlands, Northern England, Wales, Scotland, and Ireland.

For more information, please refer to the link above or to the following paper:
[Open-source Multi-speaker Corpora of the English Accents in the British Isles](https://aclanthology.org/2020.lrec-1.804.pdf)

---
## How to use

First, install `huggingface_hub`:

`pip install -U -q huggingface_hub`

Then load the model in your code:

`from huggingface_hub import from_pretrained_keras`   
`model = from_pretrained_keras("fbadine/uk_ireland_accent_classification")`
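Since Yamnet emits one embedding per audio frame, the dense model produces one prediction per frame. One simple way to reduce these to a single clip-level label (an illustrative assumption here, not necessarily what the demo does) is to average the per-frame probabilities; the class names are taken from the dataset description, though their order in the model's output is assumed:

```python
import numpy as np

CLASS_NAMES = ["Southern England", "Midlands", "Northern England",
               "Wales", "Scotland", "Ireland"]

def clip_prediction(frame_probs: np.ndarray) -> str:
    """Average per-frame softmax outputs of shape (N, 6) and return the top class."""
    mean_probs = frame_probs.mean(axis=0)
    return CLASS_NAMES[int(np.argmax(mean_probs))]

# Hypothetical per-frame probabilities for a two-frame clip.
frame_probs = np.array([
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],
])
print(clip_prediction(frame_probs))  # Southern England
```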

---
## Demo
A demo is available on [Hugging Face Spaces](https://huggingface.co/spaces/fbadine/uk_ireland_accent_classification).