fbadine committed
Commit 5c3a4f4
1 Parent(s): 3d33521

Update README.md

Adding a result section and adjusting a bit the layout

Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -13,7 +13,7 @@ This is a model to classify and identify the accent of a UK or Ireland speaker a
 
 The model implements transfer learning feature extraction using [Yamnet](https://tfhub.dev/google/yamnet/1) model in order to train a model.
 
-## Yamnet Model
+### Yamnet Model
 Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology. It is available on TensorFlow Hub.
 Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.
 As output, the model returns a 3-tuple:
@@ -23,7 +23,7 @@ As output, the model returns a 3-tuple:
 
 We will use the embeddings, which are the features extracted from the audio samples, as the input to our dense model.
 
-## Dense Model
+### Dense Model
 The dense model that we used consists of:
 - An input layer which is embedding output of the Yamnet classifier.
 - 4 dense hidden layers and 4 dropout layers
@@ -36,6 +36,19 @@ The dense model that we used consists of:
 
 </details>
 
+### Results
+The model achieved the following results:
+
+Results | Training | Validation
+-----------|-----------|------------
+Accuracy | 55% | 51%
+AUC | 0.9090 | 0.8911
+d-prime | 1.887 | 1.743
+
+And the confusion matrix for the validation set is:
+![Model Image](./confusion_matrix.png)
+
+---
 ## Dataset
 
 The dataset used is the
@@ -48,6 +61,6 @@ native speakers of Southern England, Midlands, Northern England, Wales, Scotland
 For more info, please refer to the above link or to the following paper:
 [Open-source Multi-speaker Corpora of the English Accents in the British Isles](https://aclanthology.org/2020.lrec-1.804.pdf)
 
-
-# Demo
+---
+## Demo
 A demo is available in HuggingFace Spaces ...
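The dense head described in this diff (Yamnet embeddings in, four dense hidden layers each paired with dropout, softmax out) can be sketched as follows. This is a minimal illustration only: Yamnet embeddings are 1024-dimensional and the corpus covers six accent groups, but the hidden-layer sizes and dropout rate below are assumptions, not the repository's actual hyperparameters.

```python
import tensorflow as tf

# Illustrative sketch: Yamnet embeddings are 1024-dimensional; the corpus
# has six accent classes (Southern England, Midlands, Northern England,
# Wales, Scotland, Ireland). Hidden sizes and dropout rate are assumptions.
EMBEDDING_DIM = 1024
NUM_ACCENTS = 6

def build_dense_head(hidden_units=(256, 256, 192, 128), dropout_rate=0.3):
    inputs = tf.keras.Input(shape=(EMBEDDING_DIM,), name="yamnet_embedding")
    x = inputs
    for units in hidden_units:                        # 4 dense hidden layers...
        x = tf.keras.layers.Dense(units, activation="relu")(x)
        x = tf.keras.layers.Dropout(dropout_rate)(x)  # ...each followed by dropout
    outputs = tf.keras.layers.Dense(NUM_ACCENTS, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="accent_dense_head")

model = build_dense_head()
```

In this setup Yamnet stays frozen as a feature extractor; only the small dense head is trained on the per-frame embeddings.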
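Incidentally, the AUC and d-prime rows of the new Results table are internally consistent: under the usual equal-variance Gaussian assumption, d' = √2 · Φ⁻¹(AUC), where Φ⁻¹ is the inverse standard-normal CDF. A quick check using only the Python standard library reproduces the table's values to within rounding:

```python
from math import sqrt
from statistics import NormalDist

def d_prime(auc: float) -> float:
    # Equal-variance Gaussian relationship between ROC AUC and d':
    # d' = sqrt(2) * Phi^{-1}(AUC).
    return sqrt(2) * NormalDist().inv_cdf(auc)

print(f"training:   {d_prime(0.9090):.3f}")  # close to the table's 1.887
print(f"validation: {d_prime(0.8911):.3f}")  # close to the table's 1.743
```

A chance-level classifier (AUC = 0.5) gives d' = 0, so the reported d-prime values around 1.7 to 1.9 indicate substantial class separation even though top-1 accuracy is only about 51 to 55 percent across six classes.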