fbadine committed on
Commit 3d33521
1 Parent(s): 705e6a7

Update README.md

Files changed (1)
  1. README.md +18 -20
README.md CHANGED
@@ -17,16 +17,17 @@ The model implements transfer learning feature extraction using [Yamnet](https:/
  Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology. It is available on TensorFlow Hub.
  Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.
  As output, the model returns a 3-tuple:
- - scores of shape (N, 521) representing the scores of the 521 classes
- - embeddings of shape (N, 1024)
- - log_mel spectrogram representing the log-mel spectrogram of the entire audio frame
+ - Scores of shape `(N, 521)` representing the scores of the 521 classes.
+ - Embeddings of shape `(N, 1024)`.
+ - The log-mel spectrogram of the entire audio frame.
+
  We will use the embeddings, which are the features extracted from the audio samples, as the input to our dense model.

  ## Dense Model
  The dense model that we used consists of:
- - An input layer which is embedding output of the Yamnet model
- - 4 Dense hidden layers and 4 Dropout layers
- - An output dense layer
+ - An input layer which is the embedding output of the Yamnet classifier.
+ - 4 dense hidden layers and 4 dropout layers.
+ - An output dense layer.

  <details>
  <summary>View Model Plot</summary>
@@ -36,20 +37,17 @@ The dense model that we used consists of:
  </details>

  ## Dataset
- The dataset used is the **[Open-source Multi-speaker Corpora of the English Accents in the British Isles](https://openslr.org/83/)** which consists of a total of **17,877 audio files**.
-
- ### Dataset Info
- @inproceedings{demirsahin-etal-2020-open,
- title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
- author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
- booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
- month = may,
- year = {2020},
- pages = {6532--6541},
- address = {Marseille, France},
- publisher = {European Language Resources Association (ELRA)},
- url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
- ISBN = {979-10-95546-34-4},
- }
+
+ The dataset used is the
+ [Crowdsourced high-quality UK and Ireland English Dialect speech data set](https://openslr.org/83/)
+ which consists of a total of 17,877 high-quality audio wav files.
+
+ This dataset includes over 31 hours of recordings from 120 volunteers who self-identify as
+ native speakers of Southern England, Midlands, Northern England, Wales, Scotland and Ireland.
+
+ For more info, please refer to the above link or to the following paper:
+ [Open-source Multi-speaker Corpora of the English Accents in the British Isles](https://aclanthology.org/2020.lrec-1.804.pdf)
+
  # Demo
  A demo is available in HuggingFace Spaces ...
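The Yamnet output shapes described in the README text above can be illustrated with numpy placeholders (random arrays standing in for the real TensorFlow Hub outputs; the frame count `N` and the 64 mel bands are illustrative assumptions, not values from this repo):

```python
import numpy as np

# Hypothetical stand-ins for the 3-tuple Yamnet returns for one clip:
N = 10                                # number of analysis frames (illustrative)
scores = np.random.rand(N, 521)       # per-frame scores for the 521 AudioSet classes
embeddings = np.random.rand(N, 1024)  # per-frame feature embeddings
log_mel = np.random.rand(N, 64)       # log-mel spectrogram (64 bands assumed)

# The README feeds the embeddings to the dense model; one common choice
# is to average them over frames to get a fixed-size clip feature.
clip_feature = embeddings.mean(axis=0)
print(clip_feature.shape)  # (1024,)
```

Averaging over frames is only one possible pooling strategy; the repo's training script may instead use the per-frame embeddings directly.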
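Similarly, the dense head described above (an input on the 1024-d Yamnet embedding, 4 dense hidden layers with dropout, and an output layer) can be sketched as a plain numpy forward pass. The layer widths, ReLU activations, and the 6-class output (one per dialect region named in the dataset description) are assumptions for illustration, not details taken from the actual model; dropout acts as the identity at inference, so it is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w, b):
    # One dense layer with ReLU activation (activation choice assumed).
    return np.maximum(x @ w + b, 0.0)

# Illustrative layer widths; the real model's sizes are not stated in the README.
widths = [1024, 512, 256, 128, 64]
params = [(rng.normal(size=(i, o)) * 0.01, np.zeros(o))
          for i, o in zip(widths[:-1], widths[1:])]

x = rng.normal(size=(1, 1024))   # one Yamnet embedding vector
for w, b in params:              # the 4 dense hidden layers
    x = dense_relu(x, w, b)

n_classes = 6                    # six dialect regions in the dataset
w_out = rng.normal(size=(64, n_classes)) * 0.01
logits = x @ w_out
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the classes
print(probs.shape)  # (1, 6)
```

In the repo itself this head would be a Keras `Sequential`-style model trained on the extracted embeddings; the sketch only shows the shape flow from a 1024-d feature to a 6-way prediction.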