speechbrainteam committed
Commit a02a15a
Parent: 1a21ed1

Update README.md

Files changed (1): README.md (+29, -21)
README.md CHANGED
@@ -5,7 +5,7 @@ tags:
  - Spoken language understanding
  license: "CC0"
  datasets:
- - Timers and Such
+ - Fluent Speech Commands
  metrics:
  - Accuracy
 
@@ -14,26 +14,29 @@ metrics:
  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>
 
+ # Fluent Speech Commands
+ The dataset contains real recordings that define a simple spoken language understanding task. You can download it from [here](https://fluent.ai/fluent-speech-commands-a-dataset-for-spoken-language-understanding-research/).
+ The Fluent Speech Commands dataset contains 30,043 utterances from 97 speakers. It is recorded as 16 kHz single-channel .wav files each containing a single utterance used for controlling smart-home appliances or virtual assistant, for example, “put on the music” or “turn up the heat in the kitchen”. Each audio is labeled with three slots: action, object, and location. A slot takes on one of the multiple values: for instance, the “location” slot can take on the values “none”, “kitchen”, “bedroom”, or “washroom”. We refer to the combination of slot values as the intent of the utterance. For each intent, there are multiple possible wordings: for example, the intent {action: “activate”, object: “lights”, location: “none”} can be expressed as “turn on the lights”, “switch the lights on”, “lights on”, etc. The dataset has a total of 248 phrasing mapping to 31 unique intents.
 
- # End-to-end SLU model for Timers and Such
-
- Attention-based RNN sequence-to-sequence model for [Timers and Such](https://arxiv.org/abs/2104.01604) trained on the `train-real` subset. This model checkpoint achieves 86.7% accuracy on `test-real`.
-
+ # End-to-end SLU model for Fluent Speech Commands
+ Attention-based RNN sequence-to-sequence model for the [Fluent Speech Commands](https://arxiv.org/pdf/1904.03670.pdf) dataset.
+ This model checkpoint achieves 99.6% accuracy on the test set.
  The model uses an ASR model trained on LibriSpeech ([`speechbrain/asr-crdnn-rnnlm-librispeech`](https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech)) to extract features from the input audio, then maps these features to an intent and slot labels using a beam search.
 
- The dataset has four intents: `SetTimer`, `SetAlarm`, `SimpleMath`, and `UnitConversion`. Try testing the model by saying something like "set a timer for 5 minutes" or "what's 32 degrees Celsius in Fahrenheit?"
 
- You can try the model on the `math.wav` file included here as follows:
+ You can try the model on the `example_fsc.wav` file included here as follows:
  ```
- from speechbrain.pretrained import EndToEndSLU
- slu = EndToEndSLU.from_hparams("speechbrain/slu-timers-and-such-direct-librispeech-asr")
- slu.decode_file("speechbrain/slu-timers-and-such-direct-librispeech-asr/math.wav")
+ >>> from speechbrain.pretrained import EndToEndSLU
+ >>> slu = EndToEndSLU.from_hparams("/network/tmp1/ravanelm/slu-direct-fluent-speech-commands-librispeech-asr")
+ >>> # Text: "Please, turn on the light of the bedroom"
+ >>> slu.decode_file("/network/tmp1/ravanelm/slu-direct-fluent-speech-commands-librispeech-asr/example_fsc.wav")
+ '{"action:" "activate"| "object": "lights"| "location": "bedroom"}'
  ```
  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 
  ### Training
- The model was trained with SpeechBrain (d254489a).
+ The model was trained with SpeechBrain (f1f421b3).
  To train it from scratch follows these steps:
  1. Clone SpeechBrain:
  ```bash
@@ -48,7 +51,7 @@ pip install -e .
 
  3. Run Training:
  ```
- cd recipes/timers-and-such/direct
+ cd recipes/fluent-speech-commands
  python train.py hparams/train.yaml --data_folder=your_data_folder
  ```
 
@@ -66,20 +69,25 @@ title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
- howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
  }
  ```
 
- #### Referencing Timers and Such
+ #### Referencing Fluent Speech Commands
 
  ```
- @misc{lugosch2021timers,
- title={Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers},
- author={Lugosch, Loren and Papreja, Piyush and Ravanelli, Mirco and Heba, Abdelwahab and Parcollet, Titouan},
- year={2021},
- eprint={2104.01604},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
+ @inproceedings{fluent,
+ author = {Loren Lugosch and
+ Mirco Ravanelli and
+ Patrick Ignoto and
+ Vikrant Singh Tomar and
+ Yoshua Bengio},
+ editor = {Gernot Kubin and
+ Zdravko Kacic},
+ title = {Speech Model Pre-Training for End-to-End Spoken Language Understanding},
+ booktitle = {Proc. of Interspeech},
+ pages = {814--818},
+ year = {2019},
  }
  ```
 
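The usage snippet in the updated card still points `from_hparams` at a local path (`/network/tmp1/ravanelm/...`). A minimal sketch of the same usage, assuming the checkpoint is published on the Hugging Face Hub as `speechbrain/slu-direct-fluent-speech-commands-librispeech-asr` (an assumption, not stated in the diff) and loaded on GPU via the `run_opts={"device":"cuda"}` option the card's "Inference on GPU" note describes:

```python
# Minimal sketch, not from the card. Assumes the checkpoint is published on the Hub
# as "speechbrain/slu-direct-fluent-speech-commands-librispeech-asr" and that a CUDA
# device is available; adjust or drop run_opts to run on CPU.
from speechbrain.pretrained import EndToEndSLU

slu = EndToEndSLU.from_hparams(
    source="speechbrain/slu-direct-fluent-speech-commands-librispeech-asr",  # assumed Hub id
    savedir="pretrained_models/slu-fsc",   # local cache for the fetched files
    run_opts={"device": "cuda"},           # GPU inference, per the card
)

# "Please, turn on the light of the bedroom"
decoded = slu.decode_file(
    "speechbrain/slu-direct-fluent-speech-commands-librispeech-asr/example_fsc.wav"
)
print(decoded)
```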
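The card's example output is a pipe-separated slot string rather than valid JSON. Purely to illustrate the action/object/location intent structure described in the dataset paragraph, here is a small parser sketch; the helper name and the exact output format are assumptions, not part of the card:

```python
import re

def parse_intent(decoded: str) -> dict:
    """Pull the action/object/location slots out of the decoded string.

    Illustrative only: assumes output shaped like the card's example,
    '{"action:" "activate"| "object": "lights"| "location": "bedroom"}',
    where slots are pipe-separated and the quoting is slightly irregular.
    """
    slots = {}
    # Grab quoted key/value pairs regardless of where the colon landed.
    for key, value in re.findall(r'"(\w+):?"\s*:?\s*"([^"]+)"', decoded):
        slots[key] = value
    return slots

print(parse_intent('{"action:" "activate"| "object": "lights"| "location": "bedroom"}'))
# {'action': 'activate', 'object': 'lights', 'location': 'bedroom'}
```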
93