Mirco committed on
Commit
1a21ed1
1 Parent(s): 4974c0f

upload model

Files changed (6)
  1. .gitattributes +2 -0
  2. README.md +91 -0
  3. example_fsc.wav +0 -0
  4. hyperparams.yaml +90 -0
  5. model.ckpt +3 -0
  6. tokenizer.ckpt +3 -0
.gitattributes CHANGED
@@ -14,3 +14,5 @@
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
+ model.ckpt filter=lfs diff=lfs merge=lfs -text
+ tokenizer.ckpt filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ language: "en"
+ thumbnail:
+ tags:
+ - Spoken language understanding
+ license: "CC0"
+ datasets:
+ - Timers and Such
+ metrics:
+ - Accuracy
+
+ ---
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+
+ # End-to-end SLU model for Timers and Such
+
+ Attention-based RNN sequence-to-sequence model for [Timers and Such](https://arxiv.org/abs/2104.01604) trained on the `train-real` subset. This model checkpoint achieves 86.7% accuracy on `test-real`.
+
+ The model uses an ASR model trained on LibriSpeech ([`speechbrain/asr-crdnn-rnnlm-librispeech`](https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech)) to extract features from the input audio, then maps these features to intent and slot labels with a beam search.
+
+ The dataset has four intents: `SetTimer`, `SetAlarm`, `SimpleMath`, and `UnitConversion`. Try testing the model by saying something like "set a timer for 5 minutes" or "what's 32 degrees Celsius in Fahrenheit?"
+
+ You can try the model on the `example_fsc.wav` file included in this repository as follows:
+ ```python
+ from speechbrain.pretrained import EndToEndSLU
+ slu = EndToEndSLU.from_hparams("speechbrain/slu-timers-and-such-direct-librispeech-asr")
+ slu.decode_file("speechbrain/slu-timers-and-such-direct-librispeech-asr/example_fsc.wav")
+ ```
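+
+ The same call works on your own recordings. A minimal sketch, assuming a local 16 kHz mono WAV file (the path `my_command.wav` is a placeholder) and that `decode_file` returns the decoded semantics as a string:
+ ```python
+ from speechbrain.pretrained import EndToEndSLU
+
+ slu = EndToEndSLU.from_hparams(
+     source="speechbrain/slu-timers-and-such-direct-librispeech-asr",
+     savedir="pretrained_models/slu-timers-and-such",  # local cache directory (placeholder)
+ )
+ # The hyperparams expect 16 kHz audio.
+ semantics = slu.decode_file("my_command.wav")  # placeholder path
+ print(semantics)
+ ```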
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+
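+ For example (a sketch of the same loading call as above, assuming a CUDA-capable GPU is available):
+ ```python
+ from speechbrain.pretrained import EndToEndSLU
+
+ # run_opts places the model (and therefore inference) on the GPU.
+ slu = EndToEndSLU.from_hparams(
+     "speechbrain/slu-timers-and-such-direct-librispeech-asr",
+     run_opts={"device": "cuda"},
+ )
+ ```
+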
+ ### Training
+ The model was trained with SpeechBrain (commit d254489a).
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```bash
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run training:
+ ```bash
+ cd recipes/timers-and-such/direct
+ python train.py hparams/train.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/18c2anEv8hx-ZjmEN8AdUA8AZziYIidON?usp=sharing).
+
+ ### Limitations
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ #### Referencing SpeechBrain
+
+ ```
+ @misc{SB2021,
+     author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+     title = {SpeechBrain},
+     year = {2021},
+     publisher = {GitHub},
+     journal = {GitHub repository},
+     howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
+
+ #### Referencing Timers and Such
+
+ ```
+ @misc{lugosch2021timers,
+     title={Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers},
+     author={Lugosch, Loren and Papreja, Piyush and Ravanelli, Mirco and Heba, Abdelwahab and Parcollet, Titouan},
+     year={2021},
+     eprint={2104.01604},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ #### About SpeechBrain
+ SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly, and it obtains competitive or state-of-the-art performance in various domains.
+
+ Website: https://speechbrain.github.io/
+
+ GitHub: https://github.com/speechbrain/speechbrain
example_fsc.wav ADDED
Binary file (103 kB).
hyperparams.yaml ADDED
@@ -0,0 +1,90 @@
+ # ############################################################################
+ # Model: Direct SLU
+ # Encoder: Pre-trained ASR encoder -> LSTM
+ # Decoder: GRU + beamsearch
+ # Tokens: BPE with unigram
+ # Losses: NLL
+ # Training: Timers and Such
+ # Authors: Loren Lugosch, Mirco Ravanelli 2020
+ # ############################################################################
+
+ # Model parameters
+ sample_rate: 16000
+ emb_size: 128
+ dec_neurons: 512
+ output_neurons: 51 # index(eos/bos) = 0
+ ASR_encoder_dim: 512
+ encoder_dim: 256
+
+ # Decoding parameters
+ bos_index: 0
+ eos_index: 0
+ min_decode_ratio: 0.0
+ max_decode_ratio: 10.0
+ slu_beam_size: 80
+ eos_threshold: 1.5
+ temperature: 1.25
+
+ # Models
+ asr_model_source: speechbrain/asr-crdnn-rnnlm-librispeech
+
+ slu_enc: !new:speechbrain.nnet.containers.Sequential
+     input_shape: [null, null, !ref <ASR_encoder_dim>]
+     lstm: !new:speechbrain.nnet.RNN.LSTM
+         input_size: !ref <ASR_encoder_dim>
+         bidirectional: True
+         hidden_size: !ref <encoder_dim>
+         num_layers: 2
+     linear: !new:speechbrain.nnet.linear.Linear
+         input_size: !ref <encoder_dim> * 2
+         n_neurons: !ref <encoder_dim>
+
+ output_emb: !new:speechbrain.nnet.embedding.Embedding
+     num_embeddings: !ref <output_neurons>
+     embedding_dim: !ref <emb_size>
+
+ dec: !new:speechbrain.nnet.RNN.AttentionalRNNDecoder
+     enc_dim: !ref <encoder_dim>
+     input_size: !ref <emb_size>
+     rnn_type: gru
+     attn_type: keyvalue
+     hidden_size: !ref <dec_neurons>
+     attn_dim: 512
+     num_layers: 3
+     scaling: 1.0
+     dropout: 0.0
+
+ seq_lin: !new:speechbrain.nnet.linear.Linear
+     input_size: !ref <dec_neurons>
+     n_neurons: !ref <output_neurons>
+
+ model: !new:torch.nn.ModuleList
+     - [!ref <slu_enc>, !ref <output_emb>,
+        !ref <dec>, !ref <seq_lin>]
+
+ tokenizer: !new:sentencepiece.SentencePieceProcessor
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         model: !ref <model>
+         tokenizer: !ref <tokenizer>
+
+ beam_searcher: !new:speechbrain.decoders.S2SRNNBeamSearcher
+     embedding: !ref <output_emb>
+     decoder: !ref <dec>
+     linear: !ref <seq_lin>
+     bos_index: !ref <bos_index>
+     eos_index: !ref <eos_index>
+     min_decode_ratio: !ref <min_decode_ratio>
+     max_decode_ratio: !ref <max_decode_ratio>
+     beam_size: !ref <slu_beam_size>
+     eos_threshold: !ref <eos_threshold>
+     temperature: !ref <temperature>
+     using_max_attn_shift: False
+     max_attn_shift: 30
+     coverage_penalty: 0.
+
+ modules:
+     slu_enc: !ref <slu_enc>
+     beam_searcher: !ref <beam_searcher>
+
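For orientation, here is a rough sketch of how a file like this is consumed; it is roughly what `EndToEndSLU.from_hparams` does internally, and it assumes `hyperparams.yaml` has been downloaded into the working directory:

```python
from hyperpyyaml import load_hyperpyyaml

# Instantiate every object declared in the YAML above
# (encoder, embedding, decoder, beam searcher, pretrainer, ...).
with open("hyperparams.yaml") as f:
    hparams = load_hyperpyyaml(f)

slu_enc = hparams["slu_enc"]              # BiLSTM encoder on top of the ASR features
beam_searcher = hparams["beam_searcher"]  # attentional GRU decoder + beam search
print(slu_enc)
print(beam_searcher)
```

The `pretrainer` entry is what maps `model.ckpt` and `tokenizer.ckpt` (below) onto these objects when the model is fetched from the Hub.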
model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15afda407aa0a967b16fde3bb221ecd51a358cde456ed3c1286f248d217c5947
+ size 37183449
tokenizer.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d58f99aa4cc80e1cb0eb8de46a33edaa0bcade7514cd3b662ebebc685c8ebf82
+ size 238249