Mirco committed on
Commit
1a21ed1
1 Parent(s): 4974c0f

upload model

Files changed (6)
  1. .gitattributes +2 -0
  2. README.md +91 -0
  3. example_fsc.wav +0 -0
  4. hyperparams.yaml +90 -0
  5. model.ckpt +3 -0
  6. tokenizer.ckpt +3 -0
.gitattributes CHANGED
@@ -14,3 +14,5 @@
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
+ model.ckpt filter=lfs diff=lfs merge=lfs -text
+ tokenizer.ckpt filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ language: "en"
+ thumbnail:
+ tags:
+ - Spoken language understanding
+ license: "CC0"
+ datasets:
+ - Timers and Such
+ metrics:
+ - Accuracy
+
+ ---
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+
+ # End-to-end SLU model for Timers and Such
+
+ Attention-based RNN sequence-to-sequence model for [Timers and Such](https://arxiv.org/abs/2104.01604) trained on the `train-real` subset. This model checkpoint achieves 86.7% accuracy on `test-real`.
+
+ The model uses an ASR model trained on LibriSpeech ([`speechbrain/asr-crdnn-rnnlm-librispeech`](https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech)) to extract features from the input audio, then maps these features to intent and slot labels with a beam search.
+
+ The dataset has four intents: `SetTimer`, `SetAlarm`, `SimpleMath`, and `UnitConversion`. Try testing the model by saying something like "set a timer for 5 minutes" or "what's 32 degrees Celsius in Fahrenheit?"
+
+ You can try the model on the `example_fsc.wav` file included in this repository as follows:
+ ```python
+ from speechbrain.pretrained import EndToEndSLU
+ slu = EndToEndSLU.from_hparams("speechbrain/slu-timers-and-such-direct-librispeech-asr")
+ slu.decode_file("speechbrain/slu-timers-and-such-direct-librispeech-asr/example_fsc.wav")
+ ```
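+
+ The same call works on your own recordings. A minimal sketch, assuming a local 16 kHz mono WAV file (the path `my_command.wav` is a placeholder) and that `decode_file` returns the decoded semantics as a string:
+ ```python
+ from speechbrain.pretrained import EndToEndSLU
+
+ slu = EndToEndSLU.from_hparams(
+     source="speechbrain/slu-timers-and-such-direct-librispeech-asr",
+     savedir="pretrained_models/slu-timers-and-such",  # local cache directory (placeholder)
+ )
+ # The hyperparams expect 16 kHz audio.
+ semantics = slu.decode_file("my_command.wav")  # placeholder path
+ print(semantics)
+ ```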
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+
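+ For example (a sketch of the same loading call as above, assuming a CUDA-capable GPU is available):
+ ```python
+ from speechbrain.pretrained import EndToEndSLU
+
+ # run_opts places the model (and therefore inference) on the GPU.
+ slu = EndToEndSLU.from_hparams(
+     "speechbrain/slu-timers-and-such-direct-librispeech-asr",
+     run_opts={"device": "cuda"},
+ )
+ ```
+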
+ ### Training
+ The model was trained with SpeechBrain (commit d254489a).
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```bash
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run training:
+ ```bash
+ cd recipes/timers-and-such/direct
+ python train.py hparams/train.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/18c2anEv8hx-ZjmEN8AdUA8AZziYIidON?usp=sharing).
+
+ ### Limitations
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ #### Referencing SpeechBrain
+
+ ```
+ @misc{SB2021,
+     author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+     title = {SpeechBrain},
+     year = {2021},
+     publisher = {GitHub},
+     journal = {GitHub repository},
+     howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
+
+ #### Referencing Timers and Such
+
+ ```
+ @misc{lugosch2021timers,
+     title={Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers},
+     author={Lugosch, Loren and Papreja, Piyush and Ravanelli, Mirco and Heba, Abdelwahab and Parcollet, Titouan},
+     year={2021},
+     eprint={2104.01604},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ #### About SpeechBrain
+ SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly, and it obtains competitive or state-of-the-art performance in various domains.
+
+ Website: https://speechbrain.github.io/
+
+ GitHub: https://github.com/speechbrain/speechbrain
example_fsc.wav ADDED
Binary file (103 kB).
hyperparams.yaml ADDED
@@ -0,0 +1,90 @@
+ # ############################################################################
+ # Model: Direct SLU
+ # Encoder: Pre-trained ASR encoder -> LSTM
+ # Decoder: GRU + beamsearch
+ # Tokens: BPE with unigram
+ # Losses: NLL
+ # Training: Timers and Such
+ # Authors: Loren Lugosch, Mirco Ravanelli 2020
+ # ############################################################################
+
+ # Model parameters
+ sample_rate: 16000
+ emb_size: 128
+ dec_neurons: 512
+ output_neurons: 51 # index(eos/bos) = 0
+ ASR_encoder_dim: 512
+ encoder_dim: 256
+
+ # Decoding parameters
+ bos_index: 0
+ eos_index: 0
+ min_decode_ratio: 0.0
+ max_decode_ratio: 10.0
+ slu_beam_size: 80
+ eos_threshold: 1.5
+ temperature: 1.25
+
+ # Models
+ asr_model_source: speechbrain/asr-crdnn-rnnlm-librispeech
+
+ slu_enc: !new:speechbrain.nnet.containers.Sequential
+     input_shape: [null, null, !ref <ASR_encoder_dim>]
+     lstm: !new:speechbrain.nnet.RNN.LSTM
+         input_size: !ref <ASR_encoder_dim>
+         bidirectional: True
+         hidden_size: !ref <encoder_dim>
+         num_layers: 2
+     linear: !new:speechbrain.nnet.linear.Linear
+         input_size: !ref <encoder_dim> * 2
+         n_neurons: !ref <encoder_dim>
+
+ output_emb: !new:speechbrain.nnet.embedding.Embedding
+     num_embeddings: !ref <output_neurons>
+     embedding_dim: !ref <emb_size>
+
+ dec: !new:speechbrain.nnet.RNN.AttentionalRNNDecoder
+     enc_dim: !ref <encoder_dim>
+     input_size: !ref <emb_size>
+     rnn_type: gru
+     attn_type: keyvalue
+     hidden_size: !ref <dec_neurons>
+     attn_dim: 512
+     num_layers: 3
+     scaling: 1.0
+     dropout: 0.0
+
+ seq_lin: !new:speechbrain.nnet.linear.Linear
+     input_size: !ref <dec_neurons>
+     n_neurons: !ref <output_neurons>
+
+ model: !new:torch.nn.ModuleList
+     - [!ref <slu_enc>, !ref <output_emb>,
+        !ref <dec>, !ref <seq_lin>]
+
+ tokenizer: !new:sentencepiece.SentencePieceProcessor
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         model: !ref <model>
+         tokenizer: !ref <tokenizer>
+
+ beam_searcher: !new:speechbrain.decoders.S2SRNNBeamSearcher
+     embedding: !ref <output_emb>
+     decoder: !ref <dec>
+     linear: !ref <seq_lin>
+     bos_index: !ref <bos_index>
+     eos_index: !ref <eos_index>
+     min_decode_ratio: !ref <min_decode_ratio>
+     max_decode_ratio: !ref <max_decode_ratio>
+     beam_size: !ref <slu_beam_size>
+     eos_threshold: !ref <eos_threshold>
+     temperature: !ref <temperature>
+     using_max_attn_shift: False
+     max_attn_shift: 30
+     coverage_penalty: 0.
+
+ modules:
+     slu_enc: !ref <slu_enc>
+     beam_searcher: !ref <beam_searcher>
+
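For orientation, here is a rough sketch of how a file like this is consumed; it is roughly what `EndToEndSLU.from_hparams` does internally, and it assumes `hyperparams.yaml` has been downloaded into the working directory:

```python
from hyperpyyaml import load_hyperpyyaml

# Instantiate every object declared in the YAML above
# (encoder, embedding, decoder, beam searcher, pretrainer, ...).
with open("hyperparams.yaml") as f:
    hparams = load_hyperpyyaml(f)

slu_enc = hparams["slu_enc"]              # BiLSTM encoder on top of the ASR features
beam_searcher = hparams["beam_searcher"]  # attentional GRU decoder + beam search
print(slu_enc)
print(beam_searcher)
```

The `pretrainer` entry is what maps `model.ckpt` and `tokenizer.ckpt` (below) onto these objects when the model is fetched from the Hub.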
model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15afda407aa0a967b16fde3bb221ecd51a358cde456ed3c1286f248d217c5947
+ size 37183449
tokenizer.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d58f99aa4cc80e1cb0eb8de46a33edaa0bcade7514cd3b662ebebc685c8ebf82
+ size 238249