Gastron/asr-crdnn-librispeech

PyTorch

English

ASR

CTC

Attention

Aku Rouhe committed on Feb 26, 2021

Commit fec6d06 (1 parent: 934b4ef)

Change instructions

README.md +6 -92

@@ -58,105 +58,19 @@ Please notice that we encourage you to read our tutorials and learn more about
 ### Transcribing your own audio files
 ```python
-import torch
-import torchaudio
-import speechbrain
-from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.acoustic import ASR
-asr_model = ASR()
-# Make sure your output is sampled at 16 kHz.
-audio_file='path_to_your_audio_file'
-wav, fs = torchaudio.load(audio_file)
-wav_lens = torch.tensor([1]).float()
-# Transcribe!
-words, tokens = asr_model.transcribe(wav, wav_lens)
-print(words)
 ```
 ### Obtaining encoded features
-The SpeechBrain ASR() Class provides an easy way to encode the speech signal
-without running the decoding phase. Hence, one can obtain the output of the
-CRDNN model.
-```python
-import torch
-import torchaudio
-import speechbrain
-from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.acoustic import ASR
-asr_model = ASR()
-# Make sure your output is sampled at 16 kHz.
-audio_file='path_to_your_audio_file'
-wav, fs = torchaudio.load(audio_file)
-wav_lens = torch.tensor([1]).float()
-# Transcribe!
-words, tokens = asr_model.encode(wav, wav_lens)
-print(words)
-```
-### Playing with the language model only
-Thanks to SpeechBrain lobes, it is feasible to simply instantiate the language
-model to further processing on your custom pipeline:
-```python
-import torch
-import speechbrain
-from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.lm import LM
-lm = LM()
-text = "THE CAT IS ON"
-# Next word prediction
-encoded_text = lm.tokenizer.encode_as_ids(text)
-encoded_text = torch.Tensor(encoded_text).unsqueeze(0)
-prob_out, _ = lm(encoded_text.to(lm.device))
-index = int(torch.argmax(prob_out[0,-1,:]))
-print(lm.tokenizer.decode(index))
-# Text generation
-encoded_text = torch.tensor([0, 2]) # bos token + the
-encoded_text = encoded_text.unsqueeze(0).to(lm.device)
-for i in range(19):
-  prob_out, _ = lm(encoded_text)
-  index = torch.argmax(prob_out[0,-1,:]).unsqueeze(0)
-  encoded_text = torch.cat([encoded_text, index.unsqueeze(0)], dim=1)
-encoded_text = encoded_text[0,1:].tolist()
-print(lm.tokenizer.decode(encoded_text))
-```
-### Playing with the tokenizer only
-In the same manner as for the language model, one can isntantiate the tokenizer
-only with the corresponding lobes in SpeechBrain.
-```python
-import speechbrain
-from speechbrain.lobes.pretrained.librispeech.asr_crdnn_ctc_att_rnnlm.tokenizer import tokenizer
-# HuggingFace paths to download the pretrained models
-token_file = 'tokenizer/1000_unigram.model'
-model_name = 'sb/asr-crdnn-librispeech'
-save_dir = 'model_checkpoints'
-text = "THE CAT IS ON THE TABLE"
-tokenizer = tokenizer(token_file, model_name, save_dir)
-# Tokenize!
-print(tokenizer.spm.encode(text))
-print(tokenizer.spm.encode(text, out_type='str'))
-```
 #### Referencing SpeechBrain

 ### Transcribing your own audio files
 ```python
+from speechbrain.pretrained import EncoderDecoderASR
+asr_model = EncoderDecoderASR.from_hparams(source="Gastron/asr-crdnn-librispeech")
+asr_model.transcribe_file("path_to_your_file.wav")
 ```
 ### Obtaining encoded features
+The SpeechBrain EncoderDecoderASR() class also provides an easy way to encode
+the speech signal without running the decoding phase by calling
+``EncoderDecoderASR.encode_batch()``
 #### Referencing SpeechBrain
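
Note on the change: the new README mentions `EncoderDecoderASR.encode_batch()` without showing a call. A minimal sketch of how it might be used is below. It assumes that `encode_batch()` takes a batch of waveforms plus their relative lengths (the removed code passed `wav_lens = torch.tensor([1])` for a single file), and that the model object exposes a `load_audio()` helper; both should be checked against the installed SpeechBrain version. The names `relative_lengths` and `encode_file` are hypothetical helpers introduced only for this illustration.

```python
def relative_lengths(num_samples):
    """Return each utterance length as a fraction of the longest one.

    Assumption: SpeechBrain batches carry lengths in this relative form,
    so a single unpadded file gets a length of 1.0.
    """
    longest = max(num_samples)
    return [n / longest for n in num_samples]


def encode_file(path, source="Gastron/asr-crdnn-librispeech"):
    """Encode one audio file without running the decoding phase.

    Hypothetical helper. Imports are deferred so the rest of this sketch
    can be read and run without SpeechBrain installed.
    """
    import torch
    from speechbrain.pretrained import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(source=source)
    # Assumption: load_audio() returns a 1-D waveform tensor.
    wav = asr_model.load_audio(path)
    wavs = wav.unsqueeze(0)  # add a batch dimension: [1, time]
    wav_lens = torch.tensor(relative_lengths([wav.shape[0]]))
    return asr_model.encode_batch(wavs, wav_lens)


# Two utterances of 16000 and 8000 samples, padded to the longer one.
print(relative_lengths([16000, 8000]))
```

Calling `encode_file("path_to_your_file.wav")` would download the model on first use, so it is left out of the example run above.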