ASR Inference

by SemanticLeopard - opened Jul 20, 2022

Jul 20, 2022

•

edited Jul 20, 2022

Hi,

I've managed to fine-tune the ASR model, which is the last step in this recipe. However, I'm struggling to understand how I can use an inference class to transcribe enhanced audio files with the model. It seems that the pretrained ASR model may use an encoder-decoder interface, but the modules produced in the final hyperparameters output are different from what that interface expects.

Based on the 'test_stats', the recipe clearly produces a model that can perform ASR, but it's unclear which interface to use for inference: whether it needs to be something custom, or whether it's simpler than that. Any clarity on this would be helpful.

Thanks.

pplantinga

SpeechBrain org Jul 21, 2022

Hi, thanks for your interest!

Unfortunately, we haven't gotten around to writing an inference class for the robust-ASR model. However, you can do it yourself with code similar to this (untested):

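# Runs inside a method of the recipe's Brain class (uses self.modules / self.hparams)
noisy_wavs, wav_lens = batch.noisy_sig
# Enhance the noisy waveforms; the second return value is not needed here
cleaned_wavs, _ = self.modules.enhance_model(noisy_wavs)
# Compute and normalize the ASR input features from the enhanced audio
asr_feats = self.hparams.fbank(cleaned_wavs)
asr_feats = self.hparams.normalizer(asr_feats, wav_lens)
# Encode the features, then decode token sequences with beam search
embed = self.modules.src_embedding(asr_feats)
hypotheses, _ = self.hparams.beam_searcher(embed.detach(), wav_lens)
# Map each token-id sequence back to text
pred_words = [self.token_encoder.decode_ids(token_seq) for token_seq in hypotheses]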

Hope this helps!
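For use outside the Brain class, the same steps could be wrapped in a standalone function. This is only a sketch: the function name `transcribe_batch` and its parameters are hypothetical, and it assumes you pass in the recipe's `modules`, `hparams`, and `token_encoder` objects yourself (for example, taken from the trained Brain instance). Like the snippet above, it has not been tested against the actual recipe.

```python
def transcribe_batch(modules, hparams, token_encoder, noisy_wavs, wav_lens):
    """Enhance a batch of noisy waveforms, then decode them to text.

    Hypothetical helper: `modules`, `hparams` and `token_encoder` are assumed
    to expose the same attributes that the snippet above reads from `self`.
    """
    # Enhance the noisy audio; the second return value is ignored.
    cleaned_wavs, _ = modules.enhance_model(noisy_wavs)

    # Compute and normalize the ASR features from the enhanced audio.
    asr_feats = hparams.fbank(cleaned_wavs)
    asr_feats = hparams.normalizer(asr_feats, wav_lens)

    # Encode, then run beam search on the detached embeddings.
    embed = modules.src_embedding(asr_feats)
    hypotheses, _ = hparams.beam_searcher(embed.detach(), wav_lens)

    # Turn each predicted token-id sequence into text.
    return [token_encoder.decode_ids(token_seq) for token_seq in hypotheses]
```

With real models you would probably also want to switch the modules to eval mode and call this under `torch.no_grad()`.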

pplantinga

SpeechBrain org Jul 21, 2022

•

edited Jul 21, 2022

I suppose it might also work to use the EncoderDecoderASR inference class with a custom YAML file. Try copying https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech/blob/main/hyperparams.yaml

and adding the enhance model in two places:

+ enhance_model: # ... copy from the file in this repo
  ...
  # We compose the inference (encoder) pipeline.
  encoder: !new:speechbrain.nnet.containers.LengthsCapableSequential
      input_shape: [null, null]
+     enhance_features: !ref <enhance_model>
      compute_features: !ref <compute_features>
      normalize: !ref <normalizer>
      model: !ref <enc>
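If the modified YAML loads, transcription could then look roughly like the sketch below. It assumes a local directory (the `model_dir` argument, a hypothetical name) that holds the edited `hyperparams.yaml` together with the checkpoint files it references, and the helper name `transcribe_with_custom_yaml` is made up for illustration. The import path for `EncoderDecoderASR` depends on the SpeechBrain version, so both locations are tried. Whether the enhance model actually fits into the encoder pipeline this way is untested.

```python
def transcribe_with_custom_yaml(model_dir, wav_path, hparams_file="hyperparams.yaml"):
    """Load EncoderDecoderASR from a local directory and transcribe one file.

    Hypothetical helper: `model_dir` is assumed to contain the custom YAML
    and the checkpoints that the YAML's pretrainer section points to.
    """
    try:
        # Newer SpeechBrain releases (1.0 and later).
        from speechbrain.inference.ASR import EncoderDecoderASR
    except ImportError:
        # Older releases, e.g. the 0.5.x series.
        from speechbrain.pretrained import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(
        source=model_dir,
        hparams_file=hparams_file,
    )
    # transcribe_file loads the audio and returns the transcription as a string.
    return asr_model.transcribe_file(wav_path)
```

For example, `transcribe_with_custom_yaml("robust_asr_inference", "noisy_example.wav")`, where both the directory and the file name are placeholders.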

SemanticLeopard

Jul 22, 2022

•

edited Jul 22, 2022

First bit of code worked without issues. Thanks for that one!
