How to get timestamp for this model

#16
by Leo2023hwag - opened

Thanks for the great work! When I tried to get timestamps using the following code,

from omegaconf import OmegaConf, open_dict

decoding_cfg = asr_model.cfg.decoding
print(OmegaConf.to_yaml(decoding_cfg))
with open_dict(decoding_cfg):
    decoding_cfg.preserve_alignments = True
    decoding_cfg.compute_timestamps = True
    # decoding_cfg.rnnt_timestamp_type = 'word'
asr_model.change_decoding_strategy(decoding_cfg)
transcriptions = asr_model.transcribe(audios, return_hypotheses=True)

I got the following message printed out, and the returned transcripts do not contain timestamps:

Preservation of alignments was requested but TransformerAEDBeamInfer does not implement it.
...
return_hypotheses=True is currently not supported, returning text instead.

It seems the model does not support this feature yet. Do you have plans to add it?

NVIDIA org

Hi, thanks for trying out the Canary model! It currently does not support timestamps directly, though we are looking into that.

In the meantime, assuming you want timestamps for the ASR transcription, you can obtain them using NeMo Forced Aligner (NFA). If you want the timestamps to be based on the transcription produced by Canary, make a new NeMo manifest file with Canary's transcriptions saved in the text field, then run NFA as in the quickstart command. The quickstart uses the model stt_en_fastconformer_hybrid_large_pc, which is English-only; for the other languages, replace en with de, es, or fr in the model name. We also have other ASR models you could use for alignment, but I would start with this one.
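
For reference, a minimal sketch of that workflow, assuming you already have Canary transcriptions in hand. The file names, the manifest.json path, and the nfa_output/ directory below are hypothetical placeholders; the manifest fields and the align.py invocation follow the NFA quickstart.

import json

# Hypothetical inputs: audio files plus the transcriptions Canary produced for them.
audio_files = ["sample1.wav", "sample2.wav"]
canary_texts = asr_model.transcribe(audio_files)

# Build a NeMo manifest: one JSON object per line, with Canary's transcription
# stored in the "text" field so NFA aligns against it.
with open("manifest.json", "w") as f:
    for path, text in zip(audio_files, canary_texts):
        f.write(json.dumps({"audio_filepath": path, "text": text}) + "\n")

# Then run NFA as in the quickstart, e.g.:
#   python <NeMo_repo>/tools/nemo_forced_aligner/align.py \
#       pretrained_name="stt_en_fastconformer_hybrid_large_pc" \
#       manifest_filepath=manifest.json \
#       output_dir=nfa_output/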

Thanks for the detailed instructions, I will give it a try.

Leo2023hwag changed discussion status to closed