
OOM error in long-form audio transcription

#4
by StephennFernandes - opened

Hey Team,
I used the model to transcribe long-form audio of more than 1 hour and hit an OOM error. A 14-minute audio file did transcribe, but it took up 88% of VRAM (48 GB A6000).

Is there a way to transcribe long-form audio while keeping compute consistent?
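One common way to keep memory roughly constant, offered here as a general workaround and not something confirmed in this thread, is to split the recording into fixed-length chunks and transcribe them one at a time. The sketch below only computes the chunk boundaries and slices a sample buffer; `chunk_bounds`, `slice_samples`, and the 60-second default are hypothetical names and values chosen for illustration, and the actual model call is left to the caller.

```python
def chunk_bounds(total_sec, chunk_sec=60.0, overlap_sec=0.0):
    """Return (start, end) pairs in seconds that cover [0, total_sec].

    Hypothetical helper: consecutive chunks overlap by overlap_sec so that
    words cut at a boundary appear whole in at least one chunk.
    """
    if chunk_sec <= 0 or not (0 <= overlap_sec < chunk_sec):
        raise ValueError("need chunk_sec > 0 and 0 <= overlap_sec < chunk_sec")
    step = chunk_sec - overlap_sec
    bounds = []
    start = 0.0
    while start < total_sec:
        end = min(start + chunk_sec, total_sec)
        bounds.append((start, end))
        if end >= total_sec:
            break  # the last chunk already reaches the end of the audio
        start += step
    return bounds


def slice_samples(samples, sample_rate, bounds):
    """Yield one slice of `samples` per (start, end) pair.

    `samples` is any sequence that supports slicing (list, array, tensor).
    """
    for start, end in bounds:
        yield samples[int(start * sample_rate):int(end * sample_rate)]
```

With chunks of fixed length, peak memory depends on the chunk size and not on the total duration. Stitching the per-chunk transcripts back together (especially with a non-zero overlap) is not handled here.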

NVIDIA org

@nithinraok are the pretraining code and the steps to reproduce Parakeet open-sourced? I'm asking because I have several private multilingual speech corpora, and it would be great to train a multilingual version of this model.

The code is available in NeMo. For fine-tuning, you can follow the steps in the "Fine-tuning ASR CTC" tutorial in the NeMo ASR tutorials: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb

Once you've gone through the tutorial, you can use the fine-tuning script here for multi-node, multi-GPU fine-tuning: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py

For pretraining, you can use the following tutorial: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_with_Transducers.ipynb

Then follow along with the script here for large-scale training: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py
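Both the fine-tuning and training scripts linked above read their data from NeMo manifest files: one JSON object per line with `audio_filepath`, `duration` (in seconds), and `text`. As a minimal sketch of preparing a private corpus in that format, the helpers below write such a file. The function names and the example paths are hypothetical, and the durations are assumed to be known already.

```python
import json


def manifest_line(audio_filepath, duration, text):
    """Return one manifest entry as a single-line JSON string."""
    entry = {
        "audio_filepath": audio_filepath,
        "duration": round(float(duration), 3),  # seconds
        "text": text,
    }
    # ensure_ascii=False keeps non-Latin transcripts readable in the file.
    return json.dumps(entry, ensure_ascii=False)


def write_manifest(entries, path):
    """Write (audio_filepath, duration, text) tuples, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in entries:
            f.write(manifest_line(audio_filepath, duration, text) + "\n")


if __name__ == "__main__":
    # Hypothetical example entries; replace with your own corpus.
    write_manifest(
        [
            ("clips/utt_0001.wav", 3.42, "hello world"),
            ("clips/utt_0002.wav", 5.1, "नमस्ते दुनिया"),
        ],
        "train_manifest.json",
    )
```

The resulting file is what the tutorials refer to as the training or validation manifest; check the script's config for the exact key that points to it.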

smajumdar94 changed discussion status to closed
