timestamp decoding

#9 opened by StephennFernandes

Hi there, is there a way to get timestamp decoding from MMS, similar to the OpenAI Whisper models?

Yep, this is easiest done with the pipeline. For character-level timestamps:

from transformers import pipeline

transcriber = pipeline(model="facebook/mms-1b-all")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", return_timestamps="char")

For word-level timestamps:

from transformers import pipeline

transcriber = pipeline(model="facebook/mms-1b-all")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", return_timestamps="word")

See docs for more details: https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline

Hey @sanchit-gandhi, thanks a ton for taking the time to reply.

Could you please tell me how I could do this in regular inference mode as well?

For example:

import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")

# en_sample: raw 16 kHz waveform (1-D float array)
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

ids = torch.argmax(logits, dim=-1)[0]
transcription = processor.decode(ids)
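
Outside the pipeline, the CTC tokenizer behind the MMS processor can also return offsets directly via output_word_offsets (or output_char_offsets for character level). A minimal sketch, continuing from the snippet above and following the word-offset example in the Wav2Vec2 docs:

# decode again, this time asking the tokenizer for word offsets (measured in logit frames)
outputs_with_offsets = processor.decode(ids, output_word_offsets=True)

# seconds per logit frame = input samples per frame / sampling rate
time_offset = model.config.inputs_to_logits_ratio / processor.feature_extractor.sampling_rate

word_timestamps = [
    {
        "word": d["word"],
        "start": round(d["start_offset"] * time_offset, 2),
        "end": round(d["end_offset"] * time_offset, 2),
    }
    for d in outputs_with_offsets.word_offsets
]
print(word_timestamps)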
