Transcribe audio longer than 30 seconds

#13
by xyang16 - opened

When I run the Whisper model (openai/whisper-base) on an audio clip around 2 minutes long, the output is truncated and does not end with the <|endoftext|> token.

from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

Output:

<|startoftranscript|> <|en|> <|transcribe|> null Well , she was very short . She was about five foot tall . So she always had this rather null null bou ff ant hairstyle , and you see , to give her a few extra inches , and very , very high heels , null null which she wore even first thing on a Sunday morning . And a terrifying mean , I think . null null I say all these things about her because as a the youngest child by some years after my older null null siblings , I was always kind of an observer of this , and a slightly am

Is there any way to transcribe the whole audio longer than 30 seconds?
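For context on why the output stops: Whisper's feature extractor always pads or truncates the input to a fixed 30-second log-mel window (3000 frames), so a plain generate() call never sees anything past the first 30 seconds. A minimal sketch demonstrating this, using a synthetic 2-minute array of silence (the silent input is only for illustration):

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Whisper's feature extractor pads/truncates every input to a fixed
# 30-second log-mel window, so model.generate() only ever receives
# the first 30 seconds of audio.
fe = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")

two_minutes = np.zeros(16000 * 120, dtype=np.float32)  # 120 s at 16 kHz
features = fe(two_minutes, sampling_rate=16000, return_tensors="np").input_features

# 3000 frames = 30 seconds, regardless of the 120-second input
print(features.shape)
```

This is why long-form transcription needs chunking on top of the model, as in the pipeline answer below.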


Hi, how can I use the pipeline to transcribe long audio in a non-English language?

You should be able to do the following for Hindi:

import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
  "automatic-speech-recognition",
  model="openai/whisper-base",
  chunk_length_s=30,
  device=device,
)

ds = load_dataset("common_voice", "hi", split="validation", streaming=True)
sample = next(iter(ds))["audio"]

prediction = pipe(sample.copy(), batch_size=8, generate_kwargs={"language": "hi", "task": "transcribe"})["text"]

# we can also return timestamps for the predictions
prediction = pipe(sample.copy(), batch_size=8, generate_kwargs={"language": "hi", "task": "transcribe"}, return_timestamps=True)["chunks"]

You can change the language and task arguments as required.
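With return_timestamps=True, the pipeline returns its "chunks" as a list of dicts of the form {"text": str, "timestamp": (start, end)}. A small hypothetical helper (format_chunks is not part of transformers) to render these into readable lines:

```python
# Hypothetical helper: format the pipeline's timestamped chunks
# ({"text": str, "timestamp": (start, end)} dicts) into plain lines.
def format_chunks(chunks):
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.2f}s -> {end:.2f}s] {chunk['text'].strip()}")
    return "\n".join(lines)

# Example with dummy chunks shaped like the pipeline output:
print(format_chunks([
    {"text": " Hello world.", "timestamp": (0.0, 2.5)},
    {"text": " Second segment.", "timestamp": (2.5, 5.0)},
]))
```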
