Only 30 seconds of my audio get transcribed

#112
by Ganz00 - opened

I'm using this:

# Function to transcribe the audio

import torch
import torchaudio

def transcribe_whisper(audio_path):
    # Load the audio
    speech_array, sampling_rate = torchaudio.load(audio_path)

    # Preprocess the audio inputs
    inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")

    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=4096)

    # Decode the ids to text
    transcription = processor.batch_decode(predicted_ids)[0]

    return transcription

The audio is 90 seconds long, but only 30 seconds (sometimes only 10) get transcribed.

Hi, use the pipeline object as described on the model card. It will automatically split your long audio into 30-second chunks and merge the results afterwards.
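
For background: Whisper's feature extractor pads or truncates every input to a fixed 30-second window, which is why anything past the first chunk is silently dropped when you call model.generate directly. A quick sketch to see this for yourself (assuming a 16 kHz mono file at audio_path; the 128 mel bins are specific to large-v3):

speech_array, sampling_rate = torchaudio.load(audio_path)
inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")

# The feature extractor always emits 3000 frames (30 s at a 10 ms hop),
# no matter how long the audio actually is.
print(inputs.input_features.shape)  # torch.Size([1, 128, 3000])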

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
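
A minimal usage sketch (audio_path is the same 90-second file as above; the pipeline loads, resamples, and chunks the audio for you):

result = pipe(audio_path)
print(result["text"])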
