Only 30 seconds of my audio get transcribed

#112
by Ganz00 - opened

I'm using this:

# Function to transcribe the audio

import torch
import torchaudio

def transcribe_whisper(audio_path):
    # Load the audio
    speech_array, sampling_rate = torchaudio.load(audio_path)

    # Preprocess the audio inputs
    inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")

    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=4096)

    # Decode the ids to text
    transcription = processor.batch_decode(predicted_ids)[0]

    return transcription

The audio is 90 seconds long, but only 30 seconds (sometimes only 10) get transcribed.

Hi, use the pipeline object as described on the model card. It will automatically split your long audio into 30-second chunks and merge the results afterwards.
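
For background: Whisper's feature extractor pads or truncates every input to a fixed 30-second window, which is why anything past the first chunk is silently dropped when you call model.generate directly. A quick sketch to see this for yourself (assuming a 16 kHz mono file at audio_path; the 128 mel bins are specific to large-v3):

speech_array, sampling_rate = torchaudio.load(audio_path)
inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")

# The feature extractor always emits 3000 frames (30 s at a 10 ms hop),
# no matter how long the audio actually is.
print(inputs.input_features.shape)  # torch.Size([1, 128, 3000])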

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
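
A minimal usage sketch (audio_path is the same 90-second file as above; the pipeline loads, resamples, and chunks the audio for you):

result = pipe(audio_path)
print(result["text"])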
