Loading a sample audio file...

#1
by Equilibrier - opened

Hi, your example loads audio from a dataset, but I would like to test this fork with an external file.
How can I load a simple WAV/MP3 file in the Python code you provided, instead of the dataset?

Hi,

If you just want the output, you can use the demo space ( https://huggingface.co/spaces/gigant/romanian-whisper ), which accepts either audio files or a recording from your microphone. Otherwise, if you run the code yourself, you can use torchaudio.load to load an array from a file; just make sure that you use a sample rate of 16 kHz, because that is the one used for training.
For instance you can resample using torchaudio like this:

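import torchaudio.functional as F

def resample(sample, resample_rate=16000):
    # sample is a (waveform, sample_rate) pair, as returned by torchaudio.load
    waveform, sample_rate = sample
    # A wide low-pass filter and high rolloff favor quality over speed
    resampled_waveform = F.resample(waveform, sample_rate, resample_rate, lowpass_filter_width=512, rolloff=0.99)
    return resampled_waveform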

If you are using the pipeline from transformers, you can pass the file path as is; see the code in https://huggingface.co/spaces/gigant/romanian-whisper/blob/main/app.py for an example. Basically it is:

import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else "cpu"

MODEL_NAME = "gigant/whisper-medium-romanian"
lang = "ro"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,
    device=device,
)

pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")

text = pipe(file)["text"]  # "file" is the path to your audio file
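
If you already have the samples in memory rather than a file on disk, the transformers speech recognition pipeline also accepts a dict with "raw" and "sampling_rate" keys. A small sketch of building that input follows; the helper name as_pipeline_input is hypothetical, and the channel averaging assumes a (channels, frames) layout:

```python
import numpy as np


def as_pipeline_input(samples, sampling_rate=16000):
    """Build the dict form accepted by the ASR pipeline.

    Hypothetical helper: converts samples to a 1-D float32 array and
    pairs it with its sampling rate.
    """
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:
        # Assumes (channels, frames); the pipeline expects single-channel audio
        audio = audio.mean(axis=0)
    return {"raw": audio, "sampling_rate": int(sampling_rate)}
```

With the pipe object defined above, this would be used as text = pipe(as_pipeline_input(samples))["text"].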

Hope this helps

gigant changed discussion status to closed
