How can I separate output words between chunks?

#77
by ericchen - opened

First, my test WAV file is about 2 minutes long, so I use the pipeline for recognition.
Second, I fine-tuned the Whisper model on my own speech data, which has no punctuation, so the output of my model has no punctuation either.
When I run speech recognition with the pipeline, I find that the text of two consecutive chunks is concatenated without any space between them. How should I solve this problem?

Example:
At the boundary between two chunks, the recognized output is
" .... iknow ..."
but it should be " .... i know ..."

Are there any suggestions? Many thanks!!!

The code I use:

>>> import torch
>>> from transformers import pipeline
>>> from datasets import load_dataset

>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> pipe = pipeline(
...   "automatic-speech-recognition",
...   model="openai/whisper-large-v2",
...   chunk_length_s=30,
...   device=device,
... )

>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> sample = ds[0]["audio"]

>>> prediction = pipe(sample.copy(), batch_size=8)["text"]

Hey @ericchen - what I would recommend is sweeping over different values of chunk_length_s in the interval [10, 30] (i.e. try 10, 15, ..., 30). You need to set the chunk_length_s to match the distribution of audio data your model was trained on. If you trained on audio segments typically shorter than 30s, then your chunk length should be reduced. Sweeping over different chunk lengths is the easiest way of figuring out what chunk length is appropriate.
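
A minimal sketch of such a sweep, in case it helps. The helper names `sweep_chunk_lengths` and `make_pipeline_transcriber` are hypothetical (they are not part of transformers), and the sketch assumes that `chunk_length_s` can be passed when calling the pipeline and that `audio` is the dict-style sample used in the snippet above:

```python
from typing import Callable, Dict, Iterable


def sweep_chunk_lengths(
    transcribe: Callable[[float], str],
    chunk_lengths: Iterable[float] = (10, 15, 20, 25, 30),
) -> Dict[float, str]:
    """Call `transcribe` once per chunk length and collect the transcripts."""
    results = {}
    for chunk_length_s in chunk_lengths:
        results[chunk_length_s] = transcribe(chunk_length_s)
    return results


def make_pipeline_transcriber(pipe, audio, batch_size=8):
    """Hypothetical wrapper around an ASR pipeline (assumption: the pipeline
    accepts chunk_length_s as a call-time argument)."""
    def transcribe(chunk_length_s):
        # Copy the sample so repeated calls do not modify the original dict.
        out = pipe(audio.copy(), chunk_length_s=chunk_length_s, batch_size=batch_size)
        return out["text"]
    return transcribe
```

With the `pipe` and `sample` from the original post, `sweep_chunk_lengths(make_pipeline_transcriber(pipe, sample))` would return one transcript per chunk length, which can then be compared side by side or scored against a reference.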

Thanks for your reply. I tried chunk lengths of 10, 15, 20, 25, and 30, but the concatenated words still appear.
I thought it was a formatting error between chunks: "for the next chunk, you forgot to put a blank at the beginning of the chunk."
After reading your reply, do you mean that this is not the case?
