Even when the speaker starts talking after 10 sec, Whisper makes the first timestamp start at sec 0. How can I change that?

#77
by romain130492 - opened

Hello

I'm using Whisper.
For a video in which the speaker starts talking at sec 10, the first timestamp starts at sec 1 instead of sec 10.
Here is my config:

Config
POST v1/audio/transcriptions

{
 model:"whisper-1",
 file:"...mp3",
 response_format:"srt",
 prompt:"Hello, welcome to my lecture"
}

Output:

1
00:00:01,000 --> 00:00:14,000
Why are there both successful and struggling entrepreneurs? 

2
00:00:15,000 --> 00:00:23,000
Many customers prefer to watch videos to enjoy online content.

3
00:00:24,000 --> 00:00:32,000
an other sentences.


  • I believe cue 1 should be 00:00:10,000 --> 00:00:14,000, since no one is talking at all for the first 10 sec.
  • Also, for cue 3, the speaker starts talking again at sec 28, but the timestamp starts at sec 24. Whisper simply includes the silence in the timestamp.
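To make the expected result concrete, here is a minimal sketch of the adjustment I'm after, written as post-processing on the SRT text. It assumes the real speech onset times are already known from somewhere outside Whisper (for example a separate voice activity detector); `onsets_ms` and `trim_leading_silence` are hypothetical names for illustration, not part of the Whisper API.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:14,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert SRT time fields (strings) to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def to_stamp(total_ms):
    """Convert milliseconds back to an SRT timestamp (HH:MM:SS,mmm)."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def trim_leading_silence(srt_text, onsets_ms):
    """Move each cue's start to the earliest known speech onset inside it.

    onsets_ms is an assumed input: speech start times in milliseconds,
    obtained outside Whisper. Cues with no onset inside are left unchanged.
    """
    def fix(match):
        fields = match.groups()
        start = to_ms(*fields[:4])
        end = to_ms(*fields[4:])
        inside = [t for t in onsets_ms if start <= t < end]
        new_start = min(inside) if inside else start
        return f"{to_stamp(new_start)} --> {to_stamp(end)}"

    return TIMING.sub(fix, srt_text)
```

With the output above and onsets of 10000 and 28000 ms, cue 1 would start at 00:00:10,000 and cue 3 at 00:00:28,000, while cue 2 stays as it is. This only trims silence at the start of a cue; it does not change how Whisper itself assigns timestamps.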

Any idea how I could fix that, maybe using a prompt?

Thanks!
