Steps for a fair comparison of distil-whisper with openai whisper?

#6
by anuragrawal - opened

Hi,

I have a transcription pipeline that has used openai/whisper-medium.en until now. I am trying to replace it with distil-whisper/distil-medium.en (with speculative decoding), but before I do, I want to verify the transcription-quality and processing-time claims made on the model card. I downloaded 5 videos from YouTube and transcribed them with the following 3 methods:

  1. Using openai-whisper locally
  2. Using huggingface pipeline with openai/whisper-medium.en
  3. Using distil-whisper/distil-medium.en (with speculative decoding)

A couple of things I observed:

  1. distil-medium.en didn't take less time than OpenAI's Whisper; sometimes it took longer
  2. the transcription ended abruptly (in the middle of a sentence) when using distil-medium.en, whereas this never happens with OpenAI's Whisper
  3. distil-medium.en (with speculative decoding) doesn't produce exactly the same output as OpenAI's Whisper

Note that I ran this experiment on a single CPU for a fair comparison. The only thing that might differ between these 3 experiments is the chunk size. I used the recommended 30 secs for openai and 15 secs for distil-medium.en.

Can you please let me know what I am missing?

Python scripts:

main.py

import os
import argparse
import whisper
import time
import torch
from writeToJSON import createJSON
from transformers import pipeline, AutoModelForCausalLM, AutoModelForSpeechSeq2Seq, AutoProcessor

def openai_transcript(vid_path):
    model = whisper.load_model('medium.en', device = "cpu")
    print(f"Transcribing {vid_path} using openai whisper...")
    start_time = time.time()
    result = model.transcribe(vid_path)
    end_time = time.time()
    return result["text"], f"{end_time - start_time:.2f} seconds"
    # print(f"{vid_path} using openai whisper took => {end_time - start_time:.2f} seconds")

def hf_whisper(vid_path):
    device = "cpu"
    torch_dtype = torch.float32

    pipe = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-medium.en",
        chunk_length_s=30,
        device=device,
        torch_dtype=torch_dtype,
    )
    print(f"Transcribing {vid_path} using hf-whisper...")
    start_time = time.time()
    result = pipe(vid_path)
    end_time = time.time()
    return result["text"], f"{end_time - start_time:.2f} seconds"

def distil_whisper_transcript(vid_path):
    device = "cpu"
    torch_dtype = torch.float32
    assistant_model_id = "distil-whisper/distil-medium.en"

    assistant_model = AutoModelForCausalLM.from_pretrained(
        assistant_model_id, torch_dtype=torch_dtype, use_safetensors=True
    )
    assistant_model.to(device)

    model_id = "openai/whisper-medium.en"

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=15,
        torch_dtype=torch_dtype,
        device=device,
        generate_kwargs={"assistant_model": assistant_model},
    )
    print(f"Transcribing {vid_path} using distil-whisper...")
    start_time = time.time()
    result = pipe(vid_path)
    end_time = time.time()
    return result["text"], f"{end_time - start_time:.2f} seconds"
    # print(f"{vid_path} using distil-whisper took => {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    # Create an argument parser
    parser = argparse.ArgumentParser(description="Transcription-summarization pipeline")

    # Define expected command-line arguments
    # Required: os.path.join() below fails if no folder is given
    parser.add_argument('--vid_folder', type=str, required=True,
                        help='Path to the folder containing the video files')
    # Parse the command-line arguments
    args = parser.parse_args()
    vid_folder = args.vid_folder

    for vid in os.listdir(vid_folder):
        vid_path = os.path.join(vid_folder, vid)
        # Transcribe using openai whisper medium.en
        transcript_openai, time_openai = openai_transcript(vid_path)
        # Transcribe using distil-whisper medium.en
        transcript_distil, time_distil = distil_whisper_transcript(vid_path)
        # Transcribe using hf-openai-whisper-medium.en
        transcript_hf, time_hf = hf_whisper(vid_path)
        createJSON(vid_path, transcript_openai, time_openai, transcript_distil,
                   time_distil, transcript_hf, time_hf, "output.json")

writeToJSON.py


import json

def createJSON(vid_path, transcript_openai, time_openai, transcript_distil, time_distil, transcript_hf, time_hf, output_json_path):
    # Create a dictionary to hold the outputs
    new_data = {
        vid_path: {
            "Transcript_openai": transcript_openai,
            "Time_openai": time_openai,
            "Transcript_distil": transcript_distil,
            "Time_distil": time_distil,
            "Transcript_hf_whisper": transcript_hf,
            "Time_hf_whisper": time_hf,
        }
    }

    # Read the existing JSON data (if any)
    try:
        with open(output_json_path, 'r') as file:
            existing_data = json.load(file)
    except FileNotFoundError:
        existing_data = {}

    # Update the existing data with the new data
    existing_data.update(new_data)

    # Serialize the dictionary to a JSON string
    json_string = json.dumps(existing_data, indent=4)

    # Write the JSON string to a file
    with open(output_json_path, "w") as json_file:
        json_file.write(json_string)

    print(f"Outputs have been written for {vid_path}.")

Input videos:

Thank you!

Here's the output JSON file:

{
"test_videos/Download.mp4": {
"Transcript_openai": " little better than me. I'm a trained police officer, gone through the academy, you know? This lawyer was pulled over by police and they had a pretty tense encounter. Hi there, can you turn your engine off for me please? Keep your hands on the steering wheel as well. Do you know why I pulled you over today? Well actually I don't because I wasn't speeding and my car's in perfect condition. You know when you take off from being in a standstill, if you go too fast, that's accelerating too fast which is what you've just done. Right, yeah, sorry, yes, that was my mistake. Okay, so are we okay then? Am I okay to go? Because I was barely over the speed limit. I would like your license and registration. Yeah, uh, okay, not a problem, um, fine. Thank you. Can you step out of the vehicle for me? Rolling down the window to talk to you is all I'm actually legally required to do, there's no reason for me to leave the vehicle. Are you sure about that little miss know-it-all? Because, uh, you think you know the little better than me, I'm a trained police officer, gone through the academy, you know? Okay, well, that's good, I'm a solicitor, so yeah, I do know the law. Are you sure about that, this car you're driving, you're a solicitor? Yes, I am sure about that, I've done several years of training in London, have my LPC. Is that right? Yes, I know. Is there anything else you would like to add to that list? Would there be any point? You're an assistant at best, come on, we both know it. You're just using big words and you're making up that you're a lawyer. Read up on the law, I have rolled down the window, I have spoken with you, that's all I'm legally required to do. So is this what you learned at your big fancy law firm? I learned a lot of things at my big fancy law firm. Of course, like not knowing how to accelerate quickly. I'm sorry, I apologize, I accelerated a little too swiftly, yes, but come on, I was barely over the speed limit, I think we can... 
So, no, no, no, I'm going to ask you again, can you get out of your vehicle, please? I'm sorry, I'm not prepared to get out of the vehicle because there's no need for me to get out of the vehicle. Don't get your neckers in a twist, have you got your MOT certificate? It's actually not in the car at the moment, I had it serviced last week and I haven't yet put the updated certificate in the glove compartment. So you're telling me again that you're a lawyer and you don't understand the rules of the law? Like you don't understand the basics of being pulled over? I do and I think you'll find the road traffic law of 1988 will show that law about giving MOT certificates not changed in 34 years. 34 years? Then you as this big fancy lawyer must know all the rules of the road, correct? Well, I do try to keep up with changes in legislation, yes. Right, you know what, I'm not happy with this tone of voice that you're giving me. I'm an officer of the law. Yes, I think we've established that. If you continue to be disrespectful towards me, I'll bring you into the station. No, you won't actually because I think you'll find that section D of that particular Act, well it protects motorists by giving them seven days in which to produce their MOT certificates. I don't have to give it to you now, if it's not in the car, I have seven days with which to find it. I do know where it is and bring it into the police station. Can we leave it there? Well, no because I've asked you to get out of your vehicle multiple times and you're refusing. That's right, I am refusing as is my right, my legal right. Would you like me to arrest you? Well, you're very welcome to arrest me. What is your big boss at your law firm going to do?",
"Time_openai": "80.36 seconds",
"Transcript_distil": " This lawyer was pulled over by police and my car's in perfect condition. So, you know when you take off from being in a standstill, if you go too fast, that's accelerating too fast, which is what you've just done. Right, yeah, sorry, yes, that was my mistake okay so are we okay then I'm okay to go because I was I was barely over the speed limit I would like your license and registration yeah okay not a problem rolling down the window to talk to you is all I'm actually legally required to do. There's no reason for me to leave the vehicle. Are you sure about that little miss know-it-all? Because you think you know the law better than me. I'm a trained police officer. Gone through the academy, you know? Okay, well, that's good. I'm a solicitor, so yeah, I do know the law. Are you sure about that? This car you're driving, you're car you're driving yes I am sure about that I've done several years of training in London I have my LPC I'm right yes I know she would like to add to that list? You're an assistant at best. Come on, we both know it. You're just using big words and you're making up that you're a lawyer. Read up on the law. I have rolled down the window, I have spoken with you. That's all I'm legally required to do. So is this what you learnt at your big fancy law firm? I learnt a lot of things at my big fancy law firm. Of course, like not knowing how to accelerate quickly. I'm sorry, I apologise, I accelerated a little too swiftly because there's no need for me to get out of the vehicle. Don't get your neckers in as well. Have you got your MOT certificate? It's actually not in the car at the moment. I had it serviced last week and I haven't yet put the updated certificate in the glove compartment. So you're telling me again that you're a lawyer and you don't understand the rules of the law. You don't understand the basics of being a pullover. 
I do and I think you'll find the road traffic law of 1988 will show that law about giving MOT certificates not changed in 34 years Then you as this big fancy lawyer must know all the rules of the road correctly well I do try to keep up with changes in legislation yes right you know what I'm not happy with this tone of voice you're giving them I'm an officer of the law yes I think we've established that. No, you won't actually because I think you'll find that section D of that particular Act, well it protects motorists by giving them seven days in which to produce their MOT certificates. I don't have to give it to you now. If it's not in the car, I have seven days with which to find it. I do know where it is. And bring it in to the police station. Can we leave it there? Well no because I've asked you to get out of your vehicle multiple times and you're refusing. That's right I am refusing as is my right, my legal right. Then would you like me to arrest you? Well you're very welcome to arrest me, but I..",
"Time_distil": "95.16 seconds",
"Transcript_hf_whisper": " This lawyer was pulled over by police and my car's in perfect condition. So, you know when you take off from being in a standstill, if you go too fast, that's accelerating too fast, which is what you've just done. Right, yeah, sorry, yes, that was my mistake. Okay, so, are we okay then? Am I okay to go because I was barely over the speed limit? I would like your license and registration. Yeah, I'm okay. Hurry up. Not a problem and fine rolling down the window to talk to you is all I'm actually legally required to do there's no reason for me to leave the vehicle are you sure about that little miss know at all because uh you think you know the law better than me I'm a trained police police officer. Gone through the academy, you know? Okay, well, that's good. I'm a solicitor, so yeah, I do know the law. Are you sure about that? This car you're driving, you're a solicitor? Yes, I am sure about that. I've done several years of training in London, have my LPC. Is that right? Yes, I know. Anything else you would like to add to that list? Would there be any point? You're an assistant at best. Come on, we both know it. You're just using big words and you're making up that you're a lawyer. So is this what you learnt at your big fancy law firm? I learnt a lot of things at my big fancy law firm. Oh, of course, like not knowing how to accelerate quickly. I'm sorry, I apologise, I accelerated a little too swiftly, yes, but come on, I was barely over the speed limit. I think we can... So, no, no, no, I'm going to ask you again, can you get out of your vehicle please? I'm sorry, I'm not prepared to get out of the vehicle because there's no need for me to get out of the vehicle. Don't get your neckers in as West. Have you got your MOT certificate? It's actually not in the car at the moment. I had it serviced last week and I haven't yet put the updated certificate in the glove compartment. 
So, you're telling me again that you're a lawyer and you don't understand the rules of the law? Like you don't understand the basics of being pulled over. I do, and I think you'll find the road traffic law of 1988 will show that law about giving MOT certificates not changed in 34 years. 34 years? Mm. Then you as this big fancy lawyer must know all the rules of the road, correct? Well, I do try to keep up with changes in legislation, yes. Yes, I think you'll find that section D of that particular act Well, it protects motorists by giving them seven days in which to produce their MOT certificates I don't have to give it to you now if it's not in the car I have seven days with which to find it. I do know where it is and bring it into the police station Can we leave it there? Well no because I've asked you to get out of your vehicle multiple times and you're refusing. That's right I am refusing as is my right. My legal right. Then would you like me to arrest you? Well you're very welcome to arrest me. What is your big boss at your law firm gonna do?",
"Time_hf_whisper": "81.00 seconds"
},
"test_videos/Download-5.mp4": {
"Transcript_openai": " Hello, Eric with the police department. Just so you know, I'm recording with the body camera. Sure. Hey I don't think so. No, okay the reason I was like, I don't know if I've told you about your car, but you look familiar Sorry, it's all good. Anyways, the reason for the stop is because your tabs are expired and oh no Yeah, they're expired barely like three months, but I wasn't sure if you were aware of it. You know what? It probably slipped my mind. Hey, no problem. Okay Yeah, every license registration I do yeah, oh that's a cool little neat I always lose my wallet Hey fair enough. I'm the same way. That's why I always have it on me at all times Yeah with the kids and everything else it probably just slipped my mind hey like I said, no worries not You know, I just wanted to make sure that you were aware Because like this one that one's expired. Yeah, that was 20 21 And then this one is This is the one that's just yeah, here's that do you have your proof of insurance? Well, you get that just go back so we can make this fast as possible so you get out here, okay Okay, Brian Yeah, here's this now. I remember where where I recognize you from what's up call bank. Oh Your car was Right. Okay. So it's the glasses man. Yeah, I wasn't worrying him the last time we know it's always like wait Why do I so I have a bit of an update for you as well? Okay, I was actually gonna call you and it's crazy convenient right? so you're the people who send over the video to me said that there was no video surveillance on that day, but The reports already done. Everything's been done. I was waiting on And I remember you told me as much too, I don't know how well you can see that this is yeah All right, but yeah, unfortunately, they didn't find anything. 
So Yeah, I remember the last time we talked that was essentially that and I was like But I knew I was like, I remember where did and then I saw it the back and I was like So, um, but yeah, sorry, sorry to you know inform you of that but that's okay I'll put that in all I'll put in the new tabs tonight when I get home. Awesome. Yeah. Thanks, man Oh and I checked and it seems like your registration is expired. So, I don't know if you just got it go I think I just got to go online. Yeah So I'm just gonna give a verbal warning on the registration I get it you get busy and stuff So yeah, but it's verbal warning just get that done as soon as possible. All right, okay I'm gonna see you now. I will",
"Time_openai": "54.83 seconds",
"Transcript_distil": " Hello, Eric with the police department. Just so you know, I'm recording with the body camera. Sure. Hey, have I put you over before? I don't think so. No? Okay. I was like, I don't know if I've told you about your car, but you look familiar. Sorry. It's all good. Anyways, the reason for the stop is because your tabs are expired. Oh no. Yeah, they're expired barely like three months, but I wasn't sure if you were aware of it. You know what? It probably slipped my mind. Hey, no problem. Okay. Do you have your license, registration proof? I do, yeah. Oh, that's a cool little neat trip. I always lose my wallet. Fair enough, I'm the same way. That's why I always have it on me at all times. Yeah, with the kids and everything else it probably just slipped my mind. Hey, like I said, no worries, not being in the world. You know, I just wanted to make sure that you were aware because like this this one that one's expired yeah that was 2021 and then this one is is that this is the one that just expired, here's that do you have your proof of insurance? Well you get that just go back so we can make this fast as possible so you can get out of here. Okay Brian, here's this. Now I remember where I recognize you from. Up up, Paul Bank? Oh your car was in there. That's right. Okay, so it's the glasses man So you're the people who send over the video to me said that there was no video surveillance on that day, but the report's already done, everything's been done. I was waiting on that. No, and I assumed as much. Yeah, and I remember you told me as much too. I don't know how well you can see that. But yeah, unfortunately they didn't find anything. I didn't expect them to the way the cameras were oriented. 
Yeah, I remember the last time we talked there was essentially that and I was like, ah, but I knew I was like, I remember where did and then I saw the back and I was like, there's something familiar so um but yeah sorry sorry to you know inform you of that but that's okay I'll put that in all I'll put in the new tabs tonight when I get home awesome Awesome. Thanks man. Oh and I checked and it seems like your registration is expired so I don't know if you just got to go. I think I just got to go online. Yeah. So I'm just going on the registrat get that done as soon as",
"Time_distil": "71.43 seconds",
"Transcript_hf_whisper": " Hello, Eric with the police department. Just so you know, I'm recording with the body camera. Sure. Hey, have I put you over before? I don't think so. No? Okay. The reason I was like, I don't know if I've told you about your car, but you look familiar. Sorry. It's all good. Anyways, the reason for the stop is because your tabs are expired. Oh no. Uh, yeah, they're expired barely like three months, but I wasn't sure if you were aware of it. know what it probably slipped my mind hey no problem okay yeah you have your license registration I do yeah oh that's a cool little neat trip I always lose my wallet hey fair enough I'm the same way that's why I always have it on me at all times yeah with the kids and everything else it probably just slipped my mind hey like I said no worries not you know I just wanted to make sure that I is that Where I recognize you from what's up, Paul Bank? Oh your car was in there. That's right. Okay, so it's the glasses man Yeah, I wasn't wearing them the last time we know so I was like wait, where do I so that makes I have a bit of an update for you as well. Okay, I was actually gonna call you and it's crazy convenient, right? so you're the people who send over the video to me said that there was no video surveillance on that day, but The reports already done. Everything's been funny thing put in the new tabs tonight when I get home. Awesome. Thanks man. Oh and I checked and it seems like your registration is expired so I don't know if you just got to go. I think I just got to go online. Yeah. So I'm just going to give you a verbal warning on the registration. I get it. You get busy and stuff. So yeah. For sure man. But verbal warning just get that done as soon as possible. Okay. I will. Have a great rest of your night. I will.",
"Time_hf_whisper": "45.86 seconds"
},
"test_videos/Download-11.mp4": {
"Transcript_openai": " I'm gonna move. It's ridiculous. Fucking hellos. Is she gonna just sit there? I hate him so much. God, I hate this world. I fucking hate him.",
"Time_openai": "4.82 seconds",
"Transcript_distil": " I'm gonna move. It's ridiculous. Is she gonna just sit there? I hate him so much. I hate this world. I hate him.",
"Time_distil": "3.97 seconds",
"Transcript_hf_whisper": " I'm gonna move. It's ridiculous. Is she gonna just sit there? I hate him so much. I hate this world. I hate him.",
"Time_hf_whisper": "4.08 seconds"
}
}
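To put a number on how far the three transcripts diverge, rather than comparing them by eye, one option is a word error rate (WER) between two of the outputs. The following is only a minimal sketch: the helper names `normalize` and `word_error_rate` are made up for illustration, the normalization (lowercasing and stripping punctuation other than apostrophes) is an assumption and not the normalizer used in the Distil-Whisper paper, and without a human reference transcript the result only measures disagreement between two models, not accuracy.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Applied to the JSON above, this could be called as `word_error_rate(entry["Transcript_openai"], entry["Transcript_distil"])` for each video entry, treating the openai-whisper output as the reference.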

Here's the readme file in case you want to reproduce these results:

  1. Create a python virtual environment and activate it

  2. Upgrade pip and install transformers
    pip install --upgrade pip
    pip install --upgrade transformers

  3. Install openai-whisper
    pip install -U openai-whisper

  4. cd to the directory that contains the python scripts

  5. Run the python script from command line using:
    python main.py --vid_folder "enter video folder path here"

Whisper Distillation org

Hi there! So, I think this is due to the following two things you mention:

  1. CPU inference

    I ran this experiment on a single CPU for a fair comparison.

  2. More batches due to smaller sliding window

    I used the recommended 30 secs for openai and 15 secs for distil-medium.en

When running these models on CPU, the encoding phase is typically the biggest bottleneck (a GPU, on the other hand, can parallelize these operations). A smaller sliding window therefore results in more batches, so the encoding phase is run more times.
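As a rough illustration of the "more batches" point, the sketch below counts how many windows a sliding-window chunker would produce for a given audio length. It is a simplified model and not the pipeline's actual code: `count_windows` is a hypothetical helper, and it assumes the documented default of `stride_length_s = chunk_length_s / 6` on each side of every chunk.

```python
def count_windows(audio_s, chunk_s, stride_s=None):
    """Count sliding windows of chunk_s seconds needed to cover audio_s seconds.

    Simplified model: consecutive windows overlap by stride_s on each side,
    so each new window starts (chunk_s - 2 * stride_s) seconds after the last.
    """
    if stride_s is None:
        stride_s = chunk_s / 6  # assumed default, see lead-in
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("stride is too large for this chunk length")
    start = 0.0
    windows = 0
    while start < audio_s:
        windows += 1
        if start + chunk_s > audio_s:
            break  # this window already reaches past the end of the audio
        start += step
    return windows
```

Under these assumptions, a 300-second clip needs 15 windows at `chunk_s=30` but 30 windows at `chunk_s=15`, i.e. twice as many encoder passes.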

If you have access to a GPU, I would highly recommend using it for these models. See @reach-vb's analysis and benchmarks here. If that is not a possibility, could you try increasing the sliding window to 20, 25, or 30 seconds? I know the README suggests using 15 seconds, but @sanchit-gandhi has mentioned that you can try increasing the window for the medium model.
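A small harness along these lines could be used to try the different window sizes. This is a sketch, not part of the original scripts: `time_chunk_lengths` and `make_transcriber` are hypothetical names, and the caller supplies a factory that builds a transcription callable for a given chunk length, so that model loading stays outside the timed region.

```python
import time


def time_chunk_lengths(make_transcriber, audio_path, chunk_lengths=(15, 20, 25, 30)):
    """Run one transcription per chunk length and return {chunk_length_s: seconds}."""
    timings = {}
    for chunk_length_s in chunk_lengths:
        # Build the transcriber first so that loading the model is not timed.
        transcribe = make_transcriber(chunk_length_s)
        start = time.perf_counter()
        transcribe(audio_path)
        timings[chunk_length_s] = time.perf_counter() - start
    return timings


# Example factory, assuming transformers is installed (left commented out
# because it downloads the model):
#
# from transformers import pipeline
#
# def make_transcriber(chunk_length_s):
#     return pipeline(
#         "automatic-speech-recognition",
#         model="distil-whisper/distil-medium.en",
#         chunk_length_s=chunk_length_s,
#         device="cpu",
#     )
```

A single run per setting is noisy on CPU, so repeating each measurement a few times and comparing the medians would give a fairer picture.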

Thank you @Xenova! I'll try your recommendations and post my findings here shortly.

Whisper Distillation org

You can read a full description of our benchmark set-up in Appendix B2 of the paper (page 23): https://arxiv.org/abs/2311.00430

As @Xenova has rightly pointed out, we ran the benchmarks on GPU hardware (A100 and T4) with batch sizes of 1, 4, and 16. Full results are shown on page 30.
