Don't get the same results with Whisper on HF and Whisper from GitHub using Python

#23
by sboudouk

Using the same audio files, I do not get the same transcription from Whisper large v2 on HF and the Whisper large v2 that I pulled from GitHub in my Python script.

I think I have to mess with some settings (temperature?), but I can't see which settings are used on Hugging Face.

Any idea how to make my local Whisper large v2 instance match the one hosted here on Hugging Face, or why I get different results? (I'd say they differ by 5 to 10%.)

Thanks.

Hey @sboudouk! We use the default generation kwargs in HF: https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.temperature

You can override these by passing them to the .generate method if you're using the model/processor API, or by forwarding generate_kwargs={"temperature": 1} to the pipeline if you're using the pipeline.
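If you want to check exactly which defaults a checkpoint ships with, you can also load its generation config directly. Here's a minimal sketch that reads the generation_config.json published with the checkpoint on the Hub:

from transformers import GenerationConfig

# Load the default generation kwargs that ship with the checkpoint
gen_config = GenerationConfig.from_pretrained("openai/whisper-large-v2")
print(gen_config)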

Do you have a code snippet for your comparison? If you could share it I'd be happy to provide some pointers as to where we can make changes!

Sure, thanks for the link. I struggled to find the default generation kwargs; I'm kind of new to HF.

Here is the snippet where I build my Whisper model instance in my Python code:

import whisper  # the openai-whisper package installed from GitHub

model = whisper.load_model("large-v2")
transcribe = model.transcribe(audio, language='fr', temperature=0.0)  # audio: path or array

So from my understanding, I need to add every corresponding kwarg as a parameter to transcribe, just as I added the language and the temperature?

Hey @sboudouk! Yep, you can just pass generate_kwargs as required:

import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
  "automatic-speech-recognition",
  model="openai/whisper-large-v2",
  chunk_length_s=30,
  device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

generate_kwargs = {"language": "<|fr|>", "temperature": 0.0}  # add any other generate kwargs you require 
prediction = pipe(sample, generate_kwargs=generate_kwargs )["text"]
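One thing worth noting: even with matching generation kwargs, the pipeline's chunk_length_s=30 chunked long-form algorithm is an approximation of the sequential sliding-window decoding used by the original OpenAI implementation, so small differences on long audio files can remain.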

Can I use generate_kwargs in the HF API?

Which API, @Superchik? For the pipeline, you can use the structure defined above. For feature_extractor + model, you can simply set language="fr" when you call model.generate:

model.generate(input_features, language="fr", task="transcribe")

See the following doc for more details: https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperForConditionalGeneration.forward.example
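For reference, here's a minimal sketch of the full feature_extractor + model route, reusing the dummy dataset from the pipeline example above:

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2").to(device)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Convert the raw audio to log-mel input features
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features.to(device)

# Generation kwargs (language, task, temperature, ...) go straight to generate
predicted_ids = model.generate(input_features, language="fr", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)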

import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"
headers = {"Authorization": "Bearer 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"}

def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

output = query("sample1.flac")
print(output)
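Note that this snippet only posts the raw audio bytes, so no generate_kwargs are forwarded; as mentioned above, the hosted model then falls back to its default generation config.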
