Don't get the same results with Whisper on HF and Whisper from GitHub using Python

#23
by sboudouk

Using the same audio files, I do not get the same transcription from Whisper large v2 on HF and the Whisper large v2 that I pulled from GitHub in my Python script.

I think I have to mess with some settings (temperature?), but I can't see which settings are used on Hugging Face.

Any idea how to make my local Whisper large v2 instance match the one hosted here on Hugging Face, or why I get different results? (I'd say they differ by 5 to 10%.)

Thanks.

Hey @sboudouk! We use the default generation kwargs in HF: https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.temperature

You can override these by passing them to the .generate method if you're using the model/processor API, or by forwarding generate_kwargs={"temperature": 1} to the pipeline if you're using the pipeline.
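If you want to check exactly which defaults a checkpoint ships with, you can also load its generation config directly. Here's a minimal sketch that reads the generation_config.json published with the checkpoint on the Hub:

from transformers import GenerationConfig

# Load the default generation kwargs that ship with the checkpoint
gen_config = GenerationConfig.from_pretrained("openai/whisper-large-v2")
print(gen_config)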

Do you have a code snippet for your comparison? If you could share it I'd be happy to provide some pointers as to where we can make changes!

Sure, thanks for the link. I struggled to find the default generation kwargs; I'm kind of new to HF.

Here is the snippet where I build my Whisper model instance in my Python code:

import whisper  # the openai-whisper package installed from GitHub

model = whisper.load_model("large-v2")
transcribe = model.transcribe(audio, language='fr', temperature=0.0)  # audio: path or array

So from my understanding, I need to add every corresponding kwarg as a parameter to transcribe, just as I added the language and the temperature?

Hey @sboudouk! Yep, you can just pass generate_kwargs as required:

import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
  "automatic-speech-recognition",
  model="openai/whisper-large-v2",
  chunk_length_s=30,
  device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

generate_kwargs = {"language": "<|fr|>", "temperature": 0.0}  # add any other generate kwargs you require 
prediction = pipe(sample, generate_kwargs=generate_kwargs )["text"]
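One thing worth noting: even with matching generation kwargs, the pipeline's chunk_length_s=30 chunked long-form algorithm is an approximation of the sequential sliding-window decoding used by the original OpenAI implementation, so small differences on long audio files can remain.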

Can I use generate_kwargs in the HF API?

Which API, @Superchik? For the pipeline, you can use the structure defined above. For feature_extractor + model, you can simply set language="fr" when you call model.generate:

model.generate(input_features, language="fr", task="transcribe")

See the following doc for more details: https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperForConditionalGeneration.forward.example
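For reference, here's a minimal sketch of the full feature_extractor + model route, reusing the dummy dataset from the pipeline example above:

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2").to(device)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Convert the raw audio to log-mel input features
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features.to(device)

# Generation kwargs (language, task, temperature, ...) go straight to generate
predicted_ids = model.generate(input_features, language="fr", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)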

import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"
headers = {"Authorization": "Bearer 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"}

def query(filename):
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

output = query("sample1.flac")
print(output)
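Note that this snippet only posts the raw audio bytes, so no generate_kwargs are forwarded; as mentioned above, the hosted model then falls back to its default generation config.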
