Language Problem

#20

by canu - opened Feb 1, 2023

canu

Feb 1, 2023

edited Feb 1, 2023

Hello everyone, we have started using the Whisper API and are currently testing it with various languages. We have run into a problem with the returned data.

When we speak English, French or Spanish, the returned text comes back in English, French or Spanish, as it should. But when we speak Turkish, the returned text is in English. This only happens when we send a request to the API.

However, when we run the model locally, we don't experience this problem. Unfortunately, we couldn't find anything related to this in the documentation.

Does anyone have an idea what is causing this?

sanchit-gandhi

Feb 2, 2023

Hey @canu! I've had a quick look through this blog post: https://www.philschmid.de/whisper-inference-endpoints

It mentions the following:

Note: By default, Inference Endpoint will use “English” as the language for transcription, if you want to use Whisper for non-English speech recognition you would need to create a custom handler and adjust decoder prompt.

Would adding a custom handler and changing the decoder prompt ids work here?

canu

Feb 2, 2023

Thank you very much @sanchit-gandhi ...

The link was very helpful as a guideline. We are actually planning to run this locally with custom decoder prompts. Yes, that approach would definitely work, although we may not end up using Inference Endpoints.

sanchit-gandhi

Feb 10, 2023

Awesome, glad to hear it! Let me know if you experience any difficulties - happy to help here!

bonen

Feb 25, 2023

@sanchit-gandhi I have the same question. I am new to Hugging Face, pipelines and handlers, so I have no idea how to create a custom handler for this Whisper Inference Endpoint or where to adjust the decoder prompt. Could you point me in the right direction? Can I make the language a variable that I pass when calling my endpoint, or will I only be able to hard-code it?

Thanks a lot for your help!

sanchit-gandhi

Mar 3, 2023

Hey @bonen! There's a guide here that might help: https://huggingface.co/docs/inference-endpoints/guides/custom_handler

We need to set the correct forced decoder IDs in both the model config and the generation config. The code for this looks as follows:

self.pipeline.model.config.forced_decoder_ids = self.pipeline.tokenizer.get_decoder_prompt_ids(language="Spanish", task="transcribe")
self.pipeline.model.generation_config.forced_decoder_ids = self.pipeline.model.config.forced_decoder_ids  # set both, just to be sure!

Maybe you can add this to the __init__ method?
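For reference, `forced_decoder_ids` is a list of `(position, token_id)` pairs that pins the tokens the decoder emits right after `<|startoftranscript|>`: the language token, the task token and, by default, `<|notimestamps|>`. The pure-Python sketch below mirrors that shape without loading the model. The helper `sketch_decoder_prompt_ids` is hypothetical, and the token IDs are assumed to be those of the multilingual vocabulary used by whisper-large-v2; check them with `tokenizer.convert_tokens_to_ids` before relying on them.

```python
# Hypothetical sketch of the (position, token_id) pairs held in forced_decoder_ids.
# ASSUMPTION: token IDs follow the multilingual Whisper vocabulary (whisper-large-v2);
# verify with tokenizer.convert_tokens_to_ids(...) before relying on them.

NO_TIMESTAMPS_ID = 50363                      # <|notimestamps|>
TASK_IDS = {"translate": 50358, "transcribe": 50359}

# Language tokens start at <|en|> = 50259 and follow Whisper's language order.
# Only the first ten languages are listed here.
FIRST_LANGUAGE_ID = 50259
LANGUAGE_ORDER = ["english", "chinese", "german", "spanish", "russian",
                  "korean", "french", "japanese", "portuguese", "turkish"]


def sketch_decoder_prompt_ids(language: str, task: str = "transcribe"):
    """Mirror the shape returned by get_decoder_prompt_ids().

    Position 0 is <|startoftranscript|>, which decoding always starts with,
    so the forced pairs begin at position 1: language, task, no-timestamps.
    """
    language_id = FIRST_LANGUAGE_ID + LANGUAGE_ORDER.index(language.lower())
    return [(1, language_id), (2, TASK_IDS[task]), (3, NO_TIMESTAMPS_ID)]


print(sketch_decoder_prompt_ids("Spanish"))                    # [(1, 50262), (2, 50359), (3, 50363)]
print(sketch_decoder_prompt_ids("Turkish", task="translate"))  # [(1, 50268), (2, 50358), (3, 50363)]
```

Under these assumptions, a prompt that forces `<|en|>` (50259) at position 1 would make the model decode Turkish speech as English text, which would be consistent with the behaviour reported at the top of this thread.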

lesliejd

Mar 13, 2023

Has anyone managed to get translation working? I am currently using the pipeline method and find that, regardless of setting task="translate" and the language I define, I always get a transcript in the language spoken in the file.

My exact code for a Portuguese file is the following:

In the __init__ method:

self.pipe = pipeline(
    task="automatic-speech-recognition",
    model=path,
    chunk_length_s=30,
    device=device,
)

self.pipe.model.config.forced_decoder_ids = self.pipe.tokenizer.get_decoder_prompt_ids(language="portuguese", task="translate")
self.pipe.model.generation_config.forced_decoder_ids = self.pipe.model.config.forced_decoder_ids

Then, in the __call__ method:

inputs = data.pop("inputs", data)
prediction = self.pipe(inputs, return_timestamps=True)
return prediction

This pretty much matches the samples and code suggestions in this thread, yet I have not managed to get an actual translation. The code does produce a transcript.

bonen

Mar 15, 2023

Thanks a lot @sanchit-gandhi & @lesliejd !

sanchit-gandhi

Mar 17, 2023

Hey @lesliejd! Updating to the latest version of Transformers should fix these issues and make this code work hitch-free:

pip install --upgrade transformers

There's also the option of passing the task and language to the pipe at inference time. If you know the language a priori, you can pass it as follows:

pipe(audio, return_timestamps=True, generate_kwargs={"language": "french"})

Likewise, you can specify the task as translate/transcribe:

pipe(audio, return_timestamps=True, generate_kwargs={"language": "french", "task": "transcribe"})

Let us know if you have any other questions, more than happy to help!

martinjurkovic

Mar 25, 2023

Hi! Thanks for all your responses so far.

As a beginner, I still don't understand how to pass the language parameter when calling the API with curl for transcription.
What I would like to do is get the language in the __call__ method and pass it to:
pipe(audio, return_timestamps=True, generate_kwargs={"language": language, "task": "transcribe"})

sanchit-gandhi

Apr 4, 2023

Hey @martinjurkovic! You would need to create a custom handler for this; see https://huggingface.co/openai/whisper-large-v2/discussions/20#63db8b19ef6ecf800eca6611
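A minimal sketch of such a handler, assuming a JSON request body of the form `{"inputs": ..., "parameters": {"language": "turkish", "task": "transcribe"}}`. The `parameters` key is a convention chosen here, not something the endpoint enforces, and the optional `pipe` argument is a hypothetical hook added only so the class can be exercised without downloading the model. The handler forwards the language and task through `generate_kwargs`, as suggested earlier in this thread.

```python
# Sketch of a handler.py that reads language/task per request.
# ASSUMPTIONS: request body {"inputs": <audio>, "parameters": {"language": ..., "task": ...}};
# the "parameters" key and the optional `pipe` hook are hypothetical conventions.
from typing import Any, Dict, Optional

SUPPORTED_TASKS = {"transcribe", "translate"}


class EndpointHandler:
    def __init__(self, path: str = "", pipe: Optional[Any] = None):
        if pipe is None:
            # Imported lazily so the module can be loaded (and tested) without transformers.
            from transformers import pipeline

            pipe = pipeline(
                task="automatic-speech-recognition",
                model=path,
                chunk_length_s=30,
            )
        self.pipe = pipe

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data.get("inputs", data)
        parameters = data.get("parameters") or {}

        generate_kwargs: Dict[str, Any] = {}
        language = parameters.get("language")
        if language:
            # Only force a language when the caller supplies one.
            generate_kwargs["language"] = language

        task = parameters.get("task", "transcribe")
        if task not in SUPPORTED_TASKS:
            raise ValueError(f"task must be one of {sorted(SUPPORTED_TASKS)}, got {task!r}")
        generate_kwargs["task"] = task

        return self.pipe(inputs, return_timestamps=True, generate_kwargs=generate_kwargs)
```

Omitting `language` leaves language detection to the model. How the audio itself reaches `inputs` (raw bytes, or for example a base64 string inside JSON that would need decoding first) depends on how the endpoint is called and is not handled in this sketch.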
