language='cantonese' error

#21
by Boxp - opened

When I use the openai whisper package:
import whisper
transcription = model.transcribe(data_path, language='cantonese')['text']
it works.

But when I use transformers with the same language argument:
predicted_ids = model.generate(input_features, language='cantonese')
it raises an error.

language='yue'

That didn't work either:
predicted_ids = model.generate(input_features, language='yue')
"ValueError: Unsupported language: yue"
I'm on transformers==4.35.0.

Hey @Boxp,

I think you need to upgrade your transformers version to current main:

pip install git+https://github.com/huggingface/transformers

Then you should be able to run the following:

...
predicted_ids = model.generate(input_features, language='cantonese')

Could you give it a try?
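
Before reinstalling, it may help to confirm whether the installed version is actually too old. The sketch below is only an illustration: the (4, 36) threshold is an assumption taken from the versions reported in this thread (4.35.0 fails, 4.36.0.dev0 works), and the helper names `major_minor` and `needs_upgrade` are hypothetical, not part of transformers.

```python
import re

# Assumption from this thread: 4.35.0 rejects Cantonese, 4.36.0.dev0 accepts it.
MIN_VERSION = (4, 36)


def major_minor(version):
    """Extract (major, minor) from a version string such as '4.36.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(version, minimum=MIN_VERSION):
    """Return True when the given version is older than the assumed minimum."""
    return major_minor(version) < minimum


print(needs_upgrade("4.35.0"))       # True
print(needs_upgrade("4.36.0.dev0"))  # False
```

To check a real install, pass the string returned by `importlib.metadata.version("transformers")` (standard library) or `transformers.__version__` to `needs_upgrade`.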

https://github.com/openai/whisper/blob/1cea4357687b676b293cb5473e1ade25f5b1cef7/whisper/tokenizer.py#L110
You can find yue in this source file.
If it is missing from your copy, update your whisper repo to the latest version.
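
Rather than reading the source file, the same check can be done from Python. This is a minimal sketch, assuming the language table is a dict mapping codes to lowercase names, which is the shape of `LANGUAGES` in `whisper.tokenizer`. The helper `describe_language_support` and the two toy tables are hypothetical and only illustrate the lookup.

```python
def describe_language_support(languages, code="yue", name="cantonese"):
    """Report whether a code -> name language table knows the given language."""
    return {
        "code_supported": code in languages,
        "name_supported": name in languages.values(),
    }


# Toy tables standing in for an older and a newer install (illustrative only).
old_table = {"en": "english", "zh": "chinese"}
new_table = {**old_table, "yue": "cantonese"}

print(describe_language_support(old_table))
print(describe_language_support(new_table))

# With a real install, pass the package's own table instead, for example:
# from whisper.tokenizer import LANGUAGES
# print(describe_language_support(LANGUAGES))
```

If both values come back False for your installed table, the package predates Cantonese support and an upgrade is the likely fix.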

Yes, openai whisper is fine; the error only occurred with the transformers Whisper implementation. I solved it by upgrading transformers.

Thanks! It works now with transformers==4.36.0.dev0.
