whisper identified the wrong language
#41 by lypspeech - opened
When I follow the long-form transcription example for whisper-large with Korean audio, the result is English. After fine-tuning the whisper-large model on some Korean data, the checkpoint does output Korean. I also tested the other model sizes, but all of them output English.
I am confused about this. What should I do to get Korean output from the original model?
me too
you can try this:
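from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    # Set "language" to the language spoken in your audio ("ko" for Korean).
    # "transcribe" keeps the output in that language; "translate" outputs English.
    generate_kwargs={"language": "ko", "task": "transcribe"},
    device="cpu",
    use_fast=True,
)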
You should read the doc about how to properly set the task for transcription instead of translation! As mentioned by @atulyaatul
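For anyone landing here later, a minimal sketch of the point above, assuming the language and task are passed through generate_kwargs as in the earlier reply. The helper names build_generate_kwargs and make_asr_pipeline and the file name sample_ko.wav are made up for illustration; chunk_length_s=30 is just one way to handle long-form audio, not necessarily what the original example used.

```python
# Hypothetical helpers (not part of transformers) for pinning Whisper's
# language and task instead of relying on automatic language detection.

VALID_TASKS = ("transcribe", "translate")


def build_generate_kwargs(language, task="transcribe"):
    """Return the generate_kwargs dict to hand to the ASR pipeline.

    "transcribe" keeps the text in the spoken language;
    "translate" makes Whisper output English.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return {"language": language, "task": task}


def make_asr_pipeline(language, model_id="openai/whisper-large-v2", device="cpu"):
    """Build a long-form ASR pipeline with the language fixed up front."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: chunked long-form decoding
        device=device,
        generate_kwargs=build_generate_kwargs(language),
    )


if __name__ == "__main__":
    pipe = make_asr_pipeline("ko")
    # "sample_ko.wav" is a placeholder path; use your own Korean audio file.
    print(pipe("sample_ko.wav")["text"])
```

If the output is still English with this setup, it is worth checking that the task did not end up as "translate" somewhere and that your transformers version is recent enough to accept "language" in generate_kwargs.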
ArthurZ changed discussion status to closed