How do I convert pytorch_model.bin for use with Whisper?

#4, opened by sdugoten

Hi, I would like to use the pytorch_model.bin file in Whisper, but Whisper seems to expect a model file named large-v2.pt. How do I convert this file so that my Whisper installation can use it? Sorry, I am a beginner with Whisper. Thanks.

To give more information: I have tried whisper-cpp, and it won't accept the .bin file. OpenAI Whisper takes a .pt file, which is also in a different format.

If I want to use both, how do I convert pytorch_model.bin into formats that whisper-cpp and OpenAI Whisper can use?

There should be some information online on how to convert a Hugging Face Whisper model back into OpenAI's Whisper format, and also into ggml/gguf. I don't have the links saved right now, but a few Spaces on Hugging Face do provide this information. The #audio-4-ml channel on Hugging Face's Discord also helps a lot (that's where I got help fine-tuning this model). I hope this helps.
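As a rough sketch of the OpenAI-format conversion: openai-whisper's `load_model()` accepts a path to a checkpoint that is a dict with `"dims"` and `"model_state_dict"` entries, so most of the work is renaming the Hugging Face parameter names. The name mapping below is my assumption, based on the module names in `transformers`' `WhisperForConditionalGeneration` and openai-whisper's `model.py`; it has not been tested against this particular checkpoint. `hf_key_to_openai`, `convert_state_dict`, `dims_from_config` and `convert_checkpoint` are hypothetical helper names.

```python
from typing import Dict, Optional

# Hugging Face -> openai-whisper name fragments inside each transformer block.
# Assumption: derived from the two libraries' module names; verify on your checkpoint.
_BLOCK_RULES = [
    (".self_attn.q_proj.", ".attn.query."),
    (".self_attn.k_proj.", ".attn.key."),
    (".self_attn.v_proj.", ".attn.value."),
    (".self_attn.out_proj.", ".attn.out."),
    (".self_attn_layer_norm.", ".attn_ln."),
    (".encoder_attn.q_proj.", ".cross_attn.query."),
    (".encoder_attn.k_proj.", ".cross_attn.key."),
    (".encoder_attn.v_proj.", ".cross_attn.value."),
    (".encoder_attn.out_proj.", ".cross_attn.out."),
    (".encoder_attn_layer_norm.", ".cross_attn_ln."),
    (".fc1.", ".mlp.0."),
    (".fc2.", ".mlp.2."),
    (".final_layer_norm.", ".mlp_ln."),
    (".layers.", ".blocks."),
]

# Top-level names that differ between the two layouts (exact matches).
_TOP_LEVEL = {
    "encoder.embed_positions.weight": "encoder.positional_embedding",
    "decoder.embed_positions.weight": "decoder.positional_embedding",
    "decoder.embed_tokens.weight": "decoder.token_embedding.weight",
    "encoder.layer_norm.weight": "encoder.ln_post.weight",
    "encoder.layer_norm.bias": "encoder.ln_post.bias",
    "decoder.layer_norm.weight": "decoder.ln.weight",
    "decoder.layer_norm.bias": "decoder.ln.bias",
}


def hf_key_to_openai(key: str) -> Optional[str]:
    """Map one Hugging Face state-dict key to its openai-whisper name.

    Returns None for proj_out.weight, which openai-whisper does not store
    (its output projection reuses the token embedding).
    """
    if key.startswith("model."):
        key = key[len("model."):]
    if key == "proj_out.weight":
        return None
    if key in _TOP_LEVEL:
        return _TOP_LEVEL[key]
    for hf_part, oa_part in _BLOCK_RULES:
        key = key.replace(hf_part, oa_part)
    return key


def convert_state_dict(hf_state: Dict[str, object]) -> Dict[str, object]:
    """Rename every key, dropping the ones openai-whisper does not expect."""
    out = {}
    for key, tensor in hf_state.items():
        new_key = hf_key_to_openai(key)
        if new_key is not None:
            out[new_key] = tensor
    return out


def dims_from_config(cfg) -> Dict[str, int]:
    """Build openai-whisper's ModelDimensions fields from a WhisperConfig."""
    return {
        "n_mels": cfg.num_mel_bins,
        "n_vocab": cfg.vocab_size,
        "n_audio_ctx": cfg.max_source_positions,
        "n_audio_state": cfg.d_model,
        "n_audio_head": cfg.encoder_attention_heads,
        "n_audio_layer": cfg.encoder_layers,
        "n_text_ctx": cfg.max_target_positions,
        "n_text_state": cfg.d_model,
        "n_text_head": cfg.decoder_attention_heads,
        "n_text_layer": cfg.decoder_layers,
    }


def convert_checkpoint(hf_dir: str, out_path: str) -> None:
    """Load a Hugging Face Whisper directory and save an openai-whisper .pt file."""
    # Heavy dependencies are imported here so the helpers above work without them.
    import torch
    from transformers import WhisperForConditionalGeneration

    model = WhisperForConditionalGeneration.from_pretrained(hf_dir)
    checkpoint = {
        "dims": dims_from_config(model.config),
        "model_state_dict": convert_state_dict(model.state_dict()),
    }
    torch.save(checkpoint, out_path)
```

Usage would be something like `convert_checkpoint("path/to/model-dir", "finetuned.pt")` followed by `whisper.load_model("finetuned.pt")`; the directory needs `config.json` next to `pytorch_model.bin`. If any name in the mapping is off, `load_model` should fail with a missing/unexpected key error rather than silently load wrong weights. For whisper.cpp, its repository includes conversion scripts under `models/` (such as `convert-pt-to-ggml.py` for `.pt` files and `convert-h5-to-ggml.py` for Hugging Face directories); check the whisper.cpp models README for their current arguments.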


Just FYI, I managed to convert the model for use with faster-whisper. However, this fine-tuned model does not seem to split a single long sentence into shorter segments, as you can see below:

[Screenshot: subtitles produced by the converted fine-tuned model]

The original large-v2 model does not have this problem, as shown below. I am not sure whether there is something you can tune. For now, I can't use this model with Whisper because of this issue: the subtitles would cover two-thirds of the screen.

[Screenshot: subtitles produced by the original large-v2 model]
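One possible workaround that does not touch the model itself: ask faster-whisper for word-level timestamps (`word_timestamps=True` in `WhisperModel.transcribe`) and re-split each long segment into shorter subtitle cues yourself. This is a minimal greedy sketch, assuming words arrive as `(start, end, text)` tuples; `Cue`, `split_words` and the 42-character default are illustrative choices, not part of any library.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def split_words(
    words: Iterable[Tuple[float, float, str]],
    max_chars: int = 42,
    sep: str = " ",
) -> List[Cue]:
    """Greedily pack timestamped words into cues of at most max_chars characters.

    A single word longer than max_chars still gets a cue of its own.
    """
    cues: List[Cue] = []
    cur_text = ""
    cur_start = cur_end = 0.0
    for start, end, word in words:
        word = word.strip()
        if not word:
            continue
        candidate = cur_text + sep + word if cur_text else word
        if cur_text and len(candidate) > max_chars:
            # Current cue is full: close it and start a new one with this word.
            cues.append(Cue(cur_start, cur_end, cur_text))
            cur_text, cur_start = word, start
        else:
            if not cur_text:
                cur_start = start
            cur_text = candidate
        cur_end = end
    if cur_text:
        cues.append(Cue(cur_start, cur_end, cur_text))
    return cues
```

With faster-whisper, this could be fed from each segment as `split_words((w.start, w.end, w.word) for w in segment.words)`. For languages written without spaces between words, passing `sep=""` is probably more appropriate.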

@sdugoten, how did you convert the model for use with faster-whisper?
