About using the Whisper model as an API

#25
by sanjitaa - opened

I am trying to load the Whisper model (medium) on a server using a Django API and integrate it with the frontend. How can I do this efficiently to get a quick response (even when there are many concurrent users)?

In the Django ecosystem, you can use Celery to execute an STT task asynchronously and then deliver the result to the frontend through a WebSocket connection, for example.
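As a rough, untested sketch of that pattern (assuming openai-whisper and a Celery app already configured in the Django project; all names here are placeholders):

```python
# tasks.py -- minimal sketch; assumes `pip install openai-whisper celery`
import whisper
from celery import shared_task

_model = None

def _get_model():
    # Load the weights once per worker process and reuse them for every
    # task, so per-request latency is not dominated by model loading.
    global _model
    if _model is None:
        _model = whisper.load_model("medium")
    return _model

@shared_task
def transcribe(audio_path: str) -> str:
    # Runs inside the Celery worker, off the Django request/response cycle.
    return _get_model().transcribe(audio_path)["text"]
```

The Django view then only enqueues `transcribe.delay(audio_path)` and returns immediately; the result can be pushed to the browser over a WebSocket (e.g. with Django Channels) or fetched by polling on the task id.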

Can I also use two different models on the server side with Celery, one of them for the translation step? I just want to use two different models through the API and still get a fast response.

You can run separate workflows with separate models on separate GPUs using environment variables. That way you skip the model loading time and get a faster STT turnaround.
You can also look at the faster-whisper, WhisperX, or FrogBase (whisper-ui) projects on GitHub.
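For illustration, pinning each Celery worker to its own GPU could look like this (queue names, module paths, and the broker URL are placeholders, not part of any framework):

```python
# celery_app.py -- rough sketch of routing one model per GPU.
#
# Start one worker per GPU so each process loads its model once and only
# sees its own device:
#   CUDA_VISIBLE_DEVICES=0 celery -A myproject worker -Q stt
#   CUDA_VISIBLE_DEVICES=1 celery -A myproject worker -Q translation
from celery import Celery

app = Celery("myproject", broker="redis://localhost:6379/0")

# Route each task to the queue served by the worker holding that model.
app.conf.task_routes = {
    "myproject.tasks.transcribe": {"queue": "stt"},
    "myproject.tasks.translate": {"queue": "translation"},
}
```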

I want to use a Whisper model for transcribing and another model for translation. For the complete pipeline, the data should pass from the Whisper model to the other model to generate the translated output. Both models are available on Hugging Face, and I want good performance as well. How do I implement this through Hugging Face?

You can start from the "Deploy" menu button in the upper-right corner of each model page on HF.

Sanjitaa - if you are looking to transcribe and then subsequently translate the text, the second model does not have to be Whisper. In fact, if you're passing text from the first model to a second model, it likely makes sense to use T5 or another text-to-text sequence-to-sequence model.

I think in your playbook you would otherwise need to convert the text back to speech in order to get Whisper to process it a second time for the translation. Text-to-speech models don't seem to perform very well at this task, giving you outputs that don't make sense compared to what you fed the model initially.

Yes, I am using two different models: for transcribing I am using Whisper, and for translation I am using another model (like mBART). How can I do it through Hugging Face?

I wanted to use two different models (Whisper for transcription and mBART for text translation). The audio is passed through the Whisper model, the transcribed text from Whisper is passed through the mBART model, and the text is translated. I want to use both of these models through the Hugging Face platform. How can I achieve good performance with it? I also want to display the output in the frontend.

Can anyone help me with it?

This looks very similar to this guide, except the second model is text-to-text instead of text-to-speech.
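For reference, here is one way to wire that chain up with `transformers` pipelines; the mBART checkpoint and language codes below are assumptions, so swap in whatever you actually deploy:

```python
# Sketch of the speech -> text -> translated text chain.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
translator = pipeline(
    "translation",
    model="facebook/mbart-large-50-many-to-many-mmt",
    src_lang="en_XX",   # assumed source language code
    tgt_lang="fr_XX",   # assumed target language code
)

def transcribe_and_translate(audio_path: str) -> str:
    text = asr(audio_path)["text"]                   # speech -> text
    return translator(text)[0]["translation_text"]   # text -> translation
```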

@sanchit-gandhi Can I use the API provided by the Hugging Face Space (after I deploy my model) in my project, so that it can be consumed by the frontend as well?
(Screenshot attached: Screenshot 2023-10-10 at 10.55.08.png)

Yes, you should be able to use the Gradio client this way! You can pass the input string as the path to your audio file. The client will send the audio to the Space, transcribe it, and return the text output to you. Let me know if you encounter any difficulties!
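For example (untested; the Space id and `api_name` are placeholders -- the "Use via API" page of your Space shows the real endpoint):

```python
# pip install gradio_client
from gradio_client import Client

client = Client("your-username/your-whisper-space")  # hypothetical Space id
result = client.predict(
    "path/to/audio.wav",   # path to a local audio file
    api_name="/predict",
)
print(result)  # transcription returned by the Space
```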

@sanchit-gandhi It works! Thank you.

@sanchit-gandhi Does it work for an application also? I am using this API to build an application.
