Transcription and Translation in the same call

#81 opened by saalnlp

Hi,

I am trying to find out if there is any way I can have Whisper translate (to English only) and transcribe in the same call for the same audio.
I was using the OpenAI API with async and everything was fine. However, after moving offline, it looks like I have to wait twice as long for the same audio to get both the transcription and the translation.

Here is what I have now:

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model=whisper_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=SECOND_DEVICE,
)

transcribe = pipe(mp3_audio_path, generate_kwargs={"task": "transcribe"})
translate = pipe(mp3_audio_path, generate_kwargs={"task": "translate"})
```

The only solution I can see right now is to duplicate the model on 2 GPUs, but I only have 1 GPU and it is already loaded with other models.
Also, is there any way I can return the detected language from the transcribe pipeline?

Thank you 

You can put both requests in a batch and run batched inference.

You will have to use the `model.generate` method and manually pass the `decoder_input_ids`.
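As a rough illustration of that idea (not an official recipe): the sketch below assumes `openai/whisper-large-v3` as the checkpoint, a known source language (French here, purely as an example), and `librosa` for decoding the mp3. The audio features are duplicated so one row decodes with the `<|transcribe|>` task token and the other with `<|translate|>`, all in a single batched `generate` call. How `generate` treats manually supplied `decoder_input_ids` can differ between `transformers` versions, so treat this as a starting point rather than a drop-in solution.

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-large-v3"  # assumption: swap in your own checkpoint
device = "cuda"
torch_dtype = torch.float16

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch_dtype
).to(device)

# Decode and resample the audio to 16 kHz, then compute the log-mel features once.
audio, _ = librosa.load(mp3_audio_path, sr=16000)
features = processor(
    audio, sampling_rate=16000, return_tensors="pt"
).input_features.to(device, torch_dtype)

# Same audio features twice: row 0 for transcription, row 1 for translation.
input_features = features.repeat(2, 1, 1)

# Build the decoder prompts by hand. The language token is the *source*
# language for both tasks (assumed to be French here as an example).
to_ids = processor.tokenizer.convert_tokens_to_ids
transcribe_prompt = to_ids(["<|startoftranscript|>", "<|fr|>", "<|transcribe|>", "<|notimestamps|>"])
translate_prompt = to_ids(["<|startoftranscript|>", "<|fr|>", "<|translate|>", "<|notimestamps|>"])
decoder_input_ids = torch.tensor([transcribe_prompt, translate_prompt], device=device)

# One batched call instead of two sequential pipeline calls.
generated = model.generate(
    input_features,
    decoder_input_ids=decoder_input_ids,
    max_new_tokens=128,
)
transcription, translation = processor.batch_decode(generated, skip_special_tokens=True)
print("Transcription:", transcription)
print("Translation:", translation)
```

If the source language is not known up front, one option is to let the model predict it: run the decoder from just the `<|startoftranscript|>` token and take the highest-scoring language token (or decode without `skip_special_tokens` and read the language token from the output), then build both prompts with that token. Depending on your `transformers` version, the ASR pipeline may also expose a `return_language` option for Whisper, so it is worth checking the `AutomaticSpeechRecognitionPipeline` docs for your installed version.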
