Additional output from the model apart from the transcribed text

#14
by kirankumaram - opened

Will the flexibility of getting 'segments' be added to the final output, similar to how we get them when using the whisper library directly in Python? Sample code is shown in the image below.

[image: sample code showing the segment-level output from the whisper library]
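
For anyone who can't see the image, here is a minimal sketch of the segment-level output the whisper library returns (the file name is a placeholder):

import whisper

model = whisper.load_model("large-v2")
result = model.transcribe("audio.mp3")

# besides the full transcription, the result carries per-segment detail
print(result["text"])
for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])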


I don't think we're planning on adding transcriptions on a segment-wise basis. Once this PR is merged, we'll have utterance-level timestamps though
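
For reference, once that change lands, usage should look roughly like this (a sketch based on that PR's pipeline API; the file name is a placeholder):

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
)

out = pipe("audio.mp3", return_timestamps=True)

# out["text"] is the full transcription; out["chunks"] holds the
# utterance-level timestamps as (start, end) tuples
for chunk in out["chunks"]:
    print(chunk["timestamp"], chunk["text"])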

Hi @sanchit-gandhi , I am new to HF so this might be a silly question, but how do you use the changes introduced in the PR? Is it done centrally so that whisper-large-v2 can eventually be deployed with them?

Hey @orangediamond ! You can get the latest changes by pip installing transformers from the main branch (see https://huggingface.co/docs/transformers/installation#install-from-source):

pip install git+https://github.com/huggingface/transformers

Otherwise, a new PyPI package version (4.26.0) should be available later this week, which includes the latest changes:

pip install -U transformers

If you want the changes now, you're better off installing from the main branch! Otherwise, keep tabs on https://github.com/huggingface/transformers/releases for the releases
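
To double-check which version you ended up with, you can print the installed version (a dev suffix indicates a main-branch install):

import transformers

# e.g. "4.26.0.dev0" for an install from main, "4.26.0" for the PyPI release
print(transformers.__version__)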

Thanks for your reply @sanchit-gandhi ! I was actually not referring to a local installation (sorry for being unclear), but rather to deploying the discussed changes as Inference Endpoints. As far as I can tell, the Whisper model that gets deployed does not contain the changes discussed here - any pointers to that end? I might be misunderstanding how everything ties together in the Hugging Face ecosystem.

I have investigated a few things without luck:

  • Setting return_timestamps = True with detailed parameters when calling the Inference API does not seem to be possible (see the sketch after this list)
  • Creating a custom model with the option predefined in generation_config.json does not seem to work for me - I don't see a way to fork a model repository, and if I manually clone it I am unable to push the pretrained models using git LFS
  • Loading and invoking the model locally is unfortunately not viable
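
For reference, this is roughly the kind of call I mean (the token and file name are placeholders):

import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"
headers = {"Authorization": "Bearer hf_xxx"}

with open("audio.mp3", "rb") as f:
    audio = f.read()

# the ASR endpoint expects raw audio bytes as the request body, so there is
# no obvious place to pass generate kwargs such as return_timestamps
response = requests.post(API_URL, headers=headers, data=audio)
print(response.json())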

Not sure what to try at this point. If anyone (maybe @sanchit-gandhi - sorry for the extra ping) has ideas, I'm all ears 🙏

Hey @orangediamond ,

Thanks for clarifying and apologies for the delayed response! Indeed, it seems like detailed parameters are not supported for the ASR pipeline. Would a custom inference handler be better suited in this case? https://huggingface.co/docs/inference-endpoints/guides/custom_handler#create-custom-inference-handler

Here, you might be able to adjust the return_timestamps arg and set it to True.

We just need to make sure that we've installed Transformers from main. To do so, we can add the following to our requirements file https://huggingface.co/docs/inference-endpoints/guides/custom_dependencies#add-custom-dependencies:

git+https://github.com/huggingface/transformers

This will install the main branch of transformers, which has the latest changes for Whisper.

Thanks for the suggestion! I do seem to be getting the same error when attempting to git push anything to HF. This is what I have done:

  1. I cloned the whisper-large-v2 repo with GIT_LFS_SKIP_SMUDGE=1 and used it as a template for my experiment:
λ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/openai/whisper-large-v2
  2. I created a custom handler as described in your link:
from typing import Dict, List, Any

import torch
from transformers import pipeline

class EndpointHandler():
    def __init__(self, path=""):
        # run on GPU when available, otherwise fall back to CPU
        device = 0 if torch.cuda.is_available() else -1
        self.pipeline = pipeline(
            "automatic-speech-recognition",
            model="openai/whisper-large-v2",
            chunk_length_s=30,
            device=device,
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # the endpoint passes the request payload; the audio lives under "inputs"
        inputs = data.pop("inputs", data)

        prediction = self.pipeline(inputs, return_timestamps=True)
        return prediction
  3. I created requirements.txt with git+https://github.com/huggingface/transformers as the only dependency
  4. I now have the following files in my repo:
λ ls
added_tokens.json  generation_config.json  merges.txt       preprocessor_config.json  README.md         special_tokens_map.json  tokenizer_config.json
config.json        handler.py              normalizer.json  pytorch_model.bin         requirements.txt  tf_model.h5              vocab.json
  5. I run git add ., git commit -m [...], and attempt to push:
λ git push
Uploading LFS objects:   0% (0/2), 0 B | 0 B/s, done.
batch response: Repository not found
error: failed to push some refs to 'https://huggingface.co/orangediamond/whizper'

For reference I use the following remote:

λ git remote get-url --all origin
https://huggingface.co/orangediamond/whizper

I am clearly doing something wrong but it is unclear to me what exactly it is. Any ideas @sanchit-gandhi ?

Hey @orangediamond , sorry for the delayed response! Does the repo orangediamond/whizper exist on the Hub before you do the Git push? I can't see it there (but it might be private). Could you make sure that you're pushing to the correct target repo?

No worries @sanchit-gandhi . It does exist (privately) but I am not sure if it does so in the capacity that is required:

[screenshot: the private repo orangediamond/whizper shown on the Hub]

I would have imagined the above to suffice as far as being able to push to the repo goes?

Hey @orangediamond ! Indeed, that should be sufficient. Maybe it's worth trying to clone the repo explicitly to your local device, moving all the files there, and then pushing from within the repo?
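
Something like the following (a sketch, using the repo name from above; you may first need to authenticate with a write token, e.g. via huggingface-cli login):

git lfs install
git clone https://huggingface.co/orangediamond/whizper
cd whizper
# copy handler.py, requirements.txt and the model files into this clone, then:
git add .
git commit -m "Add custom handler"
git push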

Maybe out of context: would an HF Space suffice for your deployment? E.g. the one here: https://huggingface.co/spaces/sanchit-gandhi/whisper-large-v2

You can click "view API" at the bottom of the page to see how to send requests, etc.
