UserWarning on `max_length` when deployed at Inference Endpoints

#20 opened by seeknndestroy

Hey there!

I've recently deployed the distil-whisper/distil-large-v2 model for ASR on Hugging Face Inference Endpoints, using an AWS instance with an NVIDIA Tesla T4 GPU. However, I'm hitting a UserWarning related to `max_length`, and it has me a bit puzzled.

Here's the code snippet I'm using, adapted from the endpoint's call examples:

import requests

API_URL = "https://ovibb90ga7zdc5qa.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer XXXXXX",
    "Content-Type": "audio/flac"
}

def query(filename):
    # Read the raw audio bytes and POST them directly to the endpoint;
    # the Content-Type header tells the server how to decode them
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

output = query("sample1.flac")

Here is the full warning message:

2023/12/13 14:22:36 ~ /opt/conda/lib/python3.9/site-packages/transformers/generation/utils.py:1369: UserWarning: Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
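
As I understand it, the warning means `generate()` fell back to the deprecated `max_length` default from the model's generation config (448, the Whisper decoder's token limit) because no `max_new_tokens` was passed. Running the checkpoint locally, I can avoid the fallback with something like the sketch below; `chunk_length_s=15` follows the distil-whisper model card's long-form recommendation, and `max_new_tokens=128` is just a per-chunk cap I picked, not a value from the docs:

from transformers import pipeline

# Load the same checkpoint locally to experiment with generation settings
asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    chunk_length_s=15,  # chunked long-form decoding, per the model card
)

# Passing max_new_tokens explicitly avoids the deprecated max_length fallback
result = asr("sample1.flac", generate_kwargs={"max_new_tokens": 128})
print(result["text"])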

I need to ensure my longer audio files get fully transcribed. Any advice or insights on how best to handle this warning and maintain transcription quality would be much appreciated!
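
In case it's useful context, here's one idea I've been considering for setting the limit on the endpoint itself. It's untested: it assumes the default handler accepts a JSON body with base64-encoded audio and forwards a `parameters` field to the ASR pipeline, the way the serverless Inference API does — I haven't verified that against my deployment:

import base64
import requests

API_URL = "https://ovibb90ga7zdc5qa.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer XXXXXX",
    "Content-Type": "application/json",
}

def query_with_params(filename):
    # Base64-encode the audio so it fits inside a JSON body
    with open(filename, "rb") as f:
        audio = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "inputs": audio,
        # Assumption: the handler passes these through to the pipeline call
        "parameters": {"max_new_tokens": 128},
    }
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query_with_params("sample1.flac")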

Thanks a bunch!
