Input Length

#3
by yiyuliu - opened

Hi, I'm new to this model. I'm trying to do sentiment analysis on a Japanese text, and I'm getting the following error:
Input is too long, try to truncate or use a parameter to handle this: The size of tensor a (534) must match the size of tensor b (512) at non-singleton dimension 1

Is there a way to temporarily increase the input length through parameters?

Hi,

We can't increase the model's sequence length, even temporarily (the maximum length of the DistilBERT model is 512 tokens).

The easiest solution is to truncate longer sequences. Here's a code snippet that demonstrates this approach for sentiment analysis:

from transformers import pipeline

# Tokenizer kwargs: pad/truncate every input to the 512-token limit.
fn_kwargs = {"padding": "max_length", "truncation": True, "max_length": 512}

distilled_student_sentiment_classifier = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    return_all_scores=True,
)

# jpn_article is your Japanese text.
output = distilled_student_sentiment_classifier(jpn_article, **fn_kwargs)

I haven't had the chance to run this code yet, so please let me know if you encounter any issues or errors while executing it.

I'm running it through HTTP requests. Is there a way to add these parameters to the headers of the request?

Sir, when I checked the API using Postman, it shows:
{
  "error": "You need to specify either text or text_target.",
  "warnings": [
    "There was an inference error: You need to specify either text or text_target."
  ]
}

Am I not supposed to give the input in JSON format?

Could you please share your code with me? It would make it easier to assist with debugging.

import aiohttp

model = "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
hf_token = "your token from env file"

API_URL = "https://api-inference.huggingface.co/models/" + model
headers = {"Authorization": "Bearer %s" % hf_token}

async def analysis(session, data, index):
    # Fallback result returned when the response body can't be parsed.
    default = [[{'label': 'negative', 'score': 999},
                {'label': 'neutral', 'score': 999},
                {'label': 'positive', 'score': 999}]]  # replace with empty value
    payload = dict(inputs=data, options=dict(wait_for_model=True))
    async with session.post(API_URL, headers=headers, json=payload) as response:
        if response.status != 200:
            print('found an error', response)
            if response.status == 400:
                print('input length error >> ', index)
        try:
            return await response.json()
        except Exception:
            print('broken', index)
            return default
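
For what it's worth, here's a minimal, hypothetical way to drive the analysis coroutine above with asyncio (the sample text is just an illustration):

import asyncio
import aiohttp

async def main():
    # Open a session and score one illustrative Japanese sentence.
    async with aiohttp.ClientSession() as session:
        result = await analysis(session, "今日はとても良い一日でした。", 0)
        print(result)

asyncio.run(main())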

It's strange, but it seems we can't set the 'truncation' or 'max_length' parameters through the Hugging Face Inference API. One potential workaround, though it might be slower, is to preprocess the text with the Hugging Face tokenizer before passing it to the API, as in the sketch below.
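
Here's a minimal sketch of that workaround, assuming you truncate to the model's 512-token limit before calling the API (jpn_article stands in for your input text):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
)

def truncate_to_max_length(text, max_length=512):
    # Encode with truncation, then decode back to plain text for the API.
    # Note: decoding subwords may not reproduce the original spacing exactly.
    ids = tokenizer.encode(text, truncation=True, max_length=max_length)
    return tokenizer.decode(ids, skip_special_tokens=True)

payload = {"inputs": truncate_to_max_length(jpn_article)}

I haven't tested this end to end, so treat it as a starting point.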

Reference:

lxyuan changed discussion status to closed

Thanks for the suggestion. I ended up using NLTK to tokenise and remove stop words before feeding the text to the Hugging Face API.
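
Roughly like this, in case it helps anyone else (a simplified sketch; NLTK only ships stop word lists for languages like English, so a Japanese-specific tokenizer and stop list would be a further refinement):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

stop_words = set(stopwords.words("english"))

def shorten(text):
    # Drop stop words to shrink the input before sending it to the API.
    tokens = word_tokenize(text)
    return " ".join(t for t in tokens if t.lower() not in stop_words)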
