Input Length

#3
by yiyuliu - opened

Hi, I'm new to this model. I'm trying to do sentiment analysis on a Japanese text, and I'm getting the following error:
Input is too long, try to truncate or use a parameter to handle this: The size of tensor a (534) must match the size of tensor b (512) at non-singleton dimension 1

Is there a way to temporarily increase the input length through parameters?

Hi,

We can't increase the model's sequence length, even temporarily (the maximum length of the DistilBERT model is 512 tokens).

The easiest solution is to truncate longer sequences. Here's a code snippet that demonstrates this approach for sentiment analysis:

from transformers import pipeline

# Tokenizer kwargs: pad/truncate every input to the 512-token limit.
fn_kwargs = {"padding": "max_length", "truncation": True, "max_length": 512}

distilled_student_sentiment_classifier = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    return_all_scores=True,
)

# jpn_article is your Japanese text.
output = distilled_student_sentiment_classifier(jpn_article, **fn_kwargs)

I haven't had the chance to run this code yet, so please let me know if you encounter any issues or errors while executing it.

I'm running it through HTTP requests. Is there a way to add these parameters to the headers of the request?

Sir, when I checked the API using Postman, it shows:
{
  "error": "You need to specify either text or text_target.",
  "warnings": [
    "There was an inference error: You need to specify either text or text_target."
  ]
}

Am I not supposed to give the input in JSON format?

Could you please share your code with me? It would make it easier to assist with debugging.

import aiohttp

model = "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
hf_token = "your token from env file"

API_URL = "https://api-inference.huggingface.co/models/" + model
headers = {"Authorization": "Bearer %s" % hf_token}

async def analysis(session, data, index):
    # Fallback result returned when the response body can't be parsed.
    default = [[{'label': 'negative', 'score': 999},
                {'label': 'neutral', 'score': 999},
                {'label': 'positive', 'score': 999}]]  # replace with empty value
    payload = dict(inputs=data, options=dict(wait_for_model=True))
    async with session.post(API_URL, headers=headers, json=payload) as response:
        if response.status != 200:
            print('found an error', response)
            if response.status == 400:
                print('input length error >> ', index)
        try:
            return await response.json()
        except Exception:
            print('broken', index)
            return default
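
For what it's worth, here's a minimal, hypothetical way to drive the analysis coroutine above with asyncio (the sample text is just an illustration):

import asyncio
import aiohttp

async def main():
    # Open a session and score one illustrative Japanese sentence.
    async with aiohttp.ClientSession() as session:
        result = await analysis(session, "今日はとても良い一日でした。", 0)
        print(result)

asyncio.run(main())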

It's strange, but it seems we can't set the 'truncation' or 'max_length' parameters through the Hugging Face Inference API. One potential workaround, though it might be slower, is to preprocess the text with the Hugging Face tokenizer before passing it to the API, as in the sketch below.
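
Here's a minimal sketch of that workaround, assuming you truncate to the model's 512-token limit before calling the API (jpn_article stands in for your input text):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
)

def truncate_to_max_length(text, max_length=512):
    # Encode with truncation, then decode back to plain text for the API.
    # Note: decoding subwords may not reproduce the original spacing exactly.
    ids = tokenizer.encode(text, truncation=True, max_length=max_length)
    return tokenizer.decode(ids, skip_special_tokens=True)

payload = {"inputs": truncate_to_max_length(jpn_article)}

I haven't tested this end to end, so treat it as a starting point.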

Reference:

lxyuan changed discussion status to closed

Thanks for the suggestion. I ended up using NLTK to tokenise and remove stop words before feeding the text to the Hugging Face API.
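
Roughly like this, in case it helps anyone else (a simplified sketch; NLTK only ships stop word lists for languages like English, so a Japanese-specific tokenizer and stop list would be a further refinement):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

stop_words = set(stopwords.words("english"))

def shorten(text):
    # Drop stop words to shrink the input before sending it to the API.
    tokens = word_tokenize(text)
    return " ".join(t for t in tokens if t.lower() not in stop_words)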
