Facing an error during inference

#14
by sauravm8 - opened

I am facing an inference error:

`The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3`

I am truncating the input prompt well before 2048 tokens, at around 1250 words (~1500 tokens).

```python
import time

start_query_time = time.time()

# Wrap the user message in the model's prompt format
final_message = f"<|prompter|>What do you think of the following and keep it under 100 words: \n {received_message}<|endoftext|><|assistant|>"
inputs = tokenizer(final_message, return_tensors="pt").to(model.device)
print(f"Number of tokens --> {inputs['input_ids'].shape[1]}")

tokens = model.generate(**inputs, max_new_tokens=1000, do_sample=True, temperature=0.8)
# Keep only the assistant's reply; note str.strip() removes characters, not a suffix
response = tokenizer.decode(tokens[0]).split("<|assistant|>")[1].removesuffix("<|endoftext|>")
print(f"Total time taken is {time.time() - start_query_time}")
print(response)
```

The GPU is not the issue. What is going on here?

I think I figured it out: the newly generated tokens also have to fit within the 2048-token context window, i.e. prompt tokens + max_new_tokens must stay within 2048. With ~1500 prompt tokens and max_new_tokens=1000, generation can run up to ~2500 tokens, at which point it crashes. Can it not use a sliding window of memory? Am I missing something?
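
For anyone else hitting this, here is a minimal sketch of a workaround, assuming the 2048-token context window implied by the error and the same `model`/`tokenizer`/`inputs` objects from the snippet above: cap `max_new_tokens` by whatever context remains after the prompt.

```python
MAX_CONTEXT = 2048  # context window implied by the error message (assumption)

prompt_len = inputs["input_ids"].shape[1]
# Leave headroom so prompt tokens + generated tokens never exceed the window
max_new = max(1, min(1000, MAX_CONTEXT - prompt_len))

tokens = model.generate(**inputs, max_new_tokens=max_new, do_sample=True, temperature=0.8)
```

This trades off reply length instead of crashing; alternatively, truncate the prompt harder so a full 1000-token reply always fits.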

@sauravm8 I am facing a similar issue. What's the solution?
