AssertionError: Total sequence length exceeds cache size in model.forward

#1
by Hardcore7651 - opened

I'm getting this error when running past 2k context, despite having the model loaded for 32k, on RunPod on an A6000.

I believe it is related to this: https://github.com/oobabooga/text-generation-webui/issues/5750#issuecomment-2024442282

But I am not knowledgeable enough to be sure.
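
If I understand that linked issue correctly, the failure is roughly the following check: the KV cache is allocated for a fixed number of tokens, and the forward pass asserts that the tokens already in the cache plus the new ones still fit. A minimal illustrative sketch (not ExLlamaV2's actual source; `Cache` and `forward` are placeholders):

```python
# Minimal sketch of the kind of check that raises this error.
# Not ExLlamaV2's real code; Cache and forward are illustrative placeholders.

class Cache:
    def __init__(self, max_seq_len):
        self.max_seq_len = max_seq_len      # cache capacity in tokens
        self.current_seq_len = 0            # tokens already stored

def forward(cache, num_new_tokens):
    total = cache.current_seq_len + num_new_tokens
    # If the cache was allocated for ~2k tokens while the UI says 32k,
    # this trips as soon as generation passes the cache size.
    assert total <= cache.max_seq_len, \
        "Total sequence length exceeds cache size in model.forward"
    cache.current_seq_len = total           # append the new tokens' keys/values

cache = Cache(max_seq_len=2048)
forward(cache, 2048)    # fills the cache exactly
forward(cache, 1)       # AssertionError
```

So my guess is the cache is being allocated for less context than the model is configured for.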

I use text-generation-webui from May 19 and do not have this issue. I use the 4-bit cache. What are your settings, and what version do you use?

BTW, I made a small update to config.json and tokenizer_config.json. I believe it is unrelated to your problem, but please update those files.

Max length is set to 32k, alpha value to 1, and compress_pos_emb to 1. I have tried both the 8-bit and the 4-bit cache, and neither worked. I can get successful generations up to about 2k tokens; then it simply fails. This is also on text-generation-webui.
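
For reference, here is roughly how I understand those settings to map onto ExLlamaV2 when loading it directly. This is just a sketch, assuming the webui passes the settings straight through; the model path is a placeholder:

```python
# Rough sketch, assuming text-generation-webui forwards these settings to
# ExLlamaV2 unchanged; the model path is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4

config = ExLlamaV2Config("/path/to/model")
config.max_seq_len = 32768       # "Max length" = 32k in the UI
config.scale_pos_emb = 1.0       # compress_pos_emb = 1
config.scale_alpha_value = 1.0   # alpha value = 1 (NTK RoPE scaling)

model = ExLlamaV2(config)
# The cache has to be allocated for the full 32k context; if it is created
# with a smaller max_seq_len, forward() asserts once generation passes it.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)
```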

This is my pod template: text-generation-webui-oneclick-UI-and-API
ID: vmg0ubbuwtesbw

Maybe you need to update ExLlama or text-generation-webui? I have no idea how else to help you.

altomek changed discussion status to closed
