AssertionError: Total sequence length exceeds cache size in model.forward

#1
by Hardcore7651 - opened

I'm getting this error when running past 2k context, despite having the model loaded for 32k, on RunPod on an A6000.

I believe it is related to this: https://github.com/oobabooga/text-generation-webui/issues/5750#issuecomment-2024442282

But I am not knowledgeable enough to be sure.
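
If I understand that linked issue correctly, the failure is roughly the following check: the KV cache is allocated for a fixed number of tokens, and the forward pass asserts that the tokens already in the cache plus the new ones still fit. A minimal illustrative sketch (not ExLlamaV2's actual source; `Cache` and `forward` are placeholders):

```python
# Minimal sketch of the kind of check that raises this error.
# Not ExLlamaV2's real code; Cache and forward are illustrative placeholders.

class Cache:
    def __init__(self, max_seq_len):
        self.max_seq_len = max_seq_len      # cache capacity in tokens
        self.current_seq_len = 0            # tokens already stored

def forward(cache, num_new_tokens):
    total = cache.current_seq_len + num_new_tokens
    # If the cache was allocated for ~2k tokens while the UI says 32k,
    # this trips as soon as generation passes the cache size.
    assert total <= cache.max_seq_len, \
        "Total sequence length exceeds cache size in model.forward"
    cache.current_seq_len = total           # append the new tokens' keys/values

cache = Cache(max_seq_len=2048)
forward(cache, 2048)    # fills the cache exactly
forward(cache, 1)       # AssertionError
```

So my guess is the cache is being allocated for less context than the model is configured for.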

I use text-generation-webui from May 19 and do not have this issue. I use the 4-bit cache. What are your settings, and what version do you use?

BTW, I made a small update to config.json and tokenizer_config.json. I believe it is unrelated to your problem, but please update those files.

Max length is set to 32k, alpha value to 1, and compress_pos_emb to 1. I have tried both the 8-bit and the 4-bit cache, and neither worked. I can get successful generations up to about 2k tokens; then it simply fails. This is also on text-generation-webui.
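
For reference, here is roughly how I understand those settings to map onto ExLlamaV2 when loading it directly. This is just a sketch, assuming the webui passes the settings straight through; the model path is a placeholder:

```python
# Rough sketch, assuming text-generation-webui forwards these settings to
# ExLlamaV2 unchanged; the model path is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4

config = ExLlamaV2Config("/path/to/model")
config.max_seq_len = 32768       # "Max length" = 32k in the UI
config.scale_pos_emb = 1.0       # compress_pos_emb = 1
config.scale_alpha_value = 1.0   # alpha value = 1 (NTK RoPE scaling)

model = ExLlamaV2(config)
# The cache has to be allocated for the full 32k context; if it is created
# with a smaller max_seq_len, forward() asserts once generation passes it.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)
```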

This is my pod template: text-generation-webui-oneclick-UI-and-API
ID: vmg0ubbuwtesbw

Maybe you need to update ExLlama or text-generation-webui? I have no idea how else to help you.

altomek changed discussion status to closed
