Error when running the model

#1
by Leaf45 - opened

When I try to run the model I see this:

Traceback (most recent call last):
  File "/home/galaxia/text-generation-webui/modules/text_generation.py", line 323, in generate_reply_custom
    for reply in shared.model.generate_with_streaming(question, state):
  File "/home/galaxia/text-generation-webui/modules/exllama.py", line 81, in generate_with_streaming
    self.generator.gen_begin_reuse(ids)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/generator.py", line 193, in gen_begin_reuse
    self.gen_begin(in_tokens, max_chunk)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/generator.py", line 177, in gen_begin
    self.model.forward(self.sequence[:, a:b], self.cache, preprocess_only = True, lora = self.lora)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 860, in forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 466, in forward
    hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 384, in forward
    key_states = key_states.view(bsz, q_len, self.config.num_attention_heads, self.config.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 188, 64, 128]' is invalid for input of size 192512
Output generated in 0.00 seconds (0.00 tokens/s, 0 tokens, context 189, seed 1872453553)

I can run other Llama 2 models, but this one fails. Maybe it has something to do with it being a Code Llama model.
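That guess is consistent with the shape arithmetic in the error itself: 192512 elements is 188 × 8 × 128, i.e. the key projection produced only 8 heads' worth of features, while the failing `view` in model.py assumes all 64 attention heads. That pattern matches a grouped-query-attention (GQA) model (Code Llama 34B uses 8 key/value heads), which the original ExLlama loader did not handle. A minimal PyTorch sketch reproducing the mismatch, with dimensions taken from the traceback (the KV-head count of 8 is an inference from the numbers, not something the log states directly):

```python
import torch

bsz, q_len, head_dim = 1, 188, 128
num_attention_heads = 64  # what the loader's config assumes
num_key_value_heads = 8   # assumed GQA head count; 188 * 8 * 128 = 192512

# key_states as a GQA projection would produce it: only 8 * 128 = 1024 features
key_states = torch.zeros(bsz, q_len, num_key_value_heads * head_dim)
print(key_states.numel())  # 192512, matching the error message

# The view from model.py line 384, which assumes 64 heads, fails:
try:
    key_states.view(bsz, q_len, num_attention_heads, head_dim)
except RuntimeError as e:
    print(e)  # raises the same RuntimeError seen in the traceback

# Viewing with the actual KV-head count works fine:
ok = key_states.view(bsz, q_len, num_key_value_heads, head_dim).transpose(1, 2)
print(ok.shape)  # torch.Size([1, 8, 188, 128])
```

So the model file is likely fine; it is the non-GQA-aware loader that is misinterpreting it.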

I was able to load this in oobabooga with ExLlama; I couldn't load it with anything else. It seems to work OK.

When I try to generate with ExLlama I get an empty reply, and the log says basically the same thing. An empty reply is more than I got before, but it's still nothing:

Traceback (most recent call last):
  File "/home/galaxia/text-generation-webui/modules/text_generation.py", line 323, in generate_reply_custom
    for reply in shared.model.generate_with_streaming(question, state):
  File "/home/galaxia/text-generation-webui/modules/exllama.py", line 81, in generate_with_streaming
    self.generator.gen_begin_reuse(ids)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/generator.py", line 193, in gen_begin_reuse
    self.gen_begin(in_tokens, max_chunk)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/generator.py", line 177, in gen_begin
    self.model.forward(self.sequence[:, a:b], self.cache, preprocess_only = True, lora = self.lora)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 860, in forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 466, in forward
    hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
  File "/home/galaxia/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 384, in forward
    key_states = key_states.view(bsz, q_len, self.config.num_attention_heads, self.config.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 183, 64, 128]' is invalid for input of size 187392
Output generated in 0.02 seconds (0.00 tokens/s, 0 tokens, context 184, seed 1334487251)

I am getting reasonably quick replies on a 3090, with well-formed answers, though maybe a little on the short side. This was a fresh oobabooga install from two days ago, so I am guessing something in your environment/installation is causing the issue?

I had trouble loading this with oobabooga until I updated it. Make sure to select ExLlamav2 as the loader. I also updated the prompt template to the one listed in the model description.
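If you want to confirm that the model actually uses GQA (and therefore needs the newer, GQA-aware loader), the head counts are in its `config.json`. A hedged sketch — the model path here is hypothetical, so adjust it to wherever the webui downloaded the model:

```python
import json
import os

# Hypothetical path; substitute your actual model directory.
cfg_path = os.path.expanduser("~/text-generation-webui/models/your-model/config.json")

if os.path.exists(cfg_path):
    with open(cfg_path) as f:
        cfg = json.load(f)
    heads = cfg.get("num_attention_heads")
    # If num_key_value_heads is absent, the model uses ordinary multi-head attention.
    kv_heads = cfg.get("num_key_value_heads", heads)
    print(f"attention heads: {heads}, kv heads: {kv_heads}")
    if kv_heads != heads:
        print("GQA model: use a GQA-aware loader such as ExLlamav2")
else:
    print("config.json not found; adjust cfg_path")
```

For the shapes in the tracebacks above, 8 KV heads is the only count that makes the element totals work out (188 × 8 × 128 = 192512), while 64 heads does not.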

Is ExLlamav2 supposed to become available just by upgrading? I don't have that option even after upgrading. Then again, it has happened before that I had to reinstall to get new features.
