
example code is not working

#5 opened by Q4234

File /usr/local/lib/python3.9/dist-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File /usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py:300, in LlamaAttention.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache)
297 key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
298 value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)
--> 300 query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
301 query_states = torch.cat(query_states, dim=-1)
303 key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]

File /usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py:300, in <listcomp>(.0)
297 key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
298 value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)
--> 300 query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
301 query_states = torch.cat(query_states, dim=-1)
303 key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]

RuntimeError: mat1 and mat2 shapes cannot be multiplied (69x5120 and 1x2560)

Got the same shape mismatch error, using the latest transformers>=4.31.0.

I guess setting the ctxlen is missing in the code... I have no idea how to do this, but the code for SuperCOT/SuperHOT did do this...

The maintainers need to set pretraining_tp to 1 for it to work with quantized models. You can set that value in your local copy of the model's config, and that will fix the problem.
https://github.com/facebookresearch/llama/issues/423#issuecomment-1643387661
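
For anyone else hitting this, a minimal sketch of that workaround, assuming you load the model with transformers' Auto classes (the model_id below is a placeholder for this repo or your local copy; the dtype/device_map kwargs are just one common setup):

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "path/to/this-model"  # placeholder: replace with the repo id or your local path

# Load the config and disable the tensor-parallel slicing path
# (pretraining_tp > 1 is what triggers the mat1/mat2 shape mismatch above).
config = AutoConfig.from_pretrained(model_id)
config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
)

Editing pretraining_tp to 1 directly in the local config.json has the same effect.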

OpenAssistant org

@ttronrud thanks for reporting. We added pretraining_tp to the config.
