
example code is not working

#5 opened by Q4234

File /usr/local/lib/python3.9/dist-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File /usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py:300, in LlamaAttention.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache)
297 key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
298 value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)
--> 300 query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
301 query_states = torch.cat(query_states, dim=-1)
303 key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]

File /usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py:300, in <listcomp>(.0)
297 key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
298 value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)
--> 300 query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
301 query_states = torch.cat(query_states, dim=-1)
303 key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]

RuntimeError: mat1 and mat2 shapes cannot be multiplied (69x5120 and 1x2560)

Got the same shape mismatch error, using the latest transformers>=4.31.0.

I guess setting the ctxlen is missing in the code... I have no idea how to do this, but the code for SuperCOT/SuperHOT did do this...

The maintainers need to set pretraining_tp to 1 for it to work with quantized models. You can set that value in your local copy of the model's config, and that will fix the problem.
https://github.com/facebookresearch/llama/issues/423#issuecomment-1643387661
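
For anyone else hitting this, a minimal sketch of that workaround, assuming you load the model with transformers' Auto classes (the model_id below is a placeholder for this repo or your local copy; the dtype/device_map kwargs are just one common setup):

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "path/to/this-model"  # placeholder: replace with the repo id or your local path

# Load the config and disable the tensor-parallel slicing path
# (pretraining_tp > 1 is what triggers the mat1/mat2 shape mismatch above).
config = AutoConfig.from_pretrained(model_id)
config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
)

Editing pretraining_tp to 1 directly in the local config.json has the same effect.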

OpenAssistant org

@ttronrud thanks for reporting. We added pretraining_tp to the config.
