RuntimeError: The size of tensor a (3840) must match the size of tensor b (2) at non-singleton dimension 1
I am getting this tensor-dimension-mismatch error from the generate function with the model meta-llama/Meta-Llama-3-8B-Instruct. I have tried batch_size = 1 and 2, and padding = True and False, but the error persists. The same code runs fine with Mistral, so the code itself is unlikely to be the issue.
My setup:
- context window of 4096, though I have also tried reducing it.
- transformers 4.42.2.
- an A100 with the NVIDIA PyTorch container (docker://nvcr.io/nvidia/pytorch:24.07-py3), which ships with torch 2.4.0a0+3bcc3cddb5.nv24.7. The same code works fine with torch 2.3.1+cu121 on Google Colab.
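For reference, the generate call looks roughly like this (a simplified sketch; the variable names, prompts, and generation kwargs here are illustrative rather than copied from summarizer.py):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Llama 3 ships without a pad token, so one is assigned for batched padding
# (an assumption here, not necessarily what summarizer.py does).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# batch_size = 2 with padding = True; the error also occurs with
# batch_size = 1 and padding = False.
texts = ["<document 1 text>", "<document 2 text>"]
batch = tokenizer(
    texts, return_tensors="pt", padding=True, truncation=True, max_length=4096
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        batch["input_ids"],
        max_new_tokens=256,
        do_sample=False,
    )
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```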
Here is the traceback:
Traceback (most recent call last):
File "/workspace/domain-adaptation/llm_summ_eval.py", line 183, in
inputs, outputs = summarizer.get_summary(dataset=dtest)
File "/workspace/domain-adaptation/experiments/summarizer.py", line 222, in get_summary
return self.get_outputs(dataset=dataset)
File "/workspace/domain-adaptation/experiments/summarizer.py", line 179, in get_outputs
generated_ids = self.model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1690, in generate
model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 495, in _prepare_attention_mask_for_generation
attention_mask_from_padding = inputs.ne(pad_token_id).long()
RuntimeError: The size of tensor a (3840) must match the size of tensor b (2) at non-singleton dimension 1