Inference of prompt-tuned model fails with tensor size mismatch

#17 by aan64 - opened

Hello,

I've prompt-tuned the flan-t5-xl model and used it to run inference on a tweet dataset:

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model_name_or_path = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=16,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path=model_name_or_path,
)

model = T5ForConditionalGeneration.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16)
model = get_peft_model(model, peft_config)

# fine-tune ...

text_column = "Tweet text"  # name of the text column used during training
tweet = "@nationalgridus I have no water and the bill is current and paid. Can you do something about this?"
inputs = tokenizer(f"{text_column} : {tweet} Label : ", return_tensors="pt")
print(inputs)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
    print(inputs)
    outputs = model.generate(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

The code fails at inference time in model.generate with the following error:

RuntimeError: The size of tensor a (32) must match the size of tensor b (48) at non-singleton dimension 3

The difference between the size of tensor b and the size of tensor a (48 - 32 = 16) is exactly the number of virtual tokens in the soft prompt. To me, this suggests that the soft prompt is not being prepended to the input tensor consistently. Is that what is happening here? (A quick check of the arithmetic is sketched below.)
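
For reference, a minimal way to check the arithmetic. This is a sketch: get_prompt is, as far as I understand, the PEFT helper that materializes the soft prompt embeddings, so treat that call as an assumption rather than a documented API, and the shapes in the comments are what I expect, not verified output.

# tokenized input length; should match tensor a (32)
print(inputs["input_ids"].shape)

# virtual token count; 32 + 16 = 48 would match tensor b
print(peft_config.num_virtual_tokens)

# materialize the soft prompt and confirm it has 16 rows
prompt = model.get_prompt(batch_size=1)
print(prompt.shape)  # expected: (1, 16, hidden_size)

If the prompt materializes with the right shape, the mismatch would point at the attention mask not being extended for the virtual tokens rather than at the prompt embeddings themselves.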

Does anybody know what the issue could be here? Any feedback is highly appreciated, thanks!
