GPTSAN Text Summarization

#3 by minkhantycc

Hi guys,
I am trying to fine-tune the GPTSAN model for a summarization task, as explained in a Hugging Face blog post (please look here). The modifications I made are:

  1. changing the model (a loading sketch is given after the training loop below)
  2. redefining the preprocessing step:
def preprocess_function(examples):
    # Tokenize the source documents and the reference summaries separately.
    model_inputs = tokenizer(examples['text'], truncation=True, max_length=1024)
    labels = tokenizer(examples['summary'], truncation=True, max_length=128)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs
  3. using a standard training loop instead of the Trainer:
for batch in train_dataloader:
    optimizer.zero_grad()
    # Move every tensor in the batch to the training device.
    batch = {k: v.to(device) for k, v in batch.items()}
    pred = model(input_ids=batch['input_ids'],
                 attention_mask=batch['attention_mask'],
                 token_type_ids=batch['token_type_ids'],
                 decoder_inputs_embeds=batch['decoder_input_ids'],
                 labels=batch['labels'])
    loss = pred.loss
    loss.backward()
    optimizer.step()
    scheduler.step()
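
For completeness, this is how I swap in the model for item 1 above; a minimal sketch assuming the publicly released Tanrei/GPTSAN-japanese checkpoint on the Hub, with device being the same variable used in the loop:

import torch
from transformers import GPTSanJapaneseForConditionalGeneration, GPTSanJapaneseTokenizer

# Assumption: the public GPTSAN checkpoint on the Hub.
checkpoint = 'Tanrei/GPTSAN-japanese'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = GPTSanJapaneseTokenizer.from_pretrained(checkpoint)
model = GPTSanJapaneseForConditionalGeneration.from_pretrained(checkpoint).to(device)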

Then I got an error message saying that the shape of input_ids does not match the shape of the labels:
ValueError: Expected input batch_size (1024) to match target batch_size (128).

Setting those shapes equal in the preprocess function solves the error, but I don't know whether that is theoretically correct.
So, I want to know: is it?
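
For reference, here is roughly what I changed to make the shapes equal; a minimal sketch in which the shared max_length of 1024, the padding strategy, and the -100 masking of pad positions are my own choices, not something taken from the blog post:

def preprocess_function(examples):
    # Pad source and target to the same length so that input_ids and
    # labels end up with identical shapes.
    model_inputs = tokenizer(examples['text'], truncation=True,
                             padding='max_length', max_length=1024)
    labels = tokenizer(examples['summary'], truncation=True,
                       padding='max_length', max_length=1024)
    # Assumption: replace pad token ids in the labels with -100 so the
    # padding positions are ignored by the cross-entropy loss.
    model_inputs['labels'] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
        for seq in labels['input_ids']
    ]
    return model_inputs

With this, input_ids and labels both come out with length 1024 per example, which seems to be what the loss computation expects.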

Thank you for your kind suggestions.
