ModernBert in Encoder-Decoder -> "got an unexpected keyword argument 'inputs_embeds'"

#41
by KaranShishoo - opened

Hello,
I am trying to train an encoder-decoder model that uses ModernBERT as the encoder and GPT-2 as the decoder. I had hoped this would be straightforward with the Seq2Seq classes and trainers that HF provides, but I have run into an error I haven't been able to fix. Currently I do the following:

from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EncoderDecoderModel,
    GPT2Tokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Tokenizers run on the CPU, so they take no device_map argument.
tokenizer_MBert = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "answerdotai/ModernBERT-base", "gpt2", device_map='cuda:0'
)
model.decoder.config.use_cache = False  # the KV cache is not needed during training
model.gradient_checkpointing_enable()

# Reuse CLS/SEP as BOS/EOS for the encoder tokenizer.
tokenizer_MBert.bos_token = tokenizer_MBert.cls_token
tokenizer_MBert.eos_token = tokenizer_MBert.sep_token
tokenizer_MBert.pad_token = tokenizer_MBert.unk_token

# Make the (slow) GPT-2 tokenizer wrap every sequence in BOS ... EOS.
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    outputs = [self.bos_token_id] + token_ids_0 + [self.eos_token_id]
    return outputs

GPT2Tokenizer.build_inputs_with_special_tokens = build_inputs_with_special_tokens
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 has no pad token; its unk token is <|endoftext|> (same id as bos/eos).
gpt2_tokenizer.pad_token = gpt2_tokenizer.unk_token

# decoder_start_token_id and pad_token_id are used to build the decoder inputs
# from the labels, so they must be GPT-2 (decoder) token ids.
model.config.decoder_start_token_id = gpt2_tokenizer.bos_token_id
model.config.pad_token_id = gpt2_tokenizer.pad_token_id
model.config.eos_token_id = gpt2_tokenizer.eos_token_id

# Decoding options live on the generation config; attributes set directly
# on the model object (e.g. model.num_beams = 2) are ignored by generate().
model.generation_config.decoder_start_token_id = gpt2_tokenizer.bos_token_id
model.generation_config.pad_token_id = gpt2_tokenizer.pad_token_id
model.generation_config.eos_token_id = gpt2_tokenizer.eos_token_id
model.generation_config.no_repeat_ngram_size = 3
model.generation_config.early_stopping = True
model.generation_config.length_penalty = 3.0
model.generation_config.num_beams = 2

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer_MBert, model=model)
optimizer = 'adamw_torch'
lr_scheduler = 'linear'

training_args = Seq2SeqTrainingArguments(
    output_dir="./MBert_GPT2",
    eval_strategy="steps",
    eval_steps=2000,
    save_strategy="steps",
    save_steps=2000,
    logging_steps=100,
    max_steps=10000,
    do_eval=True,
    optim=optimizer,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant': False},
    learning_rate=2e-5,
    log_level="debug",
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    lr_scheduler_type=lr_scheduler,
    bf16=True,
    report_to="wandb",
    run_name="MBert_GPT2",
    seed=42,
    predict_with_generate=True,
    generation_max_length=300
)

# tokenized_dataset (input_ids + labels) is prepared elsewhere.
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    tokenizer=tokenizer_MBert,
    data_collator=data_collator,
)
trainer.train()
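For reference, when labels are passed, EncoderDecoderModel derives decoder_input_ids from them with a shift-right helper that reads config.pad_token_id and config.decoder_start_token_id. Below is a minimal pure-Python sketch of that logic (the library's helper works on tensors; this list-based version and the token ids 464 and 3290 are illustrative assumptions). It shows why both ids should come from the decoder's (GPT-2's) vocabulary: DataCollatorForSeq2Seq pads labels with -100, and those positions are replaced by pad_token_id, so an encoder-side id larger than GPT-2's 50,257-entry vocabulary would break the decoder's embedding lookup.

```python
def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    """Build decoder inputs from labels (pure-Python sketch, not the library function).

    Each row is shifted one position to the right, decoder_start_token_id is
    placed first, and -100 (the ignore index used for padded labels) is
    replaced by pad_token_id so every position is a real token id.
    """
    shifted = []
    for row in labels:
        new_row = [decoder_start_token_id] + row[:-1]
        shifted.append([pad_token_id if tok == -100 else tok for tok in new_row])
    return shifted


GPT2_EOS = 50256  # GPT-2's <|endoftext|>, which also serves as its bos/unk token
labels = [[464, 3290, GPT2_EOS, -100, -100]]  # arbitrary example ids, padded with -100
decoder_input_ids = shift_tokens_right(
    labels, pad_token_id=GPT2_EOS, decoder_start_token_id=GPT2_EOS
)
print(decoder_input_ids)
```

The loss itself is still computed against the original labels, where -100 positions are ignored; only the decoder inputs need valid ids.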

tokenized_dataset contains input_ids and labels.
I'm also using the latest version of transformers, installed directly from the GitHub repository.
Training is launched with notebook_launcher from accelerate, and it fails with this error:

TypeError: ModernBertModel.forward() got an unexpected keyword argument 'inputs_embeds'

I have looked at the ModernBERT forward code and confirmed that ModernBertModel.forward() does not accept inputs_embeds. However, I assumed that since I was providing input_ids, inputs_embeds would never be passed during training. I am not sure whether ModernBERT is simply not meant to be used in an encoder-decoder setup or whether I have implemented it incorrectly.
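A minimal plain-Python sketch of why providing input_ids does not help: as far as I can tell from the library source at the time, EncoderDecoderModel.forward() passes inputs_embeds to the encoder by keyword even when its value is None, and Python rejects an undeclared keyword regardless of its value. The class and function names below are hypothetical stand-ins, not transformers APIs.

```python
class EncoderWithoutInputsEmbeds:
    """Hypothetical stand-in for an encoder whose forward() lacks inputs_embeds."""

    def forward(self, input_ids=None, attention_mask=None):
        return {"n_tokens": len(input_ids)}


def encoder_decoder_forward(encoder, input_ids=None, attention_mask=None, inputs_embeds=None):
    # Every argument is forwarded by keyword, including inputs_embeds=None.
    return encoder.forward(
        input_ids=input_ids,
        attention_mask=attention_mask,
        inputs_embeds=inputs_embeds,
    )


try:
    encoder_decoder_forward(EncoderWithoutInputsEmbeds(), input_ids=[1, 2, 3])
except TypeError as err:
    # The value is None, but the keyword itself is what the callee rejects.
    print(f"TypeError: {err}")
```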

I believe the issue occurs when EncoderDecoderModel computes the loss: generating with EncoderDecoderModel from input_ids works, but the error appears as soon as I try to compute the loss.
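A plausible explanation for why generation works while the loss computation fails: in recent transformers versions, generate() appears to prepare the encoder call by keeping only the keyword arguments that the encoder's forward() signature declares (unless it accepts **kwargs), whereas the training forward passes inputs_embeds through unconditionally. The sketch below reproduces that filtering pattern in plain Python; call_with_supported_kwargs and encoder_forward are hypothetical names, not library functions.

```python
import inspect


def call_with_supported_kwargs(fn, **kwargs):
    """Call fn, dropping keyword arguments its signature does not declare.

    If fn accepts **kwargs, everything is passed through unchanged.
    """
    params = inspect.signature(fn).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_var_kwargs:
        kwargs = {name: value for name, value in kwargs.items() if name in params}
    return fn(**kwargs)


def encoder_forward(input_ids=None, attention_mask=None):
    # Hypothetical encoder without an inputs_embeds parameter.
    return len(input_ids)


# inputs_embeds is silently dropped, so this call succeeds.
n = call_with_supported_kwargs(encoder_forward, input_ids=[1, 2, 3], inputs_embeds=None)
print(n)
```

The same kind of signature check can tell whether an installed version already accepts the argument, e.g. `"inputs_embeds" in inspect.signature(ModernBertModel.forward).parameters` after `from transformers import ModernBertModel`.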

Any help would be appreciated.

Did you solve it?

@khusrav13 Yes, it's solved. There was a GitHub pull request addressing this same issue (https://github.com/huggingface/transformers/pulls?q=inputs_embeds); it was resolved a while ago, and everything has worked since then.

KaranShishoo changed discussion status to closed

@KaranShishoo, I'm exploring whether it's possible to build a sequence-to-sequence model in the Transformers library using ModernBERT as the encoder and Llama 3.1 (8B) as the decoder. I attempted this a few days ago with the EncoderDecoderModel class in Hugging Face Transformers but ran into issues. Do you know of any working examples or tutorials that show how to configure this kind of seq2seq pipeline? Thank you!
