I hope I'm not overstepping any boundaries. I tried the fix in #16 for handling batch sizes larger than 1 but ran into a few issues: for example, single strings were not handled as text input, and no padding was applied when the input texts had different lengths.

I tried to address all the issues I had. Hope this can be of use.
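The two issues above (a single input treated differently from a batch, and texts of different lengths needing padding) can be sketched on already-tokenized ids. This is a minimal illustration, not the code in this PR: `left_pad_batch` is a hypothetical helper, and `pad_token_id` is whatever id your tokenizer uses for padding.

```python
from typing import List, Sequence, Tuple, Union


def left_pad_batch(
    token_ids: Union[List[int], Sequence[List[int]]],
    pad_token_id: int,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad a batch of token-id lists to a common length.

    A flat list of ints is treated as a batch of one, mirroring the
    "single string vs. list of strings" case. Returns (input_ids, attention_mask).
    """
    # A single example: wrap it so the rest of the code always sees a batch.
    if token_ids and isinstance(token_ids[0], int):
        token_ids = [token_ids]
    max_len = max(len(seq) for seq in token_ids)
    input_ids, attention_mask = [], []
    for seq in token_ids:
        n_pad = max_len - len(seq)
        # Pad on the left so the last position of every row is a real token.
        input_ids.append([pad_token_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad_batch([[5, 6, 7], [8]], pad_token_id=0)
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

Left padding is chosen here because, for generation, the newly produced tokens are appended at the end of each row, so real tokens must sit at the right edge.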

@gugarosa, feel free to have a look and let me know if I'm overlooking something.

Best,
William

Yeah, my solution wasn't complete. I wish you could edit PRs on Hugging Face. Setting `labels[labels == pad_token] = -100` is what finally got training to run when doing supervised fine-tuning. On that note, I'm not entirely clear on whether it's preferable to use `trl.SFTTrainer` in that case.
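The label-masking trick above replaces padding positions with -100, which is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so those positions do not contribute to the loss. A minimal list-based sketch of the same idea (`mask_pad_labels` is a hypothetical helper; in practice you would do this on the tensor directly, as in the one-liner above):

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_pad_labels(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Build labels from input_ids, replacing padding tokens with IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if tok == pad_token_id else tok for tok in row]
        for row in input_ids
    ]


labels = mask_pad_labels([[0, 0, 8], [5, 6, 7]], pad_token_id=0)
print(labels)  # [[-100, -100, 8], [5, 6, 7]]
```

Matching by value also masks any genuine token that shares the pad id (for example, when the pad and EOS tokens are the same). Masking where `attention_mask == 0` avoids that.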

Microsoft org

hi @WilliamSotoM
This solution throws the error below when Flash Attention is enabled:

modeling_phi3_v.py", line 1135, in forward
raise ValueError(
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Phi3. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.

Hello @haipingwu

That's odd: as you can see on line 198, I'm placing the padding tokens on the left of the input_ids.

Can you check the input_ids before passing them to the model and confirm that the padding is falling on the wrong side?
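One quick way to check is to look at the attention mask: if any row ends in a padding position (0), the batch is right-padded. This is a minimal sketch assuming the mask has been converted to nested lists (e.g. with `.tolist()` on the tensor); `padding_side` is a hypothetical helper that mirrors the idea behind the error message, not necessarily the exact check in `modeling_phi3_v.py`.

```python
from typing import List


def padding_side(attention_mask: List[List[int]]) -> str:
    """Infer which side a batch is padded on from its attention mask.

    Returns "right" if any row ends in padding (0), "left" if any row
    starts with padding, and "none" if no row is padded at either end.
    """
    if any(row and row[-1] == 0 for row in attention_mask):
        return "right"
    if any(row and row[0] == 0 for row in attention_mask):
        return "left"
    return "none"


print(padding_side([[0, 1, 1], [1, 1, 1]]))  # left
print(padding_side([[1, 1, 0], [1, 1, 1]]))  # right
```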

Could you provide a snippet of the code you are using to generate? I'm not seeing this issue with `model.generate()`.

@haipingwu quick update... I found the bug. It happens when you don't pass any images: line 149 falls back to the standard tokenizer and processes all the text in one go, and the tokenizer's default padding side is not set properly. I suppose it was not caught in the original code because, without batching, no padding was needed.

The solution is to add `self.tokenizer.padding_side = 'left'` on line 55, right after initializing the tokenizer.
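Why setting `padding_side` at construction time covers the text-only path can be illustrated with toy stand-ins. `ToyTokenizer` and `ToyProcessor` are hypothetical classes, not the transformers or Phi-3 processor API. The tokenizer starts with right padding (the default assumed here), and the processor flips it to left once in `__init__`, so the path that delegates straight to the tokenizer inherits it.

```python
from typing import List


class ToyTokenizer:
    """Hypothetical tokenizer stand-in that only knows how to pad token-id lists."""

    def __init__(self, pad_token_id: int = 0) -> None:
        self.pad_token_id = pad_token_id
        self.padding_side = "right"  # assumed default when nothing sets it explicitly

    def pad(self, batch: List[List[int]]) -> List[List[int]]:
        max_len = max(len(seq) for seq in batch)
        padded = []
        for seq in batch:
            fill = [self.pad_token_id] * (max_len - len(seq))
            # Respect padding_side, as a real tokenizer does.
            padded.append(fill + seq if self.padding_side == "left" else seq + fill)
        return padded


class ToyProcessor:
    """Hypothetical processor that sets left padding right after creating the tokenizer."""

    def __init__(self, tokenizer: ToyTokenizer) -> None:
        self.tokenizer = tokenizer
        self.tokenizer.padding_side = "left"  # the one-line fix described above

    def __call__(self, batch: List[List[int]]) -> List[List[int]]:
        # Text-only path: delegate straight to the tokenizer, which now pads on the left.
        return self.tokenizer.pad(batch)


print(ToyTokenizer().pad([[1, 2], [3]]))        # [[1, 2], [3, 0]]
print(ToyProcessor(ToyTokenizer())([[1, 2], [3]]))  # [[1, 2], [0, 3]]
```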

I'm going to close/delete this PR and create a new one with the fixed code.

@sebbyjp, you were right. I wish you could edit PRs on Hugging Face too!

Update:
This version is outdated. For the latest version of the batch size fix, please check:

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/discussions/32

WilliamSotoM changed pull request status to closed

Hello, I'm still getting this error when the model runs evaluation on the validation set:
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Phi3. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.

I did set `processor.tokenizer.padding_side = "left"`.

transformers version: 4.40.2

Training arguments:

```python
from transformers import Trainer, TrainingArguments

# model, data_collator, train_dataset and valid_dataset are defined earlier in the script.
training_args = TrainingArguments(
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant': False},
    warmup_steps=50,  # when set, overrides warmup_ratio
    warmup_ratio=0.03,
    learning_rate=2e-5,
    weight_decay=0.,
    logging_steps=1,
    lr_scheduler_type="linear",
    output_dir="hf_trainer_test/",
    save_strategy="steps",
    save_steps=25,
    save_total_limit=2,
    evaluation_strategy="steps",
    eval_steps=1,
    # fp16=True,
    bf16=True,
    tf32=True,
    remove_unused_columns=False,
    report_to="tensorboard",
    dataloader_num_workers=2,
    max_steps=50,  # when set, overrides num_train_epochs
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
)
```

Hello @singhami

Please check this version: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/discussions/32

It fixes that issue.
