Masking the image tokens during training

#68

by jchiu1234 - opened Mar 5, 2024

Should the image tokens in the decoder output be masked during training? I'm trying to wrap my head around this. One argument against masking is that the model may need to know that the preceding inputs are images. Could someone advise whether the model should be trained with the image tokens masked or not?
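To make the question concrete, here is a minimal sketch of what "masking" usually means in this setting, written in plain Python so it runs without any framework. It assumes masking is applied to the loss labels only, using the `-100` sentinel that PyTorch's `CrossEntropyLoss` ignores by default. `IMAGE_TOKEN_ID`, `mask_image_labels`, and `masked_cross_entropy` are hypothetical names for illustration, not taken from this model's code, and the labels are assumed to be already aligned with the prediction at each position.

```python
import math

IGNORE_INDEX = -100   # sentinel that PyTorch's CrossEntropyLoss skips by default
IMAGE_TOKEN_ID = 3    # hypothetical placeholder id, chosen only for this toy vocabulary


def mask_image_labels(input_ids, image_token_id=IMAGE_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing image-token positions with ignore_index."""
    return [ignore_index if tok == image_token_id else tok for tok in input_ids]


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over the positions whose label is not ignore_index.

    logits: one list of vocabulary scores per position.
    labels: one target id per position, assumed already aligned with logits.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # image position: contributes nothing to the loss
        m = max(scores)
        # Numerically stable log-sum-exp over the vocabulary scores.
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count if count else 0.0


# Two image placeholders followed by two text tokens.
input_ids = [3, 3, 1, 2]
labels = mask_image_labels(input_ids)
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in input_ids]
loss = masked_cross_entropy(uniform_logits, labels)
```

Under this reading, the image tokens stay in `input_ids`, so the model still sees them as context; only their contribution to the loss is dropped. Whether that is the right choice for this particular model is exactly the open question.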
