
Incorrect comments in example

#4
by mjspeck - opened

Under FlavaForPreTraining, the value of outputs.multimodal_embeddings is actually None, contrary to what the adjacent comment implies: # Batch size X (Number of image patches + Text Sequence Length + 3) X Hidden size => 2 X 275 x 768. Why? It doesn't seem like the README author expected this.

I assume it has to do with this: inputs.bool_masked_pos.zero_()
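For reference, here is a minimal sketch that reproduces what I'm seeing, following the README-style example. The checkpoint name (facebook/flava-full) and the processor arguments are my assumptions based on the model card, not something confirmed elsewhere in this thread:

```python
import requests
from PIL import Image
from transformers import FlavaProcessor, FlavaForPreTraining

# Assumed checkpoint; adjust if you are testing a different FLAVA model.
model = FlavaForPreTraining.from_pretrained("facebook/flava-full")
processor = FlavaProcessor.from_pretrained("facebook/flava-full")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=[image, image],
    return_tensors="pt",
    padding="max_length",
    max_length=77,
    return_codebook_pixels=True,
    return_image_mask=True,
)

# The line from the example that I suspect is involved.
inputs.bool_masked_pos.zero_()

outputs = model(**inputs)

# The example's comment claims:
# Batch size X (Number of image patches + Text Sequence Length + 3) X Hidden size => 2 X 275 x 768
print(outputs.multimodal_embeddings)  # actually prints None
```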
