How to limit the loss computation to the answer?

#44
by schwarzwalder - opened

In the idefics2 paper, there is a mention of computing the loss only on the answer part of the VQA task. I could not find this in the fine-tuning Colab.
Could you please provide a short snippet for that?

Thanks in advance.

HuggingFaceM4 org

Yes, it's true that it's not present in the Google Colab.

In our codebase, it is done in a hacky way in the packing: we tokenize the input, find the positions between Assistant: and the next <end_of_utterance>, and compute the loss only on those ids.
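
For reference, here is a minimal sketch of that masking, assuming the idefics2 chat format where each answer sits between Assistant: and <end_of_utterance>. The helper name is illustrative (not our actual packing code), and the token ids for the Assistant: marker can depend on the surrounding context in your template, so check them against your own tokenization.

```python
import torch
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
tokenizer = processor.tokenizer


def mask_labels_to_answers(input_ids: torch.Tensor) -> torch.Tensor:
    """Return labels where only the answer tokens contribute to the loss.

    input_ids is a (batch, seq) tensor. Everything outside the spans between
    "Assistant:" and the next <end_of_utterance> is set to -100, which the
    cross-entropy loss in Transformers ignores.
    """
    labels = input_ids.clone()
    assistant_ids = tokenizer("Assistant:", add_special_tokens=False).input_ids
    eou_id = tokenizer.convert_tokens_to_ids("<end_of_utterance>")

    for row in range(labels.size(0)):
        ids = input_ids[row].tolist()
        in_answer = False
        i = 0
        while i < len(ids):
            if not in_answer and ids[i : i + len(assistant_ids)] == assistant_ids:
                # Mask the "Assistant:" marker itself, then keep the loss
                # on the tokens that follow it.
                labels[row, i : i + len(assistant_ids)] = -100
                i += len(assistant_ids)
                in_answer = True
            elif in_answer and ids[i] == eou_id:
                # Keep the loss on <end_of_utterance> so the model learns
                # when to stop, then mask again until the next answer.
                in_answer = False
                i += 1
            else:
                if not in_answer:
                    labels[row, i] = -100
                i += 1
    return labels
```

In a data collator you would then pass the masked tensor as the labels, e.g. `batch["labels"] = mask_labels_to_answers(batch["input_ids"])`, so that `model(**batch).loss` only covers the answer tokens.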

schwarzwalder changed discussion status to closed
