
Provided fine-tuning scripts may contain an error

#59
by efei - opened

In the TRL script, the loss is computed over all tokens except <pad>.
In the Colab script, the loss is computed over all tokens except <pad> and <image>.
There are also <fake_image_token> tokens, and neither those nor the user turn should be included in the loss.
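
To make the point concrete, here is a minimal sketch of the label masking I have in mind. It is not taken from either script. The token ids, the `ASSISTANT_START_ID` and `END_OF_TURN_ID` markers, and the `build_labels` helper are all hypothetical and only for illustration; the real ids would have to be looked up from the tokenizer. It uses plain Python lists, and the only fact it relies on is that -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

# Hypothetical token ids, for illustration only (not the real vocabulary ids).
PAD_ID = 0
IMAGE_ID = 1
FAKE_IMAGE_ID = 2
ASSISTANT_START_ID = 3  # assumed marker that opens an assistant turn
END_OF_TURN_ID = 4      # assumed marker that closes any turn


def build_labels(input_ids):
    """Return labels where only assistant-turn text tokens contribute to the loss."""
    special = {PAD_ID, IMAGE_ID, FAKE_IMAGE_ID}
    labels = []
    in_assistant = False
    for tok in input_ids:
        if tok == ASSISTANT_START_ID:
            # The marker itself is not a prediction target.
            in_assistant = True
            labels.append(IGNORE_INDEX)
            continue
        # Keep a token only inside an assistant turn and only if it is
        # not a padding or image-related token.
        keep = in_assistant and tok not in special
        labels.append(tok if keep else IGNORE_INDEX)
        if tok == END_OF_TURN_ID:
            in_assistant = False
    return labels


# Image tokens, a user turn (10, 11), an assistant turn (20, 21), then padding.
example = [2, 1, 1, 2, 10, 11, 4, 3, 20, 21, 4, 0, 0]
labels = build_labels(example)
```

In this sketch the end-of-turn token that closes the assistant turn is kept as a target so the model can learn to stop; whether that is desirable is a design choice, not something the scripts prescribe.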

HuggingFaceM4 org

That's indeed correct! Good catch, @efei.
@edbeeching, can we change your TRL gist?
Niels fixed a discrepancy earlier this week: https://github.com/huggingface/transformers/pull/30898#issuecomment-2124884284

efei changed discussion status to closed
