Model converted by the `transformers` `pt_to_tf` CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=1.907e-05; Maximum crossload hidden layer difference=4.541e-02;
Maximum conversion output difference=1.907e-05; Maximum conversion hidden layer difference=4.541e-02;

CAUTION: The maximum admissible error was manually increased to 0.1!
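For context, the check behind these numbers is a max-absolute-difference comparison against a tolerance. A minimal sketch with NumPy stand-ins for the PyTorch and TensorFlow hidden states (the real `pt_to_tf` CLI compares actual framework tensors; the values and the `MAX_ADMISSIBLE_ERROR` name here are illustrative):

```python
import numpy as np

# Hypothetical stand-ins for matching PT and TF hidden states.
pt_hidden = np.array([0.1234, -0.5678, 0.9012])
tf_hidden = np.array([0.1234, -0.5679, 0.9013])

# Largest elementwise deviation between the two frameworks.
max_diff = np.max(np.abs(pt_hidden - tf_hidden))

# The manually raised threshold mentioned in the message above.
MAX_ADMISSIBLE_ERROR = 0.1

assert max_diff < MAX_ADMISSIBLE_ERROR, f"difference {max_diff:.3e} too large"
print(f"Maximum hidden layer difference={max_diff:.3e}")
```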

@Rocketknight1 This is the smaller l1; as for l3, there is somewhat larger variation in the hidden layer values than when running tests or doing inference on an image. The predictions match, though.

@D-Roberts This is something we often observe with PT->TF conversions. I did a deep dive once or twice, and the cause consistently seems to be the accumulation of small numerical variations: the frameworks use different kernels, and TF can reorder or fuse operations during compilation. In general, the actual outputs/predictions are reasonably robust to this, so it's not a major concern (although it does sometimes mask *actual* bugs/implementation differences).
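The root effect is easy to demonstrate: floating-point addition is not associative, so reordering or fusing operations (as a compiler is free to do) changes the low-order bits, and those per-op differences accumulate across many layers. A self-contained illustration:

```python
# Floating-point addition is not associative: the same mathematical sum,
# evaluated in two orders, differs in the last bits -- analogous to two
# frameworks using different kernels or fused/reordered operations.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one evaluation order
right = a + (b + c)  # another evaluation order

print(left == right)          # False
print(abs(left - right))      # a tiny nonzero residual
```

Stacked over dozens of layers, residuals like this grow into the ~1e-2 hidden-layer differences reported above while still leaving the final predictions intact.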

alanspike changed pull request status to merged
