Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=4.902e-04; Maximum crossload hidden layer difference=1.125e-01;
Maximum conversion output difference=4.902e-04; Maximum conversion hidden layer difference=1.125e-01;

CAUTION: The maximum admissible error was manually increased to 0.15!
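For readers unfamiliar with this kind of check, here is a minimal sketch of comparing a maximum absolute difference against an admissible error. It is not the pt_to_tf implementation: the helper names `max_abs_diff` and `within_tolerance` are hypothetical, plain nested lists stand in for tensors, and the stricter 1e-3 threshold is an illustrative assumption. Only the reported differences and the 0.15 limit come from this report.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally shaped nested lists."""
    if isinstance(a, (list, tuple)):
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def within_tolerance(diffs, max_error):
    """True when every reported difference is at or below the admissible error."""
    return all(d <= max_error for d in diffs.values())


# Toy tensors (hypothetical values) to show how a difference is measured.
reference = [[1.0, 2.0], [3.0, 4.0]]
converted = [[1.0, 2.5], [3.0, 3.9]]
toy_diff = max_abs_diff(reference, converted)

# Differences reported above, checked against the manually raised limit.
reported = {"output": 4.902e-04, "hidden layer": 1.125e-01}
passes_raised_limit = within_tolerance(reported, max_error=0.15)

# Hypothetical stricter limit, shown only to illustrate why the limit was raised.
passes_strict_limit = within_tolerance(reported, max_error=1e-3)

print(toy_diff, passes_raised_limit, passes_strict_limit)
```

With the limit at 0.15 both reported values pass. Under a tighter limit the hidden layer difference would be the one that fails, while the output difference stays small.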

@joaogante, @nielsr, @sgugger

The maximum admissible error was increased because batch normalization introduces small differences that are then amplified through the forward pass.
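As a rough illustration of how this can happen (a toy sketch, not a diagnosis of this model): at inference time batch normalization computes gamma * (x - mean) / sqrt(var + eps) + beta, so a small input difference is scaled by gamma / sqrt(var + eps). When the stored variance is small, that gain is well above 1. The function name and all numeric values below are hypothetical.

```python
import math


def batch_norm_inference(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Scalar batch normalization using stored (running) statistics."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


# Hypothetical values: a tiny discrepancy between two frameworks' activations
# entering a batch norm layer whose running variance is small.
x_ref = 0.5
delta = 1e-6
mean, var = 0.0, 1e-4

y_ref = batch_norm_inference(x_ref, mean, var)
y_alt = batch_norm_inference(x_ref + delta, mean, var)

# Ratio of output difference to input difference; equals 1 / sqrt(var + eps) here.
amplification = abs(y_alt - y_ref) / delta
print(amplification)
```

Several such layers in sequence can compound the gain, which is consistent with hidden layer differences being much larger than the final output difference.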

This is the corresponding GitHub PR: https://github.com/huggingface/transformers/pull/18597

joaogante changed pull request status to merged
