Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against its PyTorch counterpart.

Maximum crossload output difference=2.575e-05; Maximum crossload hidden layer difference=3.540e-03;
Maximum conversion output difference=2.575e-05; Maximum conversion hidden layer difference=3.540e-03;

CAUTION: The maximum admissible error was manually increased to 0.1!
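For readers unfamiliar with the check behind these numbers, here is a minimal sketch of the idea. It takes the largest element-wise absolute difference between the PyTorch and TensorFlow outputs and accepts the conversion if that maximum is at or below an admissible error. It is an illustration only, not the pt_to_tf CLI's actual code. The helper names `max_abs_diff` and `within_tolerance` and the sample values are hypothetical. The 0.1 default mirrors the manually raised limit mentioned above.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference between two flat outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def within_tolerance(pt_out: Sequence[float], tf_out: Sequence[float],
                     max_error: float = 0.1) -> bool:
    """Hypothetical acceptance check: pass if the worst-case difference is <= max_error."""
    return max_abs_diff(pt_out, tf_out) <= max_error


# Hypothetical hidden-layer values; they differ by about 3.5e-03 at one position,
# which is similar in magnitude to the hidden-layer difference reported above.
pt_hidden = [0.12, -1.5, 3.0]
tf_hidden = [0.12, -1.5035, 3.0]
```

With the raised limit of 0.1 the sample passes. A much tighter limit such as 1e-3 rejects it. This shows why a hidden-layer difference on the order of 1e-03 can require loosening the threshold even when the final outputs agree closely.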

@Rocketknight1 For some reason, the hidden-layer differences show more random variation than they do when running the tests, although the predictions match. This is l3; I will add l1 as well.

alanspike changed pull request status to merged
