Add TF weights

#2
by Rocketknight1 HF staff - opened

Model converted with the transformers pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=1.993e+00; Maximum crossload hidden layer difference=1.552e-04;
Maximum conversion output difference=1.991e+00; Maximum conversion hidden layer difference=1.552e-04;

CAUTION: The maximum admissible error was manually increased to 2.0!

Quick note on this PR: The huge output difference arises because the original checkpoint has no pooler weights, so the pooler is randomly initialized independently in PT and in TF. Excluding the pooler, the actual difference between model outputs is ~1e-4, which is well within acceptable limits.
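For anyone who wants to sanity-check this, here is a minimal sketch of the comparison with the pooler left out. It is not the pt_to_tf implementation. The output names `last_hidden_state` and `pooler_output`, the helper names, and the toy numbers are all assumptions for illustration; in practice the values would come from running both checkpoints on the same inputs.

```python
def flatten(values):
    """Yield the scalar floats of an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield float(values)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    flat_a, flat_b = list(flatten(a)), list(flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("outputs have different sizes")
    return max(abs(x - y) for x, y in zip(flat_a, flat_b))


def compare_outputs(pt_outputs, tf_outputs, skip=("pooler_output",)):
    """Return {name: max abs diff} for every shared output not listed in skip."""
    return {
        name: max_abs_diff(pt_outputs[name], tf_outputs[name])
        for name in pt_outputs
        if name in tf_outputs and name not in skip
    }


# Hypothetical toy values, not taken from this checkpoint.
pt = {
    "last_hidden_state": [[0.10, 0.20], [0.30, 0.40]],
    "pooler_output": [0.99, -0.98],  # pooler initialized randomly in PT
}
tf = {
    "last_hidden_state": [[0.1001, 0.20], [0.30, 0.3999]],
    "pooler_output": [-0.99, 0.97],  # pooler initialized randomly in TF
}

all_diffs = compare_outputs(pt, tf, skip=())  # pooler included
filtered = compare_outputs(pt, tf)            # pooler excluded

print("with pooler:   ", all_diffs)
print("without pooler:", filtered)
```

With the pooler included, the maximum difference is dominated by the two unrelated random initializations. With it skipped, only the genuinely converted weights are compared.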

Rocketknight1 changed pull request status to merged
