Add TF weights

#1, opened by amyeroberts (HF staff)

Model converted by the transformers pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=5.989e-04; Maximum crossload hidden layer difference=1.715e-02;
Maximum conversion output difference=5.989e-04; Maximum conversion hidden layer difference=1.715e-02;

CAUTION: The maximum admissible error was manually increased to 0.09!
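For readers unfamiliar with this kind of report, the check can be pictured as a per-tensor maximum absolute difference compared against two limits: a reporting threshold (1e-10 here) and a maximum admissible error (raised to 0.09 for this conversion). The snippet below is only a minimal sketch of that idea in plain Python over flat lists of floats. It is not the pt_to_tf implementation, and the function names, the `pooled` output name, and the sample values are all hypothetical.

```python
# Hypothetical sketch of a max-absolute-difference check; NOT the pt_to_tf code.

def max_abs_difference(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def differences_above_threshold(ref_outputs, cand_outputs, report_threshold=1e-10):
    """Map each output name to its max difference, keeping only those above the threshold."""
    report = {}
    for name, ref_values in ref_outputs.items():
        diff = max_abs_difference(ref_values, cand_outputs[name])
        if diff > report_threshold:
            report[name] = diff
    return report


def within_tolerance(report, max_error):
    """True if every reported difference is at or below the admissible error."""
    return all(diff <= max_error for diff in report.values())


# Made-up values standing in for PyTorch and TensorFlow outputs.
pt_outputs = {"last_hidden_state": [0.10, -0.25, 0.75], "pooled": [1.0, 2.0]}
tf_outputs = {"last_hidden_state": [0.10, -0.25, 0.7509], "pooled": [1.0, 2.0]}

report = differences_above_threshold(pt_outputs, tf_outputs)
for name, diff in report.items():
    print(f"{name}: {diff:.3e}")

# 0.09 mirrors the manually raised limit mentioned in this report.
print("accepted:", within_tolerance(report, max_error=0.09))
```

Under this reading, an output that matches exactly (like the hypothetical `pooled` above) never appears in the list, and the conversion is accepted as long as no listed difference exceeds the admissible error.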

List of maximum output differences above the threshold (1e-10):
past_key_values[0][0]: 9.537e-07
past_key_values[0][1]: 1.252e-06
past_key_values[0][2]: 2.210e-04
past_key_values[0][3]: 1.278e-04
past_key_values[1][0]: 2.716e-06
past_key_values[1][1]: 1.848e-06
past_key_values[1][2]: 2.418e-04
past_key_values[1][3]: 1.188e-04
past_key_values[2][0]: 3.904e-06
past_key_values[2][1]: 1.691e-06
past_key_values[2][2]: 1.540e-04
past_key_values[2][3]: 1.163e-04
past_key_values[3][0]: 8.821e-06
past_key_values[3][1]: 2.027e-06
past_key_values[3][2]: 2.997e-04
past_key_values[3][3]: 1.274e-04
past_key_values[4][0]: 7.033e-06
past_key_values[4][1]: 1.878e-06
past_key_values[4][2]: 2.249e-04
past_key_values[4][3]: 1.329e-04
past_key_values[5][0]: 2.384e-06
past_key_values[5][1]: 1.200e-06
past_key_values[5][2]: 2.793e-04
past_key_values[5][3]: 1.658e-04
past_key_values[6][0]: 1.907e-06
past_key_values[6][1]: 9.835e-07
past_key_values[6][2]: 2.894e-04
past_key_values[6][3]: 1.137e-04
past_key_values[7][0]: 1.788e-06
past_key_values[7][1]: 7.898e-07
past_key_values[7][2]: 3.052e-04
past_key_values[7][3]: 1.511e-04
past_key_values[8][0]: 2.742e-06
past_key_values[8][1]: 9.979e-07
past_key_values[8][2]: 2.956e-04
past_key_values[8][3]: 1.950e-04
past_key_values[9][0]: 2.503e-06
past_key_values[9][1]: 1.267e-06
past_key_values[9][2]: 2.534e-04
past_key_values[9][3]: 1.412e-04
past_key_values[10][0]: 2.384e-06
past_key_values[10][1]: 1.401e-06
past_key_values[10][2]: 2.341e-04
past_key_values[10][3]: 2.259e-04
past_key_values[11][0]: 2.980e-06
past_key_values[11][1]: 1.558e-06
past_key_values[11][2]: 3.011e-04
past_key_values[11][3]: 2.747e-04

List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 9.015e-04
decoder_hidden_states[1]: 1.669e-05
decoder_hidden_states[2]: 1.812e-05
decoder_hidden_states[3]: 1.800e-05
decoder_hidden_states[4]: 1.788e-05
decoder_hidden_states[5]: 2.518e-04
decoder_hidden_states[6]: 2.518e-04
decoder_hidden_states[7]: 2.518e-04
decoder_hidden_states[8]: 2.518e-04
decoder_hidden_states[9]: 2.518e-04
decoder_hidden_states[10]: 2.518e-04
decoder_hidden_states[11]: 2.670e-04
decoder_hidden_states[12]: 9.015e-04
encoder_last_hidden_state: 9.117e-04
encoder_hidden_states[0]: 1.860e-05
encoder_hidden_states[1]: 2.408e-05
encoder_hidden_states[2]: 2.241e-05
encoder_hidden_states[3]: 2.193e-05
encoder_hidden_states[4]: 2.098e-05
encoder_hidden_states[5]: 3.052e-05
encoder_hidden_states[6]: 3.248e-05
encoder_hidden_states[7]: 4.137e-05
encoder_hidden_states[8]: 1.385e-02
encoder_hidden_states[9]: 1.385e-02
encoder_hidden_states[10]: 1.385e-02
encoder_hidden_states[11]: 1.389e-02
encoder_hidden_states[12]: 9.117e-04

amyeroberts changed pull request status to merged
