Add TF weights

#1
by amyeroberts HF staff - opened

Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=4.644e-04; Maximum crossload hidden layer difference=5.890e-03;
Maximum conversion output difference=4.644e-04; Maximum conversion hidden layer difference=5.890e-03;

CAUTION: The maximum admissible error was manually increased to 0.009!
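
For readers unfamiliar with this kind of check, the following is a minimal sketch of how a maximum absolute difference between two sets of outputs can be computed and compared against an admissible error. It is not the pt_to_tf CLI's actual implementation: the helper `max_abs_diff` and the toy values are hypothetical, and only the 0.009 threshold comes from the report above.

```python
def max_abs_diff(a, b) -> float:
    """Largest element-wise absolute difference between two
    equally shaped nested sequences of numbers."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)


MAX_ERROR = 0.009  # admissible error quoted in the report above

# Hypothetical toy values standing in for PyTorch and TensorFlow outputs.
pt_out = [[0.1000, 0.2000], [0.3000, 0.4000]]
tf_out = [[0.1004, 0.2000], [0.2990, 0.4000]]

diff = max_abs_diff(pt_out, tf_out)
within_tolerance = diff <= MAX_ERROR
print(f"max abs diff = {diff:.3e}, within tolerance: {within_tolerance}")
```

Each entry in the lists below can be read as one such maximum, taken over a single output or hidden-state tensor.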

List of maximum output differences above the threshold (1e-10):
past_key_values[0][0]: 1.669e-06
past_key_values[0][1]: 8.941e-07
past_key_values[0][2]: 1.013e-03
past_key_values[0][3]: 6.801e-04
past_key_values[1][0]: 1.073e-06
past_key_values[1][1]: 7.153e-07
past_key_values[1][2]: 9.886e-04
past_key_values[1][3]: 2.424e-04
past_key_values[2][0]: 2.146e-06
past_key_values[2][1]: 1.192e-06
past_key_values[2][2]: 6.944e-04
past_key_values[2][3]: 7.342e-04
past_key_values[3][0]: 7.629e-06
past_key_values[3][1]: 6.985e-07
past_key_values[3][2]: 7.926e-04
past_key_values[3][3]: 3.730e-04
past_key_values[4][0]: 4.653e-06
past_key_values[4][1]: 1.132e-06
past_key_values[4][2]: 9.130e-04
past_key_values[4][3]: 7.603e-04
past_key_values[5][0]: 4.388e-06
past_key_values[5][1]: 1.132e-06
past_key_values[5][2]: 9.823e-04
past_key_values[5][3]: 4.344e-04
past_key_values[6][0]: 4.351e-06
past_key_values[6][1]: 2.205e-06
past_key_values[6][2]: 1.007e-03
past_key_values[6][3]: 5.733e-04
past_key_values[7][0]: 3.338e-06
past_key_values[7][1]: 1.833e-06
past_key_values[7][2]: 9.663e-04
past_key_values[7][3]: 5.342e-04
past_key_values[8][0]: 3.576e-06
past_key_values[8][1]: 1.907e-06
past_key_values[8][2]: 1.152e-03
past_key_values[8][3]: 5.532e-04
past_key_values[9][0]: 3.338e-06
past_key_values[9][1]: 1.848e-06
past_key_values[9][2]: 1.095e-03
past_key_values[9][3]: 4.111e-04
past_key_values[10][0]: 3.159e-06
past_key_values[10][1]: 2.623e-06
past_key_values[10][2]: 1.055e-03
past_key_values[10][3]: 7.403e-04
past_key_values[11][0]: 3.040e-06
past_key_values[11][1]: 3.338e-06
past_key_values[11][2]: 1.234e-03
past_key_values[11][3]: 1.402e-03

List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 9.584e-05
decoder_hidden_states[1]: 5.245e-06
decoder_hidden_states[2]: 3.815e-06
decoder_hidden_states[3]: 2.480e-05
decoder_hidden_states[4]: 2.480e-05
decoder_hidden_states[5]: 2.480e-05
decoder_hidden_states[6]: 2.670e-05
decoder_hidden_states[7]: 2.289e-05
decoder_hidden_states[8]: 2.289e-05
decoder_hidden_states[9]: 2.289e-05
decoder_hidden_states[10]: 2.670e-05
decoder_hidden_states[11]: 3.815e-05
decoder_hidden_states[12]: 9.584e-05
encoder_last_hidden_state: 3.974e-03
encoder_hidden_states[0]: 2.003e-05
encoder_hidden_states[1]: 2.003e-05
encoder_hidden_states[2]: 1.979e-05
encoder_hidden_states[3]: 2.861e-05
encoder_hidden_states[4]: 3.910e-05
encoder_hidden_states[5]: 4.220e-05
encoder_hidden_states[6]: 1.311e-04
encoder_hidden_states[7]: 1.477e-02
encoder_hidden_states[8]: 1.477e-02
encoder_hidden_states[9]: 1.474e-02
encoder_hidden_states[10]: 1.471e-02
encoder_hidden_states[11]: 1.471e-02
encoder_hidden_states[12]: 3.974e-03

amyeroberts changed pull request status to merged
