Add TF weights

#1
by amyeroberts HF staff - opened

Model converted by the transformers pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=5.157e-04; Maximum crossload hidden layer difference=2.319e-03;
Maximum conversion output difference=5.157e-04; Maximum conversion hidden layer difference=2.319e-03;

CAUTION: The maximum admissible error was manually increased to 0.005!
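The check described above boils down to taking, for each named output, the largest absolute element-wise difference between the PyTorch and TensorFlow values and comparing it with the admissible error. The following is a minimal sketch of that idea, not the CLI's actual implementation: it assumes the outputs are already available as nested lists of floats, and the names `max_abs_diff` and `compare_outputs` are hypothetical.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two nested sequences of floats."""
    if isinstance(a, (list, tuple)):
        if len(a) != len(b):
            raise ValueError("shape mismatch between the two outputs")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def compare_outputs(ref_outputs, new_outputs, max_error=5e-3):
    """Return (per-output max differences, subset exceeding max_error).

    max_error defaults to the 0.005 quoted in this report (an assumption
    about how the tolerance is applied, made for illustration only).
    """
    diffs = {
        name: max_abs_diff(ref_outputs[name], new_outputs[name])
        for name in ref_outputs
    }
    failures = {name: d for name, d in diffs.items() if d > max_error}
    return diffs, failures


# Toy values, not taken from the real model.
pt_outputs = {"last_hidden_state": [[0.10, 0.20], [0.30, 0.40]]}
tf_outputs = {"last_hidden_state": [[0.10, 0.2001], [0.30, 0.40]]}
diffs, failures = compare_outputs(pt_outputs, tf_outputs)
print(diffs, failures)
```

With real tensors the same comparison would normally be done on arrays rather than Python lists; the nested-list form is used here only to keep the sketch dependency-free.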

List of maximum output differences above the threshold (1e-10):
past_key_values[0][0]: 2.146e-06
past_key_values[0][1]: 4.992e-07
past_key_values[0][2]: 4.765e-04
past_key_values[0][3]: 1.739e-04
past_key_values[1][0]: 2.950e-06
past_key_values[1][1]: 1.386e-06
past_key_values[1][2]: 3.507e-04
past_key_values[1][3]: 1.288e-04
past_key_values[2][0]: 1.049e-05
past_key_values[2][1]: 2.868e-06
past_key_values[2][2]: 2.143e-04
past_key_values[2][3]: 1.525e-04
past_key_values[3][0]: 1.382e-05
past_key_values[3][1]: 3.506e-06
past_key_values[3][2]: 3.443e-04
past_key_values[3][3]: 2.093e-04
past_key_values[4][0]: 1.651e-05
past_key_values[4][1]: 4.381e-06
past_key_values[4][2]: 4.064e-04
past_key_values[4][3]: 1.816e-04
past_key_values[5][0]: 7.033e-06
past_key_values[5][1]: 7.927e-06
past_key_values[5][2]: 2.861e-04
past_key_values[5][3]: 2.948e-04

List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 9.918e-05
decoder_hidden_states[1]: 5.960e-06
decoder_hidden_states[2]: 6.676e-06
decoder_hidden_states[3]: 1.764e-05
decoder_hidden_states[4]: 2.241e-05
decoder_hidden_states[5]: 2.861e-05
decoder_hidden_states[6]: 9.918e-05
encoder_last_hidden_state: 1.139e-03
encoder_hidden_states[0]: 1.383e-05
encoder_hidden_states[1]: 1.878e-05
encoder_hidden_states[2]: 1.633e-05
encoder_hidden_states[3]: 1.383e-04
encoder_hidden_states[4]: 7.187e-03
encoder_hidden_states[5]: 6.477e-03
encoder_hidden_states[6]: 1.139e-03
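To see at a glance which entries dominate a report like the one above, the `name: value` lines can be parsed and sorted. This is a small sketch under the assumption that each entry sits on its own line in exactly that form; `parse_report` is a hypothetical helper, and the sample text is a short excerpt copied from the lists above.

```python
def parse_report(text):
    """Parse 'name: value' lines into (name, float) pairs, largest value first."""
    entries = []
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(": ")
        if not sep or not name:
            continue  # headings and blank lines have no 'name: value' shape
        try:
            entries.append((name, float(value)))
        except ValueError:
            continue  # the part after ': ' is not a number
    return sorted(entries, key=lambda item: item[1], reverse=True)


sample = """List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 9.918e-05
encoder_last_hidden_state: 1.139e-03
encoder_hidden_states[3]: 1.383e-04
encoder_hidden_states[4]: 7.187e-03
encoder_hidden_states[5]: 6.477e-03
"""

ranked = parse_report(sample)
for name, value in ranked:
    print(f"{name}: {value:.3e}")
```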

amyeroberts changed pull request status to merged
