Add TF weights

#1
by amyeroberts HF staff - opened

Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=3.481e-05; Maximum crossload hidden layer difference=8.392e-05;
Maximum conversion output difference=3.481e-05; Maximum conversion hidden layer difference=8.392e-05;

CAUTION: The maximum admissible error was manually increased to 0.0001!
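For readers unfamiliar with this kind of check, here is a minimal sketch of comparing a maximum absolute difference against an admissible error. It is not the pt_to_tf implementation: the helper name `max_abs_diff` and the sample values are hypothetical, and only the 0.0001 threshold comes from the caution above.

```python
# Minimal sketch of a max-absolute-difference tolerance check.
# Assumption: this is NOT the pt_to_tf code; the helper and the sample values are illustrative.

MAX_ERROR = 1e-4  # the manually increased admissible error reported in this PR


def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Made-up stand-ins for flattened PyTorch and TensorFlow outputs.
pt_values = [0.25, -1.5, 3.0]
tf_values = [0.25 + 2e-5, -1.5, 3.0 - 8e-5]

diff = max_abs_diff(pt_values, tf_values)
passes = diff <= MAX_ERROR
print(f"max difference={diff:.3e}; within tolerance: {passes}")
```

In this sketch, a conversion is accepted only when the largest single-element deviation stays at or below the threshold, so one outlier element is enough to fail the check.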

Full differences:

List of maximum output differences above the threshold (1e-10):
past_key_values[0][0]: 2.384e-06
past_key_values[0][1]: 3.576e-07
past_key_values[0][2]: 2.259e-05
past_key_values[0][3]: 2.444e-05
past_key_values[1][0]: 1.073e-06
past_key_values[1][1]: 1.192e-06
past_key_values[1][2]: 3.454e-05
past_key_values[1][3]: 2.115e-05
past_key_values[2][0]: 3.576e-06
past_key_values[2][1]: 6.109e-07
past_key_values[2][2]: 2.605e-05
past_key_values[2][3]: 3.481e-05
past_key_values[3][0]: 2.384e-06
past_key_values[3][1]: 1.225e-06
past_key_values[3][2]: 3.231e-05
past_key_values[3][3]: 2.646e-05

List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 8.392e-05
decoder_hidden_states[1]: 2.480e-05
decoder_hidden_states[2]: 2.480e-05
decoder_hidden_states[3]: 2.480e-05
decoder_hidden_states[4]: 8.392e-05
encoder_last_hidden_state: 4.625e-05
encoder_hidden_states[0]: 1.299e-05
encoder_hidden_states[1]: 2.527e-05
encoder_hidden_states[2]: 2.861e-05
encoder_hidden_states[3]: 3.147e-05
encoder_hidden_states[4]: 4.625e-05
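As a quick sanity check, the summary maxima can be recomputed from the per-tensor values listed above. The sketch below copies the reported numbers by hand; the variable names are illustrative and nothing here re-runs the conversion.

```python
# Recompute the summary maxima from the per-tensor differences reported in this PR.
# Values are copied by hand from the lists above; variable names are illustrative.

MAX_ERROR = 1e-4  # manually increased admissible error from the report

output_diffs = {
    "past_key_values[0][0]": 2.384e-06, "past_key_values[0][1]": 3.576e-07,
    "past_key_values[0][2]": 2.259e-05, "past_key_values[0][3]": 2.444e-05,
    "past_key_values[1][0]": 1.073e-06, "past_key_values[1][1]": 1.192e-06,
    "past_key_values[1][2]": 3.454e-05, "past_key_values[1][3]": 2.115e-05,
    "past_key_values[2][0]": 3.576e-06, "past_key_values[2][1]": 6.109e-07,
    "past_key_values[2][2]": 2.605e-05, "past_key_values[2][3]": 3.481e-05,
    "past_key_values[3][0]": 2.384e-06, "past_key_values[3][1]": 1.225e-06,
    "past_key_values[3][2]": 3.231e-05, "past_key_values[3][3]": 2.646e-05,
}

hidden_diffs = {
    "last_hidden_state": 8.392e-05,
    "decoder_hidden_states[1]": 2.480e-05, "decoder_hidden_states[2]": 2.480e-05,
    "decoder_hidden_states[3]": 2.480e-05, "decoder_hidden_states[4]": 8.392e-05,
    "encoder_last_hidden_state": 4.625e-05,
    "encoder_hidden_states[0]": 1.299e-05, "encoder_hidden_states[1]": 2.527e-05,
    "encoder_hidden_states[2]": 2.861e-05, "encoder_hidden_states[3]": 3.147e-05,
    "encoder_hidden_states[4]": 4.625e-05,
}

# Find the tensor with the largest reported difference in each group.
worst_output = max(output_diffs, key=output_diffs.get)
worst_hidden = max(hidden_diffs, key=hidden_diffs.get)
all_within_tolerance = max(output_diffs[worst_output], hidden_diffs[worst_hidden]) <= MAX_ERROR

print(f"largest output difference: {worst_output} = {output_diffs[worst_output]:.3e}")
print(f"largest hidden layer difference: {worst_hidden} = {hidden_diffs[worst_hidden]:.3e}")
print(f"all within {MAX_ERROR}: {all_within_tolerance}")
```

The recomputed maxima match the summary lines (3.481e-05 for outputs, 8.392e-05 for hidden layers), and both sit below the 0.0001 threshold.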

amyeroberts changed pull request status to merged
