Ahmadzei's picture
update 1
57bdca5
raw
history blame
451 Bytes
You can see that it was called from an attribute dropout inside DenseReluDense class. We can see
that it happened during the first layer, of the 2nd block, during the very first batch. Finally, the absolute largest
input elements was 6.27e+04 and same for the output was inf.
You can see here, that T5DenseGatedGeluDense.forward resulted in output activations, whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K.