File size: 508 Bytes
57bdca5 |
1 2 3 4 5 6 |
To avoid overflows under fp16 the activations must remain way below 1e4, because 1e4 * 1e4 = 1e8 so any matrix multiplication with large activations is going to lead to a numerical overflow condition. At the very start of the trace you can discover at which batch number the problem occurred (here Detected inf/nan during batch_number=0 means the problem occurred on the first batch). Each reported frame starts by declaring the fully qualified entry for the corresponding module this frame is reporting for. |