sfedar
End of training: 20 epochs, batch size 100, writer batch size 1000, 1 gradient accumulation step, learning rate 6e-05, 30 s
4f9aa25