End of training, 20 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 3e-05, 30 s | 9ed61a7 (verified) | sfedar committed on Sep 28
End of training, 20 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 4e-05, 30 s | 96d6a4e (verified) | sfedar committed on Sep 24