End of training, 20 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 5e-05, 30 s 9a2dfca verified sfedar committed on Sep 29
End of training, 20 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 7e-05, 30 s 54c3a37 verified sfedar committed on Sep 28