End of training, 20 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 1e-05, 30 s | df3262d (verified) | sfedar committed on Sep 17
End of training, 12 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 1e-05, 30 s | afbcb6d (verified) | sfedar committed on Sep 16
End of training, 12 epochs, 100 batch size, 1000 writer batch size, 1 gradient accumulation steps, learning rate: 0.0001, 30 s | e32dbe2 (verified) | sfedar committed on Sep 15