flan-t5-xl-grammar-synthesis / train_results.json
Peter Szemraj
add sharded chkpt @ 2 epochs, sess 2
412d094
{
  "epoch": 2.0,
  "train_loss": 0.054031920192549335,
  "train_runtime": 152294.5114,
  "train_samples": 180080,
  "train_samples_per_second": 2.365,
  "train_steps_per_second": 0.037
}
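
The reported throughput can be cross-checked against the other fields. The sketch below is not part of the repository; it copies the values above into a dict and assumes that `train_runtime` is in seconds and covers both epochs, and that `train_samples` is the number of examples per epoch. The step count and effective batch size it derives are rough estimates only, because `train_steps_per_second` is rounded to three decimals.

```python
# Values copied from train_results.json above.
metrics = {
    "epoch": 2.0,
    "train_loss": 0.054031920192549335,
    "train_runtime": 152294.5114,
    "train_samples": 180080,
    "train_samples_per_second": 2.365,
    "train_steps_per_second": 0.037,
}

# Assumption: train_runtime is wall-clock seconds for the whole run.
runtime_hours = metrics["train_runtime"] / 3600

# Assumption: every sample is seen once per epoch.
samples_seen = metrics["train_samples"] * metrics["epoch"]
samples_per_second = samples_seen / metrics["train_runtime"]

# Rough estimates: the steps/second figure is heavily rounded,
# so these are only accurate to within a few percent.
approx_total_steps = metrics["train_steps_per_second"] * metrics["train_runtime"]
approx_effective_batch = samples_seen / approx_total_steps

print(f"runtime: {runtime_hours:.1f} h")
print(f"samples/s recomputed: {samples_per_second:.3f}")
print(f"approx. optimizer steps: {approx_total_steps:.0f}")
print(f"approx. effective batch size: {approx_effective_batch:.0f}")
```

Under these assumptions the recomputed samples-per-second figure agrees with the reported 2.365, which suggests the fields are internally consistent.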