Taiwan-ELM-270M / train_results.json
Update model with 2x training data and more efficient vocabulary
ee1b4c2 verified
{
  "epoch": 2.999944821497545,
  "total_flos": 3.6968537365004943e+18,
  "train_loss": 3.121767396105489,
  "train_runtime": 856366.0469,
  "train_samples_per_second": 3.047,
  "train_steps_per_second": 0.048
}
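
The raw figures are easier to read once converted into derived quantities. The following is a minimal Python sketch, not part of the repository. It assumes that `train_runtime` is in seconds and that `train_loss` is a mean cross-entropy in nats; neither is stated in the file itself. The helper name `summarize` and the output keys are hypothetical. Because the two throughput values are rounded to three decimals, the step and sample counts are rough estimates only.

```python
import math

# Values copied from train_results.json above.
METRICS = {
    "epoch": 2.999944821497545,
    "total_flos": 3.6968537365004943e+18,
    "train_loss": 3.121767396105489,
    "train_runtime": 856366.0469,
    "train_samples_per_second": 3.047,
    "train_steps_per_second": 0.048,
}


def summarize(metrics):
    """Derive rough, human-readable quantities from the logged metrics.

    Assumptions (not confirmed by the file): runtime is in seconds and
    train_loss is a mean cross-entropy measured in nats.
    """
    runtime_s = metrics["train_runtime"]
    samples_per_s = metrics["train_samples_per_second"]
    steps_per_s = metrics["train_steps_per_second"]
    return {
        # Wall-clock duration in days.
        "runtime_days": runtime_s / 86400,
        # Estimates only: the per-second rates are rounded in the file.
        "approx_steps": runtime_s * steps_per_s,
        "approx_samples": runtime_s * samples_per_s,
        "approx_samples_per_step": samples_per_s / steps_per_s,
        # Only meaningful if the loss assumption above holds.
        "perplexity_if_nats": math.exp(metrics["train_loss"]),
    }


summary = summarize(METRICS)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The samples-per-step ratio is particularly sensitive to rounding in `train_steps_per_second`, so it should be read as an order-of-magnitude check on the effective batch size rather than an exact value.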