Training dataset length: 108000
Validation dataset length: 12000
Test dataset length: 7600

Current performance:
Eval: {'eval_loss': 1.3898906707763672, 'eval_accuracy': 0.21558333333333332, 'eval_runtime': 17.4618, 'eval_samples_per_second': 687.213, 'eval_steps_per_second': 85.902}
Test: {'eval_loss': 1.3894953727722168, 'eval_accuracy': 0.21947368421052632, 'eval_runtime': 10.7033, 'eval_samples_per_second': 710.062, 'eval_steps_per_second': 88.758}

Best trial:
BestRun(run_id='0', objective=0.9483333333333334, hyperparameters={'learning_rate': 3e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'warmup_steps': 1000})

Training complete performance:
Eval: {'eval_loss': 0.16314448416233063, 'eval_accuracy': 0.9483333333333334, 'eval_runtime': 17.4366, 'eval_samples_per_second': 688.209, 'eval_steps_per_second': 86.026, 'epoch': 2.0}
Test: {'eval_loss': 0.16520710289478302, 'eval_accuracy': 0.9473684210526315, 'eval_runtime': 10.786, 'eval_samples_per_second': 704.618, 'eval_steps_per_second': 88.077, 'epoch': 2.0}
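
As a sanity check on the figures in the log, the following sketch converts each reported accuracy into a count of correct predictions and recomputes throughput as samples divided by runtime. All numbers are copied from the log. The helper names (`correct_count`, `throughput`) are hypothetical, and the class count of 4 is an assumption that the log does not state; it is used only to compare the pre-training loss against the loss of a uniform guess.

```python
import math

# Figures copied from the log; nothing here is re-measured.
SPLITS = {
    "eval": {"n": 12000,
             "acc_before": 0.21558333333333332,
             "acc_after": 0.9483333333333334,
             "runtime": 17.4366,
             "samples_per_second": 688.209},
    "test": {"n": 7600,
             "acc_before": 0.21947368421052632,
             "acc_after": 0.9473684210526315,
             "runtime": 10.786,
             "samples_per_second": 704.618},
}

# ASSUMPTION: the task has 4 classes. The log does not say so.
ASSUMED_NUM_CLASSES = 4


def correct_count(accuracy: float, n: int) -> int:
    """Number of correctly classified samples implied by an accuracy."""
    return round(accuracy * n)


def throughput(n: int, runtime_seconds: float) -> float:
    """Samples processed per second."""
    return n / runtime_seconds


for name, s in SPLITS.items():
    before = correct_count(s["acc_before"], s["n"])
    after = correct_count(s["acc_after"], s["n"])
    rate = throughput(s["n"], s["runtime"])
    print(f"{name}: {before} -> {after} correct out of {s['n']}, "
          f"recomputed {rate:.3f} samples/s "
          f"(log reports {s['samples_per_second']})")

# Cross-entropy of a uniform guess over the assumed number of classes.
print(f"uniform-guess loss for {ASSUMED_NUM_CLASSES} classes: "
      f"{math.log(ASSUMED_NUM_CLASSES):.4f}")
```

After training, the reported accuracies correspond to 11380 of 12000 validation samples and 7200 of 7600 test samples. The recomputed throughput differs from the logged value only in the last digits because the logged runtime is rounded.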