Training dataset length: 18978

Validation dataset length: 4625

Test dataset length: 4566
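
As a quick sanity check on the split sizes, the sketch below computes what share of the data each split holds. It assumes the three reported lengths count the same kind of unit (examples rather than tokenized features), which the log does not state; the variable names are illustrative.

```python
# Split sizes copied from the dataset lengths reported above.
# ASSUMPTION: all three lengths count the same kind of unit.
split_sizes = {"train": 18978, "validation": 4625, "test": 4566}

total = sum(split_sizes.values())

# Fraction of all records that falls in each split.
split_fractions = {name: size / total for name, size in split_sizes.items()}

for name, fraction in split_fractions.items():
    print(f"{name}: {split_sizes[name]} records ({fraction:.1%})")
```

Under that assumption, roughly two thirds of the data is used for training, with the remainder divided almost evenly between validation and test.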

Current eval performance:

Evaluate results:

{'eval_exact': 37.68648648648649, 'eval_f1': 44.0549722409684, 'eval_total': 4625, 'eval_HasAns_exact': 37.68648648648649, 'eval_HasAns_f1': 44.0549722409684, 'eval_HasAns_total': 4625, 'eval_best_exact': 37.68648648648649, 'eval_best_exact_thresh': 0.0, 'eval_best_f1': 44.0549722409684, 'eval_best_f1_thresh': 0.0, 'eval_samples': 4625}

Predict results:

{'test_exact': 35.129215943933424, 'test_f1': 42.10453725499187, 'test_total': 4566, 'test_HasAns_exact': 35.129215943933424, 'test_HasAns_f1': 42.10453725499187, 'test_HasAns_total': 4566, 'test_best_exact': 35.129215943933424, 'test_best_exact_thresh': 0.0, 'test_best_f1': 42.10453725499187, 'test_best_f1_thresh': 0.0, 'predict_samples': 4566}
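
To put the percentages in concrete terms, the following sketch converts the exact-match scores back into answer counts and measures the gap between validation and test. It assumes the `exact` and `f1` values are percentages over the reported `total`, as is usual for SQuAD-style metrics; the helper name `exact_match_count` is hypothetical.

```python
# Values copied from the "Evaluate results" and "Predict results" dictionaries above.
eval_metrics = {"exact": 37.68648648648649, "f1": 44.0549722409684, "total": 4625}
test_metrics = {"exact": 35.129215943933424, "f1": 42.10453725499187, "total": 4566}


def exact_match_count(metrics):
    """Recover the number of exactly matched answers from a percentage score.

    ASSUMPTION: "exact" is 100 * matches / total, as in SQuAD-style metrics.
    """
    return round(metrics["exact"] * metrics["total"] / 100)


eval_hits = exact_match_count(eval_metrics)
test_hits = exact_match_count(test_metrics)

# Difference between validation and test scores, in percentage points.
exact_gap = eval_metrics["exact"] - test_metrics["exact"]
f1_gap = eval_metrics["f1"] - test_metrics["f1"]

print(f"validation exact matches: {eval_hits} of {eval_metrics['total']}")
print(f"test exact matches: {test_hits} of {test_metrics['total']}")
print(f"exact gap: {exact_gap:.2f} points, F1 gap: {f1_gap:.2f} points")
```

If the assumption holds, the test scores trail the validation scores by about 2.6 points in exact match and about 2.0 points in F1.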

Best trial:

BestRun(run_id='1', objective=87.56834325824502, hyperparameters={'learning_rate': 5e-05, 'num_train_epochs': 5, 'per_device_train_batch_size': 16, 'warmup_steps': 0})
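
For a rough sense of the training budget these hyperparameters imply, the sketch below estimates the number of optimizer steps. The device count and gradient accumulation setting are not given in the log, so both are assumed to be 1 here; it also assumes the reported training dataset length (18978) is the number of items actually batched.

```python
import math

# Hyperparameters copied from the BestRun line above.
best_hyperparameters = {
    "learning_rate": 5e-05,
    "num_train_epochs": 5,
    "per_device_train_batch_size": 16,
    "warmup_steps": 0,
}

train_items = 18978  # training dataset length reported in this log

# ASSUMPTIONS (not stated in the log): one device, no gradient accumulation.
num_devices = 1
gradient_accumulation_steps = 1

effective_batch_size = (
    best_hyperparameters["per_device_train_batch_size"]
    * num_devices
    * gradient_accumulation_steps
)

# The last, partially filled batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(train_items / effective_batch_size)
total_steps = steps_per_epoch * best_hyperparameters["num_train_epochs"]

print(f"steps per epoch: {steps_per_epoch}, total optimizer steps: {total_steps}")
```

With more devices or gradient accumulation, the effective batch size grows and the step count shrinks accordingly.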