Training dataset length: 75365

Validation dataset length: 18000

Test dataset length: 17966

Current eval performance:

Evaluate results:

{'eval_exact': 61.166666666666664, 'eval_f1': 62.85259551738643, 'eval_total': 18000, 'eval_HasAns_exact': 29.12621359223301, 'eval_HasAns_f1': 34.38743400016661, 'eval_HasAns_total': 5768, 'eval_NoAns_exact': 76.2753433616743, 'eval_NoAns_f1': 76.2753433616743, 'eval_NoAns_total': 12232, 'eval_best_exact': 67.96666666666667, 'eval_best_exact_thresh': 0.0, 'eval_best_f1': 67.98520126170126, 'eval_best_f1_thresh': 0.0, 'eval_samples': 18000}

Predict results:

{'test_exact': 62.00601135478125, 'test_f1': 63.80738510177029, 'test_total': 17966, 'test_HasAns_exact': 27.75644541963796, 'test_HasAns_f1': 33.67406852046233, 'test_HasAns_total': 5469, 'test_NoAns_exact': 76.99447867488198, 'test_NoAns_f1': 76.99447867488198, 'test_NoAns_total': 12497, 'test_best_exact': 69.59812980073472, 'test_best_exact_thresh': 0.0, 'test_best_f1': 69.61402402068104, 'test_best_f1_thresh': 0.0, 'predict_samples': 17966}
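
The overall exact and F1 scores reported above appear to be example-count-weighted averages of the HasAns and NoAns subsets. The sketch below recomputes them from the subset scores as a sanity check. The helper name `combine` is hypothetical and not part of any evaluation library; the numbers are copied from the two result dictionaries.

```python
def combine(has_ans_score, has_ans_total, no_ans_score, no_ans_total):
    """Return the example-count-weighted mean of the HasAns and NoAns scores.

    Hypothetical helper for illustration only.
    """
    total = has_ans_total + no_ans_total
    return (has_ans_score * has_ans_total + no_ans_score * no_ans_total) / total


# Validation split: 5768 answerable + 12232 unanswerable = 18000 examples.
eval_exact = combine(29.12621359223301, 5768, 76.2753433616743, 12232)
eval_f1 = combine(34.38743400016661, 5768, 76.2753433616743, 12232)

# Test split: 5469 answerable + 12497 unanswerable = 17966 examples.
test_exact = combine(27.75644541963796, 5469, 76.99447867488198, 12497)
test_f1 = combine(33.67406852046233, 5469, 76.99447867488198, 12497)

print(f"eval exact={eval_exact:.4f} f1={eval_f1:.4f}")
print(f"test exact={test_exact:.4f} f1={test_f1:.4f}")
```

The recomputed values should agree with `eval_exact`, `eval_f1`, `test_exact`, and `test_f1` up to floating-point rounding. Because roughly two thirds of each split is unanswerable, the overall scores are dominated by the NoAns subset.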

Best trial:

BestRun(run_id='6', objective=96.43176121181548, hyperparameters={'learning_rate': 5e-05, 'num_train_epochs': 4, 'per_device_train_batch_size': 16, 'warmup_steps': 0})
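
For a rough sense of what the best trial's hyperparameters mean in training time, the sketch below estimates the number of optimizer steps. It rests on assumptions that this report does not state: a single device, no gradient accumulation, the last partial batch kept, and one training feature per example in the 75365-example training set. All variable names are illustrative.

```python
import math

# Hyperparameters copied from the best trial above.
best_hyperparameters = {
    "learning_rate": 5e-05,
    "num_train_epochs": 4,
    "per_device_train_batch_size": 16,
    "warmup_steps": 0,
}

# Training dataset length from this report.
train_examples = 75365

# Assumption: one device and no gradient accumulation, so the effective
# batch size equals per_device_train_batch_size.
batch_size = best_hyperparameters["per_device_train_batch_size"]

# Assumption: the last, smaller batch of each epoch is kept.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * best_hyperparameters["num_train_epochs"]

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```

If the training set was tokenized into more features than examples (for instance with a sliding window over long contexts), or if several devices were used, the real step count will differ from this estimate.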