|
Training dataset length: |
|
56164 |
|
Validation dataset length: |
|
13000 |
|
Test dataset length: |
|
13812 |
|
Current eval performance: |
|
Evaluate results: |
|
{'eval_exact': 63.38461538461539, 'eval_f1': 65.59405430995848, 'eval_total': 13000, 'eval_HasAns_exact': 42.352654651729175, 'eval_HasAns_f1': 49.34795568179736, 'eval_HasAns_total': 4106, 'eval_NoAns_exact': 73.09422082302676, 'eval_NoAns_f1': 73.09422082302676, 'eval_NoAns_total': 8894, 'eval_best_exact': 68.5, 'eval_best_exact_thresh': 0.0, 'eval_best_f1': 68.52960805860803, 'eval_best_f1_thresh': 0.0, 'eval_samples': 13000} |
|
Predict results: |
|
{'test_exact': 63.90095569070374, 'test_f1': 66.14471389068335, 'test_total': 13812, 'test_HasAns_exact': 39.21136419299535, 'test_HasAns_f1': 46.80156459909871, 'test_HasAns_total': 4083, 'test_NoAns_exact': 74.26251413300442, 'test_NoAns_f1': 74.26251413300442, 'test_NoAns_total': 9729, 'test_best_exact': 70.55459021141037, 'test_best_exact_thresh': 0.0, 'test_best_f1': 70.6760355196497, 'test_best_f1_thresh': 0.0, 'predict_samples': 13812} |
|
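The overall scores above appear to be the count-weighted average of the `HasAns` and `NoAns` subset scores. The following is a minimal sketch that checks this against the reported numbers; the helper name `overall_score` is hypothetical and not part of any evaluation library, and the weighting formula is an assumption that happens to match the logged values.

```python
def overall_score(has_ans_score, has_ans_total, no_ans_score, no_ans_total):
    """Count-weighted average of two per-subset percentage scores.

    Hypothetical helper for illustration only; the weighting is an
    assumption checked against the logged metrics, not taken from the
    evaluation script itself.
    """
    total = has_ans_total + no_ans_total
    return (has_ans_score * has_ans_total + no_ans_score * no_ans_total) / total


# Values copied from the "Evaluate results" dict.
eval_exact = overall_score(42.352654651729175, 4106, 73.09422082302676, 8894)
eval_f1 = overall_score(49.34795568179736, 4106, 73.09422082302676, 8894)

# Values copied from the "Predict results" dict.
test_exact = overall_score(39.21136419299535, 4083, 74.26251413300442, 9729)
test_f1 = overall_score(46.80156459909871, 4083, 74.26251413300442, 9729)

print(eval_exact, eval_f1)
print(test_exact, test_f1)
```

Up to floating-point rounding, the printed values agree with the reported `eval_exact`, `eval_f1`, `test_exact`, and `test_f1`. The gap between the `HasAns` and `NoAns` scores therefore matters: roughly two thirds of each split is unanswerable, so the overall numbers are dominated by `NoAns` performance.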
Best trial: |
|
BestRun(run_id='8', objective=97.27663003662997, hyperparameters={'learning_rate': 5e-05, 'num_train_epochs': 5, 'per_device_train_batch_size': 16, 'warmup_steps': 1000}) |
|
|