bert-large-uncased-rte / training.log
yoshitomo-matsubara
tuned hyperparameters
527a6a3
2021-05-27 03:06:57,518 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/ce/bert_large_uncased.yaml', log='log/glue/rte/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)
2021-05-27 03:06:57,550 INFO __main__ Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True
2021-05-27 03:07:10,458 INFO __main__ Start training
2021-05-27 03:07:10,459 INFO torchdistill.models.util [student model]
2021-05-27 03:07:10,459 INFO torchdistill.models.util Using the original student model
2021-05-27 03:07:10,459 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-27 03:07:13,632 INFO torchdistill.misc.log Epoch: [0] [ 0/312] eta: 0:02:13 lr: 1.997863247863248e-05 sample/s: 9.410467095760941 loss: 0.6985 (0.6985) time: 0.4283 data: 0.0032 max mem: 5355
2021-05-27 03:07:34,141 INFO torchdistill.misc.log Epoch: [0] [ 50/312] eta: 0:01:47 lr: 1.891025641025641e-05 sample/s: 9.260763560189773 loss: 0.6855 (0.7316) time: 0.4141 data: 0.0017 max mem: 7365
2021-05-27 03:07:54,769 INFO torchdistill.misc.log Epoch: [0] [100/312] eta: 0:01:27 lr: 1.7841880341880344e-05 sample/s: 9.300060421620962 loss: 0.6784 (0.7141) time: 0.4156 data: 0.0018 max mem: 7365
2021-05-27 03:08:15,006 INFO torchdistill.misc.log Epoch: [0] [150/312] eta: 0:01:06 lr: 1.6773504273504274e-05 sample/s: 9.304485750349947 loss: 0.6591 (0.7030) time: 0.4016 data: 0.0017 max mem: 7365
2021-05-27 03:08:35,380 INFO torchdistill.misc.log Epoch: [0] [200/312] eta: 0:00:45 lr: 1.5705128205128205e-05 sample/s: 9.304217428726076 loss: 0.6689 (0.6939) time: 0.4085 data: 0.0017 max mem: 7365
2021-05-27 03:08:55,663 INFO torchdistill.misc.log Epoch: [0] [250/312] eta: 0:00:25 lr: 1.4636752136752137e-05 sample/s: 9.296092555242803 loss: 0.6506 (0.6908) time: 0.4118 data: 0.0018 max mem: 7365
2021-05-27 03:09:16,350 INFO torchdistill.misc.log Epoch: [0] [300/312] eta: 0:00:04 lr: 1.356837606837607e-05 sample/s: 9.244190153358904 loss: 0.6405 (0.6823) time: 0.4173 data: 0.0017 max mem: 7365
2021-05-27 03:09:20,776 INFO torchdistill.misc.log Epoch: [0] Total time: 0:02:07
2021-05-27 03:09:24,609 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:09:24,610 INFO __main__ Validation: accuracy = 0.6173285198555957
2021-05-27 03:09:24,610 INFO __main__ Updating ckpt at ./resource/ckpt/glue/rte/ce/rte-bert-large-uncased
2021-05-27 03:09:29,657 INFO torchdistill.misc.log Epoch: [1] [ 0/312] eta: 0:02:19 lr: 1.3311965811965812e-05 sample/s: 9.012096369155467 loss: 0.6355 (0.6355) time: 0.4460 data: 0.0021 max mem: 7365
2021-05-27 03:09:50,469 INFO torchdistill.misc.log Epoch: [1] [ 50/312] eta: 0:01:49 lr: 1.2243589743589746e-05 sample/s: 9.16842369299592 loss: 0.5310 (0.5850) time: 0.4242 data: 0.0017 max mem: 7365
2021-05-27 03:10:10,381 INFO torchdistill.misc.log Epoch: [1] [100/312] eta: 0:01:26 lr: 1.1175213675213676e-05 sample/s: 11.36395200609068 loss: 0.4777 (0.5638) time: 0.4081 data: 0.0017 max mem: 7365
2021-05-27 03:10:30,677 INFO torchdistill.misc.log Epoch: [1] [150/312] eta: 0:01:05 lr: 1.0106837606837608e-05 sample/s: 9.29443941850809 loss: 0.5052 (0.5646) time: 0.4139 data: 0.0017 max mem: 7365
2021-05-27 03:10:51,658 INFO torchdistill.misc.log Epoch: [1] [200/312] eta: 0:00:45 lr: 9.03846153846154e-06 sample/s: 9.300792527973533 loss: 0.4903 (0.5516) time: 0.4198 data: 0.0017 max mem: 7365
2021-05-27 03:11:12,170 INFO torchdistill.misc.log Epoch: [1] [250/312] eta: 0:00:25 lr: 7.970085470085472e-06 sample/s: 9.282976357127444 loss: 0.4948 (0.5433) time: 0.4087 data: 0.0017 max mem: 7365
2021-05-27 03:11:32,902 INFO torchdistill.misc.log Epoch: [1] [300/312] eta: 0:00:04 lr: 6.901709401709402e-06 sample/s: 9.302071412730095 loss: 0.4903 (0.5442) time: 0.4143 data: 0.0017 max mem: 7365
2021-05-27 03:11:36,971 INFO torchdistill.misc.log Epoch: [1] Total time: 0:02:07
2021-05-27 03:11:40,801 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:11:40,801 INFO __main__ Validation: accuracy = 0.740072202166065
2021-05-27 03:11:40,802 INFO __main__ Updating ckpt at ./resource/ckpt/glue/rte/ce/rte-bert-large-uncased
2021-05-27 03:11:46,120 INFO torchdistill.misc.log Epoch: [2] [ 0/312] eta: 0:02:19 lr: 6.645299145299145e-06 sample/s: 9.060380879964056 loss: 0.7969 (0.7969) time: 0.4475 data: 0.0060 max mem: 7365
2021-05-27 03:12:06,781 INFO torchdistill.misc.log Epoch: [2] [ 50/312] eta: 0:01:48 lr: 5.576923076923077e-06 sample/s: 10.741735558355614 loss: 0.3631 (0.3967) time: 0.4164 data: 0.0017 max mem: 7365
2021-05-27 03:12:27,317 INFO torchdistill.misc.log Epoch: [2] [100/312] eta: 0:01:27 lr: 4.508547008547009e-06 sample/s: 9.16765717108441 loss: 0.3599 (0.3882) time: 0.4054 data: 0.0017 max mem: 7365
2021-05-27 03:12:47,698 INFO torchdistill.misc.log Epoch: [2] [150/312] eta: 0:01:06 lr: 3.4401709401709403e-06 sample/s: 13.347942659561435 loss: 0.2746 (0.3688) time: 0.3992 data: 0.0017 max mem: 7365
2021-05-27 03:13:08,144 INFO torchdistill.misc.log Epoch: [2] [200/312] eta: 0:00:45 lr: 2.371794871794872e-06 sample/s: 9.294048107726917 loss: 0.2737 (0.3552) time: 0.4062 data: 0.0017 max mem: 7365
2021-05-27 03:13:29,127 INFO torchdistill.misc.log Epoch: [2] [250/312] eta: 0:00:25 lr: 1.3034188034188036e-06 sample/s: 9.276037203192194 loss: 0.2493 (0.3468) time: 0.4134 data: 0.0017 max mem: 7365
2021-05-27 03:13:49,840 INFO torchdistill.misc.log Epoch: [2] [300/312] eta: 0:00:04 lr: 2.3504273504273505e-07 sample/s: 9.257023235231964 loss: 0.1451 (0.3412) time: 0.4061 data: 0.0018 max mem: 7365
2021-05-27 03:13:54,183 INFO torchdistill.misc.log Epoch: [2] Total time: 0:02:08
2021-05-27 03:13:58,014 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:13:58,014 INFO __main__ Validation: accuracy = 0.7220216606498195
2021-05-27 03:14:04,515 INFO __main__ [Student: bert-large-uncased]
2021-05-27 03:14:08,368 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:14:08,368 INFO __main__ Test: accuracy = 0.740072202166065
2021-05-27 03:14:08,368 INFO __main__ Start prediction for private dataset(s)
2021-05-27 03:14:08,369 INFO __main__ rte/test: 3000 samples
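
Two patterns in the log above can be checked mechanically. First, the logged `lr` values look like a linear decay to zero over 3 x 312 = 936 steps. Second, the final `Test: accuracy` equals the epoch-1 validation accuracy, which fits the `Updating ckpt` lines appearing only after epochs 0 and 1. The sketch below is a minimal check of both, not part of torchdistill. The base learning rate of 2e-5 and the `+1` step offset are assumptions inferred from the logged numbers, and the function names are hypothetical. The excerpt is abridged from the log, keeping only the fields the check needs.

```python
import math
import re

# Abridged excerpt of the log above: timestamps, logger names and unused fields dropped.
LOG_EXCERPT = """\
Epoch: [0] [ 0/312] lr: 1.997863247863248e-05 loss: 0.6985 (0.6985)
Epoch: [1] [ 50/312] lr: 1.2243589743589746e-05 loss: 0.5310 (0.5850)
Epoch: [2] [300/312] lr: 2.3504273504273505e-07 loss: 0.1451 (0.3412)
Validation: accuracy = 0.6173285198555957
Validation: accuracy = 0.740072202166065
Validation: accuracy = 0.7220216606498195
"""

STEP_RE = re.compile(r"Epoch: \[(\d+)\] \[\s*(\d+)/(\d+)\].*?lr: ([0-9.eE+-]+)")
VAL_RE = re.compile(r"Validation: accuracy = ([0-9.]+)")

BASE_LR = 2e-5   # assumption: inferred from the logged values, not stated in the log
NUM_EPOCHS = 3   # epochs 0, 1 and 2 appear in the log


def parse_steps(text):
    """Return (epoch, step, steps_per_epoch, lr) for every training-step line."""
    return [
        (int(m.group(1)), int(m.group(2)), int(m.group(3)), float(m.group(4)))
        for m in STEP_RE.finditer(text)
    ]


def expected_lr(epoch, step, steps_per_epoch):
    """Linear decay to zero; the +1 offset is an assumption that fits the log."""
    total_steps = NUM_EPOCHS * steps_per_epoch
    completed = epoch * steps_per_epoch + step + 1
    return BASE_LR * (1 - completed / total_steps)


def best_validation(text):
    """Return (epoch_index, accuracy) of the highest validation accuracy."""
    accs = [float(m.group(1)) for m in VAL_RE.finditer(text)]
    best = max(range(len(accs)), key=accs.__getitem__)
    return best, accs[best]


if __name__ == "__main__":
    for epoch, step, per_epoch, lr in parse_steps(LOG_EXCERPT):
        ok = math.isclose(lr, expected_lr(epoch, step, per_epoch), rel_tol=1e-9)
        print(epoch, step, lr, "matches linear decay" if ok else "does not match")
    print("best validation (epoch, accuracy):", best_validation(LOG_EXCERPT))
```

Applied to the full log rather than the excerpt, the same regular expressions should still match, since they skip over the timestamp prefix and the fields between the step counter and `lr`.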