2021-05-27 23:26:01,880 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/mse/bert_base_uncased.yaml', log='log/glue/stsb/mse/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)
2021-05-27 23:26:01,909 INFO __main__ Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Use FP16 precision: True
2021-05-27 23:26:07,015 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/stsb/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-27 23:26:08,583 INFO __main__ Start training
2021-05-27 23:26:08,583 INFO torchdistill.models.util [student model]
2021-05-27 23:26:08,583 INFO torchdistill.models.util Using the original student model
2021-05-27 23:26:08,583 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-27 23:26:11,276 INFO torchdistill.misc.log Epoch: [0] [ 0/180] eta: 0:00:49 lr: 4.9907407407407406e-05 sample/s: 14.701703069204987 loss: 15.5303 (15.5303) time: 0.2778 data: 0.0057 max mem: 2057
2021-05-27 23:26:23,734 INFO torchdistill.misc.log Epoch: [0] [ 50/180] eta: 0:00:32 lr: 4.527777777777778e-05 sample/s: 14.820613434933513 loss: 1.4726 (5.8169) time: 0.2571 data: 0.0034 max mem: 3684
2021-05-27 23:26:36,370 INFO torchdistill.misc.log Epoch: [0] [100/180] eta: 0:00:20 lr: 4.064814814814815e-05 sample/s: 14.914340322071642 loss: 0.6186 (3.3036) time: 0.2533 data: 0.0034 max mem: 3887
2021-05-27 23:26:48,629 INFO torchdistill.misc.log Epoch: [0] [150/180] eta: 0:00:07 lr: 3.601851851851852e-05 sample/s: 18.040405170003655 loss: 0.5369 (2.4056) time: 0.2418 data: 0.0034 max mem: 4447
2021-05-27 23:26:55,955 INFO torchdistill.misc.log Epoch: [0] Total time: 0:00:44
2021-05-27 23:26:58,859 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-27 23:26:58,859 INFO __main__ Validation: pearson = 0.881096812440957, spearmanr = 0.877532796405759
2021-05-27 23:26:58,860 INFO __main__ Updating ckpt at ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased
2021-05-27 23:27:00,150 INFO torchdistill.misc.log Epoch: [1] [ 0/180] eta: 0:00:52 lr: 3.3240740740740746e-05 sample/s: 13.832772125389575 loss: 0.4722 (0.4722) time: 0.2935 data: 0.0043 max mem: 4447
2021-05-27 23:27:12,625 INFO torchdistill.misc.log Epoch: [1] [ 50/180] eta: 0:00:32 lr: 2.861111111111111e-05 sample/s: 16.265235191865646 loss: 0.3779 (0.3652) time: 0.2371 data: 0.0035 max mem: 4447
2021-05-27 23:27:24,974 INFO torchdistill.misc.log Epoch: [1] [100/180] eta: 0:00:19 lr: 2.398148148148148e-05 sample/s: 20.118664656864684 loss: 0.3085 (0.3523) time: 0.2422 data: 0.0035 max mem: 4447
2021-05-27 23:27:37,643 INFO torchdistill.misc.log Epoch: [1] [150/180] eta: 0:00:07 lr: 1.9351851851851853e-05 sample/s: 13.83483675399673 loss: 0.3442 (0.3544) time: 0.2645 data: 0.0035 max mem: 4448
2021-05-27 23:27:44,984 INFO torchdistill.misc.log Epoch: [1] Total time: 0:00:45
2021-05-27 23:27:47,885 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-27 23:27:47,885 INFO __main__ Validation: pearson = 0.8871473366574754, spearmanr = 0.8842887289343818
2021-05-27 23:27:47,885 INFO __main__ Updating ckpt at ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased
2021-05-27 23:27:49,860 INFO torchdistill.misc.log Epoch: [2] [ 0/180] eta: 0:00:45 lr: 1.6574074074074075e-05 sample/s: 16.07903226115608 loss: 0.1846 (0.1846) time: 0.2531 data: 0.0043 max mem: 4448
2021-05-27 23:28:01,803 INFO torchdistill.misc.log Epoch: [2] [ 50/180] eta: 0:00:31 lr: 1.1944444444444446e-05 sample/s: 16.289887718998443 loss: 0.1593 (0.1702) time: 0.2374 data: 0.0033 max mem: 4448
2021-05-27 23:28:14,559 INFO torchdistill.misc.log Epoch: [2] [100/180] eta: 0:00:19 lr: 7.314814814814815e-06 sample/s: 11.970558100375301 loss: 0.1566 (0.1677) time: 0.2591 data: 0.0035 max mem: 4448
2021-05-27 23:28:27,138 INFO torchdistill.misc.log Epoch: [2] [150/180] eta: 0:00:07 lr: 2.685185185185185e-06 sample/s: 16.26870508515336 loss: 0.1590 (0.1644) time: 0.2500 data: 0.0036 max mem: 4448
2021-05-27 23:28:34,498 INFO torchdistill.misc.log Epoch: [2] Total time: 0:00:44
2021-05-27 23:28:37,405 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-27 23:28:37,406 INFO __main__ Validation: pearson = 0.8888187382563609, spearmanr = 0.885554195504554
2021-05-27 23:28:37,406 INFO __main__ Updating ckpt at ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased
2021-05-27 23:28:42,235 INFO __main__ [Student: bert-base-uncased]
2021-05-27 23:28:45,145 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-27 23:28:45,145 INFO __main__ Test: pearson = 0.8888187382563609, spearmanr = 0.885554195504554
2021-05-27 23:28:45,145 INFO __main__ Start prediction for private dataset(s)
2021-05-27 23:28:45,146 INFO __main__ stsb/test: 1379 samples