2021-05-28 21:41:37,358 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml', log='log/glue/qqp/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)
2021-05-28 21:41:37,386 INFO __main__ Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Use FP16 precision: True
2021-05-28 21:41:42,076 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/qqp/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-28 21:42:41,913 INFO __main__ Start training
2021-05-28 21:42:41,913 INFO torchdistill.models.util [student model]
2021-05-28 21:42:41,914 INFO torchdistill.models.util Using the original student model
2021-05-28 21:42:41,914 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-28 21:42:44,608 INFO torchdistill.misc.log Epoch: [0] [ 0/22741] eta: 1:02:12 lr: 4.999926710933263e-05 sample/s: 28.09482152306569 loss: 0.5684 (0.5684) time: 0.1641 data: 0.0218 max mem: 1891
2021-05-28 21:45:02,079 INFO torchdistill.misc.log Epoch: [0] [ 1000/22741] eta: 0:49:49 lr: 4.926637644196239e-05 sample/s: 22.046275952693826 loss: 0.3256 (0.4391) time: 0.1462 data: 0.0020 max mem: 3206
2021-05-28 21:47:18,645 INFO torchdistill.misc.log Epoch: [0] [ 2000/22741] eta: 0:47:22 lr: 4.8533485774592146e-05 sample/s: 23.55741774597576 loss: 0.3501 (0.3987) time: 0.1345 data: 0.0020 max mem: 3206
2021-05-28 21:49:34,213 INFO torchdistill.misc.log Epoch: [0] [ 3000/22741] eta: 0:44:55 lr: 4.780059510722191e-05 sample/s: 32.71120261889025 loss: 0.2823 (0.3815) time: 0.1361 data: 0.0019 max mem: 3206
2021-05-28 21:51:49,521 INFO torchdistill.misc.log Epoch: [0] [ 4000/22741] eta: 0:42:33 lr: 4.706770443985167e-05 sample/s: 25.242523407373305 loss: 0.3737 (0.3665) time: 0.1319 data: 0.0019 max mem: 3206
2021-05-28 21:54:05,811 INFO torchdistill.misc.log Epoch: [0] [ 5000/22741] eta: 0:40:17 lr: 4.633481377248142e-05 sample/s: 32.662548451970494 loss: 0.2040 (0.3526) time: 0.1379 data: 0.0021 max mem: 3206
2021-05-28 21:56:21,633 INFO torchdistill.misc.log Epoch: [0] [ 6000/22741] eta: 0:37:59 lr: 4.560192310511118e-05 sample/s: 17.64687147306407 loss: 0.2610 (0.3427) time: 0.1330 data: 0.0019 max mem: 3206
2021-05-28 21:58:37,550 INFO torchdistill.misc.log Epoch: [0] [ 7000/22741] eta: 0:35:42 lr: 4.4869032437740936e-05 sample/s: 39.795854661726544 loss: 0.2279 (0.3345) time: 0.1368 data: 0.0019 max mem: 3206
2021-05-28 22:00:54,589 INFO torchdistill.misc.log Epoch: [0] [ 8000/22741] eta: 0:33:28 lr: 4.41361417703707e-05 sample/s: 29.80756185917765 loss: 0.3037 (0.3276) time: 0.1373 data: 0.0020 max mem: 3206
2021-05-28 22:03:09,969 INFO torchdistill.misc.log Epoch: [0] [ 9000/22741] eta: 0:31:10 lr: 4.340325110300046e-05 sample/s: 31.53214914663003 loss: 0.1971 (0.3208) time: 0.1426 data: 0.0020 max mem: 3206
2021-05-28 22:05:26,598 INFO torchdistill.misc.log Epoch: [0] [10000/22741] eta: 0:28:55 lr: 4.267036043563021e-05 sample/s: 32.66242127498029 loss: 0.2713 (0.3154) time: 0.1431 data: 0.0019 max mem: 3206
2021-05-28 22:07:42,509 INFO torchdistill.misc.log Epoch: [0] [11000/22741] eta: 0:26:38 lr: 4.193746976825997e-05 sample/s: 39.87350538666844 loss: 0.2716 (0.3113) time: 0.1377 data: 0.0020 max mem: 3206
2021-05-28 22:09:59,798 INFO torchdistill.misc.log Epoch: [0] [12000/22741] eta: 0:24:23 lr: 4.1204579100889726e-05 sample/s: 29.81990622411654 loss: 0.1884 (0.3089) time: 0.1430 data: 0.0020 max mem: 3206
2021-05-28 22:12:15,437 INFO torchdistill.misc.log Epoch: [0] [13000/22741] eta: 0:22:06 lr: 4.047168843351949e-05 sample/s: 27.23596133085496 loss: 0.2587 (0.3061) time: 0.1314 data: 0.0020 max mem: 3206
2021-05-28 22:14:30,715 INFO torchdistill.misc.log Epoch: [0] [14000/22741] eta: 0:19:50 lr: 3.973879776614925e-05 sample/s: 27.213695399343713 loss: 0.2133 (0.3033) time: 0.1350 data: 0.0019 max mem: 3206
2021-05-28 22:16:46,940 INFO torchdistill.misc.log Epoch: [0] [15000/22741] eta: 0:17:33 lr: 3.9005907098779e-05 sample/s: 32.61448261114675 loss: 0.1779 (0.3003) time: 0.1352 data: 0.0019 max mem: 3206
2021-05-28 22:19:01,833 INFO torchdistill.misc.log Epoch: [0] [16000/22741] eta: 0:15:17 lr: 3.827301643140876e-05 sample/s: 29.83268519160634 loss: 0.2660 (0.2970) time: 0.1310 data: 0.0020 max mem: 3206
2021-05-28 22:21:16,379 INFO torchdistill.misc.log Epoch: [0] [17000/22741] eta: 0:13:00 lr: 3.754012576403852e-05 sample/s: 32.618984755190645 loss: 0.2141 (0.2938) time: 0.1298 data: 0.0019 max mem: 3206
2021-05-28 22:23:31,336 INFO torchdistill.misc.log Epoch: [0] [18000/22741] eta: 0:10:44 lr: 3.680723509666828e-05 sample/s: 26.997843676178093 loss: 0.2866 (0.2914) time: 0.1386 data: 0.0020 max mem: 3206
2021-05-28 22:25:46,116 INFO torchdistill.misc.log Epoch: [0] [19000/22741] eta: 0:08:28 lr: 3.607434442929804e-05 sample/s: 44.86558415163768 loss: 0.2532 (0.2893) time: 0.1327 data: 0.0020 max mem: 3206
2021-05-28 22:28:02,199 INFO torchdistill.misc.log Epoch: [0] [20000/22741] eta: 0:06:12 lr: 3.53414537619278e-05 sample/s: 23.53072124231408 loss: 0.1963 (0.2868) time: 0.1398 data: 0.0019 max mem: 3206
2021-05-28 22:30:17,943 INFO torchdistill.misc.log Epoch: [0] [21000/22741] eta: 0:03:56 lr: 3.460856309455756e-05 sample/s: 29.843086123508265 loss: 0.1552 (0.2842) time: 0.1418 data: 0.0019 max mem: 3206
2021-05-28 22:32:32,428 INFO torchdistill.misc.log Epoch: [0] [22000/22741] eta: 0:01:40 lr: 3.387567242718731e-05 sample/s: 29.88486894254491 loss: 0.1958 (0.2824) time: 0.1364 data: 0.0019 max mem: 3206
2021-05-28 22:34:12,553 INFO torchdistill.misc.log Epoch: [0] Total time: 0:51:28
2021-05-28 22:35:52,473 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
2021-05-28 22:35:52,475 INFO __main__ Validation: accuracy = 0.9028938906752412, f1 = 0.8703006276841757
2021-05-28 22:35:52,475 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased
2021-05-28 22:35:53,740 INFO torchdistill.misc.log Epoch: [1] [ 0/22741] eta: 1:01:53 lr: 3.3332600442665964e-05 sample/s: 28.324724767312098 loss: 0.5533 (0.5533) time: 0.1633 data: 0.0221 max mem: 3206
2021-05-28 22:38:09,983 INFO torchdistill.misc.log Epoch: [1] [ 1000/22741] eta: 0:49:22 lr: 3.2599709775295725e-05 sample/s: 39.881372451138404 loss: 0.0934 (0.1804) time: 0.1297 data: 0.0019 max mem: 3206
2021-05-28 22:40:25,602 INFO torchdistill.misc.log Epoch: [1] [ 2000/22741] eta: 0:46:59 lr: 3.1866819107925486e-05 sample/s: 39.8993930861285 loss: 0.1000 (0.1880) time: 0.1340 data: 0.0019 max mem: 3206
2021-05-28 22:42:40,851 INFO torchdistill.misc.log Epoch: [1] [ 3000/22741] eta: 0:44:39 lr: 3.113392844055524e-05 sample/s: 39.91201722353725 loss: 0.0397 (0.1897) time: 0.1337 data: 0.0020 max mem: 3206
2021-05-28 22:44:56,820 INFO torchdistill.misc.log Epoch: [1] [ 4000/22741] eta: 0:42:24 lr: 3.0401037773184997e-05 sample/s: 35.83940938473304 loss: 0.2073 (0.1937) time: 0.1387 data: 0.0020 max mem: 3206
2021-05-28 22:47:12,892 INFO torchdistill.misc.log Epoch: [1] [ 5000/22741] eta: 0:40:09 lr: 2.9668147105814757e-05 sample/s: 23.52620072946129 loss: 0.2317 (0.1947) time: 0.1377 data: 0.0020 max mem: 3206
2021-05-28 22:49:29,022 INFO torchdistill.misc.log Epoch: [1] [ 6000/22741] eta: 0:37:54 lr: 2.8935256438444515e-05 sample/s: 35.71216075267673 loss: 0.1682 (0.1950) time: 0.1327 data: 0.0020 max mem: 3206
2021-05-28 22:51:44,757 INFO torchdistill.misc.log Epoch: [1] [ 7000/22741] eta: 0:35:38 lr: 2.8202365771074275e-05 sample/s: 35.8185971639261 loss: 0.1936 (0.1957) time: 0.1277 data: 0.0020 max mem: 3206
2021-05-28 22:54:00,779 INFO torchdistill.misc.log Epoch: [1] [ 8000/22741] eta: 0:33:23 lr: 2.746947510370403e-05 sample/s: 35.78253914764559 loss: 0.1607 (0.2021) time: 0.1355 data: 0.0020 max mem: 3206
2021-05-28 22:56:16,445 INFO torchdistill.misc.log Epoch: [1] [ 9000/22741] eta: 0:31:06 lr: 2.673658443633379e-05 sample/s: 29.859710821758846 loss: 0.0592 (0.2015) time: 0.1354 data: 0.0020 max mem: 3206
2021-05-28 22:58:32,285 INFO torchdistill.misc.log Epoch: [1] [10000/22741] eta: 0:28:50 lr: 2.600369376896355e-05 sample/s: 23.53702646320642 loss: 0.0289 (0.2013) time: 0.1486 data: 0.0020 max mem: 3206
2021-05-28 23:00:47,277 INFO torchdistill.misc.log Epoch: [1] [11000/22741] eta: 0:26:34 lr: 2.5270803101593305e-05 sample/s: 32.66890856445197 loss: 0.0945 (0.2022) time: 0.1271 data: 0.0019 max mem: 3206
2021-05-28 23:03:03,760 INFO torchdistill.misc.log Epoch: [1] [12000/22741] eta: 0:24:19 lr: 2.4537912434223065e-05 sample/s: 32.55891074505907 loss: 0.1768 (0.2016) time: 0.1379 data: 0.0020 max mem: 3206
2021-05-28 23:05:18,319 INFO torchdistill.misc.log Epoch: [1] [13000/22741] eta: 0:22:02 lr: 2.3805021766852823e-05 sample/s: 32.3476698293464 loss: 0.0365 (0.2002) time: 0.1354 data: 0.0019 max mem: 3206
2021-05-28 23:07:33,589 INFO torchdistill.misc.log Epoch: [1] [14000/22741] eta: 0:19:46 lr: 2.307213109948258e-05 sample/s: 29.607043339692197 loss: 0.0187 (0.2008) time: 0.1379 data: 0.0019 max mem: 3206
2021-05-28 23:09:49,171 INFO torchdistill.misc.log Epoch: [1] [15000/22741] eta: 0:17:30 lr: 2.2339240432112337e-05 sample/s: 32.65828856186249 loss: 0.3250 (0.2023) time: 0.1350 data: 0.0019 max mem: 3206
2021-05-28 23:12:04,128 INFO torchdistill.misc.log Epoch: [1] [16000/22741] eta: 0:15:14 lr: 2.1606349764742098e-05 sample/s: 39.902145036733664 loss: 0.1157 (0.2023) time: 0.1332 data: 0.0019 max mem: 3206
2021-05-28 23:14:19,963 INFO torchdistill.misc.log Epoch: [1] [17000/22741] eta: 0:12:58 lr: 2.0873459097371855e-05 sample/s: 29.81842224062732 loss: 0.0740 (0.2035) time: 0.1371 data: 0.0019 max mem: 3206
2021-05-28 23:16:35,045 INFO torchdistill.misc.log Epoch: [1] [18000/22741] eta: 0:10:43 lr: 2.0140568430001613e-05 sample/s: 27.17503757839631 loss: 0.1538 (0.2039) time: 0.1394 data: 0.0022 max mem: 3206
2021-05-28 23:18:51,761 INFO torchdistill.misc.log Epoch: [1] [19000/22741] eta: 0:08:27 lr: 1.940767776263137e-05 sample/s: 29.899515255203877 loss: 0.2649 (0.2050) time: 0.1399 data: 0.0019 max mem: 3206
2021-05-28 23:21:07,148 INFO torchdistill.misc.log Epoch: [1] [20000/22741] eta: 0:06:11 lr: 1.8674787095261127e-05 sample/s: 29.91145595618439 loss: 0.0765 (0.2059) time: 0.1264 data: 0.0019 max mem: 3206
2021-05-28 23:23:23,180 INFO torchdistill.misc.log Epoch: [1] [21000/22741] eta: 0:03:56 lr: 1.7941896427890888e-05 sample/s: 35.81056949961473 loss: 0.0140 (0.2059) time: 0.1309 data: 0.0020 max mem: 3206
2021-05-28 23:25:39,127 INFO torchdistill.misc.log Epoch: [1] [22000/22741] eta: 0:01:40 lr: 1.7209005760520645e-05 sample/s: 29.85492866014898 loss: 0.1697 (0.2061) time: 0.1355 data: 0.0020 max mem: 3206
2021-05-28 23:27:19,524 INFO torchdistill.misc.log Epoch: [1] Total time: 0:51:25
2021-05-28 23:28:59,456 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
2021-05-28 23:28:59,458 INFO __main__ Validation: accuracy = 0.9066782092505565, f1 = 0.8765904556307854
2021-05-28 23:28:59,459 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased
2021-05-28 23:29:00,762 INFO torchdistill.misc.log Epoch: [2] [ 0/22741] eta: 0:52:36 lr: 1.66659337759993e-05 sample/s: 35.41260205503162 loss: 0.0020 (0.0020) time: 0.1388 data: 0.0258 max mem: 3206
2021-05-28 23:31:16,699 INFO torchdistill.misc.log Epoch: [2] [ 1000/22741] eta: 0:49:15 lr: 1.5933043108629057e-05 sample/s: 35.72493622530461 loss: 0.0000 (0.2282) time: 0.1340 data: 0.0020 max mem: 3206
2021-05-28 23:33:33,394 INFO torchdistill.misc.log Epoch: [2] [ 2000/22741] eta: 0:47:07 lr: 1.5200152441258813e-05 sample/s: 30.27454801694068 loss: 0.0000 (0.2493) time: 0.1335 data: 0.0019 max mem: 3206
2021-05-28 23:35:48,660 INFO torchdistill.misc.log Epoch: [2] [ 3000/22741] eta: 0:44:44 lr: 1.4467261773888572e-05 sample/s: 36.234792036525896 loss: 0.0000 (0.2656) time: 0.1393 data: 0.0019 max mem: 3206
2021-05-28 23:38:03,495 INFO torchdistill.misc.log Epoch: [2] [ 4000/22741] eta: 0:42:22 lr: 1.373437110651833e-05 sample/s: 27.239410504986875 loss: 0.0000 (0.2649) time: 0.1332 data: 0.0019 max mem: 3206
2021-05-28 23:40:18,370 INFO torchdistill.misc.log Epoch: [2] [ 5000/22741] eta: 0:40:04 lr: 1.300148043914809e-05 sample/s: 27.191730267732257 loss: 0.0001 (0.2653) time: 0.1463 data: 0.0020 max mem: 3206
2021-05-28 23:42:33,360 INFO torchdistill.misc.log Epoch: [2] [ 6000/22741] eta: 0:37:47 lr: 1.2268589771777847e-05 sample/s: 32.66865411239648 loss: 0.0000 (0.2644) time: 0.1408 data: 0.0020 max mem: 3206
2021-05-28 23:44:47,853 INFO torchdistill.misc.log Epoch: [2] [ 7000/22741] eta: 0:35:29 lr: 1.1535699104407605e-05 sample/s: 33.143780558875534 loss: 0.0001 (0.2670) time: 0.1333 data: 0.0020 max mem: 3206
2021-05-28 23:47:02,473 INFO torchdistill.misc.log Epoch: [2] [ 8000/22741] eta: 0:33:13 lr: 1.0802808437037364e-05 sample/s: 32.695839261005986 loss: 0.0000 (0.2660) time: 0.1307 data: 0.0019 max mem: 3206
2021-05-28 23:49:15,855 INFO torchdistill.misc.log Epoch: [2] [ 9000/22741] eta: 0:30:55 lr: 1.0069917769667121e-05 sample/s: 29.891577835226958 loss: 0.0000 (0.2636) time: 0.1350 data: 0.0020 max mem: 3206
2021-05-28 23:51:30,840 INFO torchdistill.misc.log Epoch: [2] [10000/22741] eta: 0:28:40 lr: 9.33702710229688e-06 sample/s: 35.751504987501946 loss: 0.0000 (0.2629) time: 0.1275 data: 0.0021 max mem: 3206
2021-05-28 23:53:45,522 INFO torchdistill.misc.log Epoch: [2] [11000/22741] eta: 0:26:24 lr: 8.604136434926637e-06 sample/s: 29.622255337481374 loss: 0.0000 (0.2632) time: 0.1303 data: 0.0020 max mem: 3206
2021-05-28 23:55:59,507 INFO torchdistill.misc.log Epoch: [2] [12000/22741] eta: 0:24:08 lr: 7.871245767556396e-06 sample/s: 35.754933635673915 loss: 0.0000 (0.2629) time: 0.1290 data: 0.0019 max mem: 3206
2021-05-28 23:58:15,335 INFO torchdistill.misc.log Epoch: [2] [13000/22741] eta: 0:21:54 lr: 7.1383551001861544e-06 sample/s: 35.821579784565124 loss: 0.0058 (0.2626) time: 0.1291 data: 0.0020 max mem: 3206
2021-05-29 00:00:30,646 INFO torchdistill.misc.log Epoch: [2] [14000/22741] eta: 0:19:39 lr: 6.405464432815913e-06 sample/s: 33.14227467217681 loss: 0.0000 (0.2610) time: 0.1348 data: 0.0019 max mem: 3206
2021-05-29 00:02:43,835 INFO torchdistill.misc.log Epoch: [2] [15000/22741] eta: 0:17:24 lr: 5.672573765445672e-06 sample/s: 27.24148927533408 loss: 0.0000 (0.2609) time: 0.1321 data: 0.0019 max mem: 3206
2021-05-29 00:04:58,889 INFO torchdistill.misc.log Epoch: [2] [16000/22741] eta: 0:15:09 lr: 4.93968309807543e-06 sample/s: 35.784523504820406 loss: 0.0000 (0.2591) time: 0.1300 data: 0.0020 max mem: 3206
2021-05-29 00:07:13,971 INFO torchdistill.misc.log Epoch: [2] [17000/22741] eta: 0:12:54 lr: 4.206792430705188e-06 sample/s: 29.886306308874037 loss: 0.0000 (0.2589) time: 0.1387 data: 0.0019 max mem: 3206
2021-05-29 00:09:30,372 INFO torchdistill.misc.log Epoch: [2] [18000/22741] eta: 0:10:39 lr: 3.473901763334946e-06 sample/s: 32.35147476243367 loss: 0.0000 (0.2577) time: 0.1410 data: 0.0019 max mem: 3206
2021-05-29 00:11:47,544 INFO torchdistill.misc.log Epoch: [2] [19000/22741] eta: 0:08:25 lr: 2.7410110959647043e-06 sample/s: 25.485669147804952 loss: 0.0000 (0.2556) time: 0.1318 data: 0.0019 max mem: 3206
2021-05-29 00:14:02,842 INFO torchdistill.misc.log Epoch: [2] [20000/22741] eta: 0:06:10 lr: 2.0081204285944624e-06 sample/s: 35.68428418592087 loss: 0.0000 (0.2549) time: 0.1366 data: 0.0020 max mem: 3206
2021-05-29 00:16:16,820 INFO torchdistill.misc.log Epoch: [2] [21000/22741] eta: 0:03:55 lr: 1.2752297612242206e-06 sample/s: 30.313933171800834 loss: 0.0000 (0.2551) time: 0.1303 data: 0.0019 max mem: 3206
2021-05-29 00:18:32,152 INFO torchdistill.misc.log Epoch: [2] [22000/22741] eta: 0:01:40 lr: 5.423390938539788e-07 sample/s: 27.488987015114898 loss: 0.0000 (0.2537) time: 0.1327 data: 0.0019 max mem: 3206
2021-05-29 00:20:12,904 INFO torchdistill.misc.log Epoch: [2] Total time: 0:51:12
2021-05-29 00:21:52,921 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
2021-05-29 00:21:52,922 INFO __main__ Validation: accuracy = 0.9093742270591145, f1 = 0.8781833898530488
2021-05-29 00:21:52,923 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased
2021-05-29 00:21:57,493 INFO __main__ [Student: bert-base-uncased]
2021-05-29 00:23:37,517 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
2021-05-29 00:23:37,519 INFO __main__ Test: accuracy = 0.9093742270591145, f1 = 0.8781833898530488
2021-05-29 00:23:37,519 INFO __main__ Start prediction for private dataset(s)
2021-05-29 00:23:37,520 INFO __main__ qqp/test: 390965 samples