2021-05-31 19:12:19,502 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021-05-31 19:12:19,563 INFO __main__ Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True
2021-05-31 19:12:19,941 INFO filelock Lock 140082792337040 acquired on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
2021-05-31 19:12:20,295 INFO filelock Lock 140082792337040 released on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
2021-05-31 19:12:21,006 INFO filelock Lock 140082831894224 acquired on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-31 19:12:21,516 INFO filelock Lock 140082831894224 released on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-31 19:12:21,871 INFO filelock Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock
2021-05-31 19:12:22,393 INFO filelock Lock 140082823814352 released on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock
2021-05-31 19:12:23,095 INFO filelock Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock
2021-05-31 19:12:23,448 INFO filelock Lock 140082823814352 released on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock
2021-05-31 19:12:23,803 INFO filelock Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock
2021-05-31 19:12:24,158 INFO filelock Lock 140082823814352 released on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock
2021-05-31 19:12:24,537 INFO filelock Lock 140082823814992 acquired on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock
2021-05-31 19:13:00,303 INFO filelock Lock 140082823814992 released on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock
2021-05-31 19:14:53,610 INFO __main__ Start training
2021-05-31 19:14:53,610 INFO torchdistill.models.util [teacher model]
2021-05-31 19:14:53,610 INFO torchdistill.models.util Using the original teacher model
2021-05-31 19:14:53,610 INFO torchdistill.models.util [student model]
2021-05-31 19:14:53,611 INFO torchdistill.models.util Using the original student model
2021-05-31 19:14:53,611 INFO torchdistill.core.distillation Loss = 1.0 * OrgLoss
2021-05-31 19:14:53,611 INFO torchdistill.core.distillation Freezing the whole teacher model
2021-05-31 19:14:58,197 INFO torchdistill.misc.log Epoch: [0] [ 0/12272] eta: 0:26:53 lr: 9.999728378965668e-05 sample/s: 38.52969437529281 loss: 0.0905 (0.0905) time: 0.1315 data: 0.0277 max mem: 2519
2021-05-31 19:17:04,033 INFO torchdistill.misc.log Epoch: [0] [ 1000/12272] eta: 0:23:38 lr: 9.728107344632768e-05 sample/s: 25.678521357422294 loss: 0.0229 (0.0347) time: 0.1315 data: 0.0046 max mem: 5109
2021-05-31 19:19:10,890 INFO torchdistill.misc.log Epoch: [0] [ 2000/12272] eta: 0:21:37 lr: 9.45648631029987e-05 sample/s: 33.98564182345601 loss: 0.0153 (0.0267) time: 0.1355 data: 0.0044 max mem: 5109
2021-05-31 19:21:17,630 INFO torchdistill.misc.log Epoch: [0] [ 3000/12272] eta: 0:19:32 lr: 9.184865275966971e-05 sample/s: 30.293297895006734 loss: 0.0145 (0.0230) time: 0.1215 data: 0.0044 max mem: 5109
2021-05-31 19:23:24,094 INFO torchdistill.misc.log Epoch: [0] [ 4000/12272] eta: 0:17:26 lr: 8.913244241634072e-05 sample/s: 39.35939116542367 loss: 0.0144 (0.0208) time: 0.1229 data: 0.0045 max mem: 5109
2021-05-31 19:25:30,963 INFO torchdistill.misc.log Epoch: [0] [ 5000/12272] eta: 0:15:20 lr: 8.641623207301173e-05 sample/s: 31.891785647075373 loss: 0.0108 (0.0192) time: 0.1368 data: 0.0047 max mem: 5109
2021-05-31 19:27:37,490 INFO torchdistill.misc.log Epoch: [0] [ 6000/12272] eta: 0:13:13 lr: 8.370002172968275e-05 sample/s: 30.313604538761055 loss: 0.0109 (0.0179) time: 0.1267 data: 0.0047 max mem: 5109
2021-05-31 19:29:45,181 INFO torchdistill.misc.log Epoch: [0] [ 7000/12272] eta: 0:11:08 lr: 8.098381138635376e-05 sample/s: 42.336344641721595 loss: 0.0095 (0.0170) time: 0.1268 data: 0.0045 max mem: 5109
2021-05-31 19:31:52,182 INFO torchdistill.misc.log Epoch: [0] [ 8000/12272] eta: 0:09:01 lr: 7.826760104302477e-05 sample/s: 31.78104944118204 loss: 0.0112 (0.0162) time: 0.1264 data: 0.0046 max mem: 5109
2021-05-31 19:33:59,788 INFO torchdistill.misc.log Epoch: [0] [ 9000/12272] eta: 0:06:55 lr: 7.555139069969579e-05 sample/s: 30.615916348838482 loss: 0.0089 (0.0155) time: 0.1314 data: 0.0045 max mem: 5109
2021-05-31 19:36:07,595 INFO torchdistill.misc.log Epoch: [0] [10000/12272] eta: 0:04:48 lr: 7.283518035636681e-05 sample/s: 37.10492838754766 loss: 0.0072 (0.0149) time: 0.1298 data: 0.0048 max mem: 5109
2021-05-31 19:38:13,949 INFO torchdistill.misc.log Epoch: [0] [11000/12272] eta: 0:02:41 lr: 7.011897001303781e-05 sample/s: 32.78477658489305 loss: 0.0090 (0.0144) time: 0.1288 data: 0.0045 max mem: 5109
2021-05-31 19:40:21,535 INFO torchdistill.misc.log Epoch: [0] [12000/12272] eta: 0:00:34 lr: 6.740275966970883e-05 sample/s: 37.34245014245014 loss: 0.0079 (0.0140) time: 0.1317 data: 0.0050 max mem: 5109
2021-05-31 19:40:56,676 INFO torchdistill.misc.log Epoch: [0] Total time: 0:25:58
2021-05-31 19:41:04,501 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 19:41:04,501 INFO __main__ Validation: accuracy = 0.8412633723892002
2021-05-31 19:41:04,501 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
2021-05-31 19:41:05,722 INFO torchdistill.misc.log Epoch: [1] [ 0/12272] eta: 0:31:19 lr: 6.666395045632334e-05 sample/s: 31.762036742544716 loss: 0.0031 (0.0031) time: 0.1532 data: 0.0272 max mem: 5109
2021-05-31 19:43:13,358 INFO torchdistill.misc.log Epoch: [1] [ 1000/12272] eta: 0:23:58 lr: 6.394774011299436e-05 sample/s: 37.225953324487556 loss: 0.0044 (0.0051) time: 0.1300 data: 0.0046 max mem: 5109
2021-05-31 19:45:20,181 INFO torchdistill.misc.log Epoch: [1] [ 2000/12272] eta: 0:21:47 lr: 6.123152976966536e-05 sample/s: 37.15834562552879 loss: 0.0047 (0.0051) time: 0.1284 data: 0.0045 max mem: 5109
2021-05-31 19:47:26,919 INFO torchdistill.misc.log Epoch: [1] [ 3000/12272] eta: 0:19:38 lr: 5.851531942633638e-05 sample/s: 39.3462836451305 loss: 0.0042 (0.0050) time: 0.1197 data: 0.0043 max mem: 5109
2021-05-31 19:49:32,833 INFO torchdistill.misc.log Epoch: [1] [ 4000/12272] eta: 0:17:28 lr: 5.5799109083007396e-05 sample/s: 33.65857162059412 loss: 0.0040 (0.0050) time: 0.1264 data: 0.0043 max mem: 5109
2021-05-31 19:51:40,796 INFO torchdistill.misc.log Epoch: [1] [ 5000/12272] eta: 0:15:23 lr: 5.30828987396784e-05 sample/s: 26.070806883959442 loss: 0.0046 (0.0050) time: 0.1288 data: 0.0045 max mem: 5109
2021-05-31 19:53:48,528 INFO torchdistill.misc.log Epoch: [1] [ 6000/12272] eta: 0:13:17 lr: 5.036668839634942e-05 sample/s: 32.30201815220279 loss: 0.0045 (0.0049) time: 0.1212 data: 0.0044 max mem: 5109
2021-05-31 19:55:53,950 INFO torchdistill.misc.log Epoch: [1] [ 7000/12272] eta: 0:11:08 lr: 4.765047805302043e-05 sample/s: 36.80166358838471 loss: 0.0038 (0.0049) time: 0.1297 data: 0.0044 max mem: 5109
2021-05-31 19:57:59,848 INFO torchdistill.misc.log Epoch: [1] [ 8000/12272] eta: 0:09:01 lr: 4.493426770969144e-05 sample/s: 33.594812965184154 loss: 0.0041 (0.0048) time: 0.1258 data: 0.0044 max mem: 5109
2021-05-31 20:00:06,135 INFO torchdistill.misc.log Epoch: [1] [ 9000/12272] eta: 0:06:54 lr: 4.221805736636245e-05 sample/s: 25.64719622596514 loss: 0.0045 (0.0048) time: 0.1241 data: 0.0046 max mem: 5109
2021-05-31 20:02:14,011 INFO torchdistill.misc.log Epoch: [1] [10000/12272] eta: 0:04:48 lr: 3.9501847023033466e-05 sample/s: 33.08476073658346 loss: 0.0043 (0.0048) time: 0.1239 data: 0.0047 max mem: 5109
2021-05-31 20:04:21,426 INFO torchdistill.misc.log Epoch: [1] [11000/12272] eta: 0:02:41 lr: 3.6785636679704476e-05 sample/s: 25.415942288056765 loss: 0.0039 (0.0047) time: 0.1303 data: 0.0045 max mem: 5109
2021-05-31 20:06:28,294 INFO torchdistill.misc.log Epoch: [1] [12000/12272] eta: 0:00:34 lr: 3.406942633637549e-05 sample/s: 37.492242198062506 loss: 0.0038 (0.0047) time: 0.1308 data: 0.0051 max mem: 5109
2021-05-31 20:07:02,616 INFO torchdistill.misc.log Epoch: [1] Total time: 0:25:57
2021-05-31 20:07:10,347 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:07:10,348 INFO __main__ Validation: accuracy = 0.8530820173204279
2021-05-31 20:07:10,348 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
2021-05-31 20:07:11,549 INFO torchdistill.misc.log Epoch: [2] [ 0/12272] eta: 0:27:31 lr: 3.3330617122990006e-05 sample/s: 36.21437806918137 loss: 0.0018 (0.0018) time: 0.1346 data: 0.0241 max mem: 5109
2021-05-31 20:09:18,600 INFO torchdistill.misc.log Epoch: [2] [ 1000/12272] eta: 0:23:52 lr: 3.061440677966102e-05 sample/s: 37.12356586986009 loss: 0.0023 (0.0024) time: 0.1341 data: 0.0045 max mem: 5109
2021-05-31 20:11:24,788 INFO torchdistill.misc.log Epoch: [2] [ 2000/12272] eta: 0:21:40 lr: 2.789819643633203e-05 sample/s: 32.69698623302515 loss: 0.0021 (0.0023) time: 0.1271 data: 0.0046 max mem: 5109
2021-05-31 20:13:32,260 INFO torchdistill.misc.log Epoch: [2] [ 3000/12272] eta: 0:19:36 lr: 2.5181986093003048e-05 sample/s: 39.77255238497116 loss: 0.0019 (0.0023) time: 0.1264 data: 0.0047 max mem: 5109
2021-05-31 20:15:38,928 INFO torchdistill.misc.log Epoch: [2] [ 4000/12272] eta: 0:17:29 lr: 2.2465775749674055e-05 sample/s: 37.40997928507879 loss: 0.0019 (0.0023) time: 0.1170 data: 0.0045 max mem: 5109
2021-05-31 20:17:46,151 INFO torchdistill.misc.log Epoch: [2] [ 5000/12272] eta: 0:15:22 lr: 1.974956540634507e-05 sample/s: 39.26708624043028 loss: 0.0018 (0.0023) time: 0.1322 data: 0.0045 max mem: 5109
2021-05-31 20:19:53,077 INFO torchdistill.misc.log Epoch: [2] [ 6000/12272] eta: 0:13:16 lr: 1.7033355063016082e-05 sample/s: 26.89458075644343 loss: 0.0019 (0.0022) time: 0.1324 data: 0.0045 max mem: 5109
2021-05-31 20:21:59,132 INFO torchdistill.misc.log Epoch: [2] [ 7000/12272] eta: 0:11:08 lr: 1.4317144719687093e-05 sample/s: 32.304879269842495 loss: 0.0017 (0.0022) time: 0.1225 data: 0.0044 max mem: 5109
2021-05-31 20:24:05,638 INFO torchdistill.misc.log Epoch: [2] [ 8000/12272] eta: 0:09:01 lr: 1.1600934376358105e-05 sample/s: 39.57945395824831 loss: 0.0021 (0.0022) time: 0.1263 data: 0.0044 max mem: 5109
2021-05-31 20:26:11,594 INFO torchdistill.misc.log Epoch: [2] [ 9000/12272] eta: 0:06:54 lr: 8.884724033029119e-06 sample/s: 25.860594738237488 loss: 0.0023 (0.0022) time: 0.1262 data: 0.0044 max mem: 5109
2021-05-31 20:28:18,549 INFO torchdistill.misc.log Epoch: [2] [10000/12272] eta: 0:04:47 lr: 6.168513689700131e-06 sample/s: 32.40314814635983 loss: 0.0019 (0.0022) time: 0.1260 data: 0.0045 max mem: 5109
2021-05-31 20:30:24,951 INFO torchdistill.misc.log Epoch: [2] [11000/12272] eta: 0:02:41 lr: 3.452303346371143e-06 sample/s: 42.08254363214055 loss: 0.0021 (0.0022) time: 0.1241 data: 0.0044 max mem: 5109
2021-05-31 20:32:31,971 INFO torchdistill.misc.log Epoch: [2] [12000/12272] eta: 0:00:34 lr: 7.360930030421556e-07 sample/s: 33.625651129091416 loss: 0.0019 (0.0022) time: 0.1307 data: 0.0044 max mem: 5109
2021-05-31 20:33:06,083 INFO torchdistill.misc.log Epoch: [2] Total time: 0:25:54
2021-05-31 20:33:13,819 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:33:13,820 INFO __main__ Validation: accuracy = 0.8582781456953642
2021-05-31 20:33:13,820 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
2021-05-31 20:33:15,094 INFO __main__ [Teacher: bert-large-uncased]
2021-05-31 20:33:28,908 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:33:28,908 INFO __main__ Test: accuracy = 0.8665308201732043
2021-05-31 20:33:32,568 INFO __main__ [Student: bert-base-uncased]
2021-05-31 20:33:40,325 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:33:40,326 INFO __main__ Test: accuracy = 0.8582781456953642
2021-05-31 20:33:40,326 INFO __main__ Start prediction for private dataset(s)
2021-05-31 20:33:40,327 INFO __main__ mnli/test_m: 9796 samples
2021-05-31 20:33:47,980 INFO __main__ mnli/test_mm: 9847 samples
2021-05-31 20:33:55,598 INFO __main__ ax/test_ax: 1104 samples