bert-large-uncased-cola / training.log
2021-05-25 22:10:47,882 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/ce/bert_large_uncased.yaml', log='log/glue/cola/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)
2021-05-25 22:10:47,924 INFO __main__ Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True
2021-05-25 22:11:19,057 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-25 22:11:21,447 INFO __main__ Start training
2021-05-25 22:11:21,448 INFO torchdistill.models.util [student model]
2021-05-25 22:11:21,448 INFO torchdistill.models.util Using the original student model
2021-05-25 22:11:21,448 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-25 22:11:26,160 INFO torchdistill.misc.log Epoch: [0] [ 0/535] eta: 0:03:19 lr: 1.998753894080997e-05 sample/s: 11.577144663106461 loss: 1.2302 (1.2302) time: 0.3729 data: 0.0273 max mem: 2567
2021-05-25 22:11:38,497 INFO torchdistill.misc.log Epoch: [0] [ 50/535] eta: 0:02:00 lr: 1.936448598130841e-05 sample/s: 18.961848487999386 loss: 0.5031 (0.6982) time: 0.2536 data: 0.0022 max mem: 6363
2021-05-25 22:11:51,105 INFO torchdistill.misc.log Epoch: [0] [100/535] eta: 0:01:49 lr: 1.8741433021806853e-05 sample/s: 15.993304194887585 loss: 0.4494 (0.5958) time: 0.2486 data: 0.0021 max mem: 6582
2021-05-25 22:12:03,585 INFO torchdistill.misc.log Epoch: [0] [150/535] eta: 0:01:36 lr: 1.8118380062305295e-05 sample/s: 18.00978787218443 loss: 0.4563 (0.5483) time: 0.2487 data: 0.0022 max mem: 6582
2021-05-25 22:12:16,299 INFO torchdistill.misc.log Epoch: [0] [200/535] eta: 0:01:24 lr: 1.749532710280374e-05 sample/s: 14.060560956947262 loss: 0.4476 (0.5233) time: 0.2532 data: 0.0024 max mem: 6582
2021-05-25 22:12:28,756 INFO torchdistill.misc.log Epoch: [0] [250/535] eta: 0:01:11 lr: 1.6872274143302183e-05 sample/s: 15.967990192999187 loss: 0.4905 (0.5099) time: 0.2481 data: 0.0022 max mem: 6588
2021-05-25 22:12:41,353 INFO torchdistill.misc.log Epoch: [0] [300/535] eta: 0:00:58 lr: 1.6249221183800625e-05 sample/s: 15.929238896579488 loss: 0.3546 (0.4883) time: 0.2483 data: 0.0024 max mem: 6588
2021-05-25 22:12:54,011 INFO torchdistill.misc.log Epoch: [0] [350/535] eta: 0:00:46 lr: 1.5626168224299067e-05 sample/s: 15.80400535051527 loss: 0.3522 (0.4753) time: 0.2495 data: 0.0023 max mem: 6588
2021-05-25 22:13:06,660 INFO torchdistill.misc.log Epoch: [0] [400/535] eta: 0:00:33 lr: 1.500311526479751e-05 sample/s: 15.933353720267208 loss: 0.4134 (0.4657) time: 0.2548 data: 0.0022 max mem: 6588
2021-05-25 22:13:19,462 INFO torchdistill.misc.log Epoch: [0] [450/535] eta: 0:00:21 lr: 1.4380062305295952e-05 sample/s: 14.186444877192478 loss: 0.3114 (0.4558) time: 0.2487 data: 0.0022 max mem: 6789
2021-05-25 22:13:32,122 INFO torchdistill.misc.log Epoch: [0] [500/535] eta: 0:00:08 lr: 1.3757009345794394e-05 sample/s: 15.952928199910618 loss: 0.3921 (0.4485) time: 0.2496 data: 0.0022 max mem: 6789
2021-05-25 22:13:40,793 INFO torchdistill.misc.log Epoch: [0] Total time: 0:02:15
2021-05-25 22:13:44,271 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:13:44,272 INFO __main__ Validation: matthews_correlation = 0.5806473000395166
2021-05-25 22:13:44,272 INFO __main__ Updating ckpt
2021-05-25 22:13:51,970 INFO torchdistill.misc.log Epoch: [1] [ 0/535] eta: 0:02:25 lr: 1.3320872274143304e-05 sample/s: 14.850208406541903 loss: 0.1800 (0.1800) time: 0.2726 data: 0.0032 max mem: 6789
2021-05-25 22:14:04,713 INFO torchdistill.misc.log Epoch: [1] [ 50/535] eta: 0:02:03 lr: 1.2697819314641746e-05 sample/s: 15.958785637372966 loss: 0.1996 (0.2190) time: 0.2530 data: 0.0022 max mem: 6789
2021-05-25 22:14:17,311 INFO torchdistill.misc.log Epoch: [1] [100/535] eta: 0:01:50 lr: 1.2074766355140188e-05 sample/s: 15.915004705096683 loss: 0.2385 (0.2362) time: 0.2554 data: 0.0024 max mem: 6789
2021-05-25 22:14:30,069 INFO torchdistill.misc.log Epoch: [1] [150/535] eta: 0:01:37 lr: 1.145171339563863e-05 sample/s: 15.867590506617637 loss: 0.1191 (0.2252) time: 0.2559 data: 0.0023 max mem: 6789
2021-05-25 22:14:42,525 INFO torchdistill.misc.log Epoch: [1] [200/535] eta: 0:01:24 lr: 1.0828660436137072e-05 sample/s: 15.931447324440123 loss: 0.2200 (0.2283) time: 0.2491 data: 0.0024 max mem: 6789
2021-05-25 22:14:55,329 INFO torchdistill.misc.log Epoch: [1] [250/535] eta: 0:01:12 lr: 1.0205607476635516e-05 sample/s: 17.869222027311004 loss: 0.2449 (0.2366) time: 0.2565 data: 0.0022 max mem: 6789
2021-05-25 22:15:07,884 INFO torchdistill.misc.log Epoch: [1] [300/535] eta: 0:00:59 lr: 9.582554517133958e-06 sample/s: 18.815330197672253 loss: 0.0952 (0.2291) time: 0.2519 data: 0.0023 max mem: 6789
2021-05-25 22:15:20,408 INFO torchdistill.misc.log Epoch: [1] [350/535] eta: 0:00:46 lr: 8.9595015576324e-06 sample/s: 18.9154784279283 loss: 0.1866 (0.2326) time: 0.2561 data: 0.0024 max mem: 6789
2021-05-25 22:15:33,181 INFO torchdistill.misc.log Epoch: [1] [400/535] eta: 0:00:34 lr: 8.336448598130842e-06 sample/s: 15.924173926909392 loss: 0.2051 (0.2323) time: 0.2555 data: 0.0022 max mem: 6793
2021-05-25 22:15:45,996 INFO torchdistill.misc.log Epoch: [1] [450/535] eta: 0:00:21 lr: 7.713395638629284e-06 sample/s: 15.914974510945969 loss: 0.1959 (0.2370) time: 0.2510 data: 0.0023 max mem: 6793
2021-05-25 22:15:58,625 INFO torchdistill.misc.log Epoch: [1] [500/535] eta: 0:00:08 lr: 7.090342679127727e-06 sample/s: 15.935654106628926 loss: 0.2647 (0.2354) time: 0.2522 data: 0.0023 max mem: 6793
2021-05-25 22:16:07,126 INFO torchdistill.misc.log Epoch: [1] Total time: 0:02:15
2021-05-25 22:16:10,577 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:16:10,578 INFO __main__ Validation: matthews_correlation = 0.6043989222564181
2021-05-25 22:16:10,578 INFO __main__ Updating ckpt
2021-05-25 22:16:17,171 INFO torchdistill.misc.log Epoch: [2] [ 0/535] eta: 0:02:25 lr: 6.654205607476636e-06 sample/s: 14.956847949733799 loss: 0.0537 (0.0537) time: 0.2712 data: 0.0037 max mem: 6793
2021-05-25 22:16:30,294 INFO torchdistill.misc.log Epoch: [2] [ 50/535] eta: 0:02:07 lr: 6.031152647975078e-06 sample/s: 14.163199286826208 loss: 0.0417 (0.1182) time: 0.2616 data: 0.0023 max mem: 6793
2021-05-25 22:16:42,753 INFO torchdistill.misc.log Epoch: [2] [100/535] eta: 0:01:51 lr: 5.408099688473521e-06 sample/s: 18.72304600183244 loss: 0.0036 (0.1272) time: 0.2478 data: 0.0023 max mem: 6793
2021-05-25 22:16:55,421 INFO torchdistill.misc.log Epoch: [2] [150/535] eta: 0:01:38 lr: 4.7850467289719636e-06 sample/s: 17.933821981022056 loss: 0.0642 (0.1666) time: 0.2506 data: 0.0023 max mem: 6793
2021-05-25 22:17:08,064 INFO torchdistill.misc.log Epoch: [2] [200/535] eta: 0:01:25 lr: 4.1619937694704055e-06 sample/s: 15.78100828969634 loss: 0.0050 (0.1866) time: 0.2558 data: 0.0022 max mem: 6793
2021-05-25 22:17:20,695 INFO torchdistill.misc.log Epoch: [2] [250/535] eta: 0:01:12 lr: 3.5389408099688475e-06 sample/s: 15.911321864152804 loss: 0.1082 (0.1935) time: 0.2541 data: 0.0022 max mem: 6793
2021-05-25 22:17:33,424 INFO torchdistill.misc.log Epoch: [2] [300/535] eta: 0:00:59 lr: 2.91588785046729e-06 sample/s: 15.906162717276503 loss: 0.0000 (0.2150) time: 0.2512 data: 0.0022 max mem: 6793
2021-05-25 22:17:45,850 INFO torchdistill.misc.log Epoch: [2] [350/535] eta: 0:00:46 lr: 2.2928348909657324e-06 sample/s: 15.864574616061812 loss: 0.0202 (0.2241) time: 0.2504 data: 0.0023 max mem: 6793
2021-05-25 22:17:58,459 INFO torchdistill.misc.log Epoch: [2] [400/535] eta: 0:00:34 lr: 1.6697819314641748e-06 sample/s: 15.983202467804412 loss: 0.0000 (0.2294) time: 0.2523 data: 0.0024 max mem: 6793
2021-05-25 22:18:11,040 INFO torchdistill.misc.log Epoch: [2] [450/535] eta: 0:00:21 lr: 1.046728971962617e-06 sample/s: 15.717480588781966 loss: 0.0068 (0.2289) time: 0.2524 data: 0.0023 max mem: 6793
2021-05-25 22:18:23,489 INFO torchdistill.misc.log Epoch: [2] [500/535] eta: 0:00:08 lr: 4.2367601246105923e-07 sample/s: 14.20932160827428 loss: 0.0000 (0.2368) time: 0.2523 data: 0.0023 max mem: 6793
2021-05-25 22:18:31,858 INFO torchdistill.misc.log Epoch: [2] Total time: 0:02:14
2021-05-25 22:18:35,313 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:18:35,314 INFO __main__ Validation: matthews_correlation = 0.610638611987945
2021-05-25 22:18:35,314 INFO __main__ Updating ckpt
2021-05-25 22:18:50,900 INFO __main__ [Student: bert-large-uncased]
2021-05-25 22:18:54,369 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:18:54,369 INFO __main__ Test: matthews_correlation = 0.610638611987945
2021-05-25 22:18:54,369 INFO __main__ Start prediction for private dataset(s)
2021-05-25 22:18:54,371 INFO __main__ cola/test: 1063 samples