bert-large-uncased-qnli / training.log
tuned hyperparameters
c0e1315
2021-05-26 16:42:43,973 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_large_uncased.yaml', log='log/glue/qnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
2021-05-26 16:42:44,037 INFO __main__ Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True
2021-05-26 16:42:44,389 INFO filelock Lock 139623502170640 acquired on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
2021-05-26 16:42:44,742 INFO filelock Lock 139623502170640 released on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
2021-05-26 16:42:45,448 INFO filelock Lock 139623502137488 acquired on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-26 16:42:45,957 INFO filelock Lock 139623502137488 released on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-26 16:42:46,307 INFO filelock Lock 139623464315024 acquired on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-05-26 16:42:46,874 INFO filelock Lock 139623464315024 released on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-05-26 16:42:47,920 INFO filelock Lock 139623502137488 acquired on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-05-26 16:42:48,273 INFO filelock Lock 139623502137488 released on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-05-26 16:42:48,641 INFO filelock Lock 139623464420688 acquired on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
2021-05-26 16:43:11,143 INFO filelock Lock 139623464420688 released on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
2021-05-26 16:43:38,005 INFO __main__ Start training
2021-05-26 16:43:38,006 INFO torchdistill.models.util [student model]
2021-05-26 16:43:38,006 INFO torchdistill.models.util Using the original student model
2021-05-26 16:43:38,006 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-26 16:43:44,804 INFO torchdistill.misc.log Epoch: [0] [ 0/6547] eta: 1:18:28 lr: 1.9998981721908255e-05 sample/s: 5.754612514483244 loss: 0.7042 (0.7042) time: 0.7192 data: 0.0241 max mem: 5376
2021-05-26 16:48:21,522 INFO torchdistill.misc.log Epoch: [0] [ 500/6547] eta: 0:55:48 lr: 1.9489842676034826e-05 sample/s: 8.624026680319357 loss: 0.3131 (0.4845) time: 0.5568 data: 0.0025 max mem: 9056
2021-05-26 16:52:58,289 INFO torchdistill.misc.log Epoch: [0] [1000/6547] eta: 0:51:11 lr: 1.89807036301614e-05 sample/s: 5.549441556800583 loss: 0.3426 (0.4185) time: 0.5480 data: 0.0025 max mem: 9056
2021-05-26 16:57:38,265 INFO torchdistill.misc.log Epoch: [0] [1500/6547] eta: 0:46:44 lr: 1.847156458428797e-05 sample/s: 8.61720964609647 loss: 0.2848 (0.3878) time: 0.5259 data: 0.0025 max mem: 9056
2021-05-26 17:02:15,556 INFO torchdistill.misc.log Epoch: [0] [2000/6547] eta: 0:42:05 lr: 1.7962425538414542e-05 sample/s: 8.623813900317252 loss: 0.2981 (0.3675) time: 0.5580 data: 0.0026 max mem: 9056
2021-05-26 17:06:49,604 INFO torchdistill.misc.log Epoch: [0] [2500/6547] eta: 0:37:21 lr: 1.7453286492541113e-05 sample/s: 7.9935431184565235 loss: 0.2886 (0.3520) time: 0.5320 data: 0.0025 max mem: 9056
2021-05-26 17:11:23,224 INFO torchdistill.misc.log Epoch: [0] [3000/6547] eta: 0:32:40 lr: 1.6944147446667688e-05 sample/s: 7.997067573087834 loss: 0.2193 (0.3400) time: 0.5656 data: 0.0025 max mem: 9056
2021-05-26 17:15:59,135 INFO torchdistill.misc.log Epoch: [0] [3500/6547] eta: 0:28:04 lr: 1.643500840079426e-05 sample/s: 7.416189779873241 loss: 0.2485 (0.3283) time: 0.5845 data: 0.0026 max mem: 9056
2021-05-26 17:20:33,027 INFO torchdistill.misc.log Epoch: [0] [4000/6547] eta: 0:23:26 lr: 1.592586935492083e-05 sample/s: 6.24177459511022 loss: 0.2533 (0.3213) time: 0.5377 data: 0.0025 max mem: 9056
2021-05-26 17:25:08,083 INFO torchdistill.misc.log Epoch: [0] [4500/6547] eta: 0:18:49 lr: 1.5416730309047404e-05 sample/s: 8.622683682589413 loss: 0.3024 (0.3135) time: 0.5730 data: 0.0025 max mem: 9056
2021-05-26 17:29:46,131 INFO torchdistill.misc.log Epoch: [0] [5000/6547] eta: 0:14:14 lr: 1.4907591263173975e-05 sample/s: 6.9801876226312105 loss: 0.1747 (0.3079) time: 0.5553 data: 0.0025 max mem: 9056
2021-05-26 17:34:22,397 INFO torchdistill.misc.log Epoch: [0] [5500/6547] eta: 0:09:38 lr: 1.4398452217300548e-05 sample/s: 6.5910168939530545 loss: 0.1467 (0.3024) time: 0.5469 data: 0.0026 max mem: 9056
2021-05-26 17:38:57,431 INFO torchdistill.misc.log Epoch: [0] [6000/6547] eta: 0:05:02 lr: 1.3889313171427119e-05 sample/s: 6.97960684730679 loss: 0.2053 (0.2974) time: 0.5522 data: 0.0026 max mem: 9056
2021-05-26 17:43:36,576 INFO torchdistill.misc.log Epoch: [0] [6500/6547] eta: 0:00:25 lr: 1.3380174125553688e-05 sample/s: 6.5825322864402445 loss: 0.2341 (0.2939) time: 0.5723 data: 0.0026 max mem: 9056
2021-05-26 17:44:01,639 INFO torchdistill.misc.log Epoch: [0] Total time: 1:00:17
2021-05-26 17:45:01,865 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 17:45:01,866 INFO __main__ Validation: accuracy = 0.9198242723778144
2021-05-26 17:45:01,866 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
2021-05-26 17:45:06,995 INFO torchdistill.misc.log Epoch: [1] [ 0/6547] eta: 0:54:19 lr: 1.3332315055241587e-05 sample/s: 8.160307088582721 loss: 0.0503 (0.0503) time: 0.4978 data: 0.0076 max mem: 9056
2021-05-26 17:49:46,069 INFO torchdistill.misc.log Epoch: [1] [ 500/6547] eta: 0:56:14 lr: 1.282317600936816e-05 sample/s: 6.217419742879262 loss: 0.0503 (0.2035) time: 0.5897 data: 0.0025 max mem: 9056
2021-05-26 17:54:23,673 INFO torchdistill.misc.log Epoch: [1] [1000/6547] eta: 0:51:27 lr: 1.231403696349473e-05 sample/s: 6.589991684548619 loss: 0.3169 (0.2504) time: 0.5670 data: 0.0026 max mem: 9056
2021-05-26 17:58:59,164 INFO torchdistill.misc.log Epoch: [1] [1500/6547] eta: 0:46:39 lr: 1.1804897917621303e-05 sample/s: 7.990056015872275 loss: 0.1724 (0.2513) time: 0.5467 data: 0.0025 max mem: 9056
2021-05-26 18:03:36,412 INFO torchdistill.misc.log Epoch: [1] [2000/6547] eta: 0:42:02 lr: 1.1295758871747876e-05 sample/s: 5.847794095606037 loss: 0.2595 (0.2506) time: 0.5593 data: 0.0025 max mem: 9056
2021-05-26 18:08:09,765 INFO torchdistill.misc.log Epoch: [1] [2500/6547] eta: 0:37:18 lr: 1.0786619825874447e-05 sample/s: 7.990375667126102 loss: 0.0139 (0.2572) time: 0.5289 data: 0.0025 max mem: 9056
2021-05-26 18:12:47,367 INFO torchdistill.misc.log Epoch: [1] [3000/6547] eta: 0:32:43 lr: 1.027748078000102e-05 sample/s: 6.973642075580554 loss: 0.1740 (0.2673) time: 0.5332 data: 0.0026 max mem: 9056
2021-05-26 18:17:24,171 INFO torchdistill.misc.log Epoch: [1] [3500/6547] eta: 0:28:06 lr: 9.76834173412759e-06 sample/s: 8.607342080623853 loss: 0.4704 (0.2760) time: 0.5707 data: 0.0025 max mem: 9056
2021-05-26 18:22:01,335 INFO torchdistill.misc.log Epoch: [1] [4000/6547] eta: 0:23:29 lr: 9.259202688254163e-06 sample/s: 8.61629355725034 loss: 0.1867 (0.2859) time: 0.5565 data: 0.0025 max mem: 9056
2021-05-26 18:26:39,703 INFO torchdistill.misc.log Epoch: [1] [4500/6547] eta: 0:18:53 lr: 8.750063642380736e-06 sample/s: 8.62179301385266 loss: 0.1528 (0.2933) time: 0.5657 data: 0.0025 max mem: 9056
2021-05-26 18:31:13,854 INFO torchdistill.misc.log Epoch: [1] [5000/6547] eta: 0:14:16 lr: 8.240924596507307e-06 sample/s: 6.5886769838330705 loss: 0.1091 (0.2886) time: 0.5920 data: 0.0026 max mem: 9056
2021-05-26 18:35:47,685 INFO torchdistill.misc.log Epoch: [1] [5500/6547] eta: 0:09:38 lr: 7.73178555063388e-06 sample/s: 6.590584506649424 loss: 0.0833 (0.2890) time: 0.5445 data: 0.0025 max mem: 9056
2021-05-26 18:40:23,656 INFO torchdistill.misc.log Epoch: [1] [6000/6547] eta: 0:05:02 lr: 7.222646504760451e-06 sample/s: 8.620140668279657 loss: 0.5154 (0.2930) time: 0.5567 data: 0.0026 max mem: 9056
2021-05-26 18:44:58,148 INFO torchdistill.misc.log Epoch: [1] [6500/6547] eta: 0:00:25 lr: 6.713507458887023e-06 sample/s: 5.843950385754279 loss: 0.0087 (0.3004) time: 0.5434 data: 0.0025 max mem: 9056
2021-05-26 18:45:23,286 INFO torchdistill.misc.log Epoch: [1] Total time: 1:00:16
2021-05-26 18:46:23,512 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 18:46:23,512 INFO __main__ Validation: accuracy = 0.9207395204100312
2021-05-26 18:46:23,513 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
2021-05-26 18:46:28,930 INFO torchdistill.misc.log Epoch: [2] [ 0/6547] eta: 0:53:33 lr: 6.66564838857492e-06 sample/s: 8.299093972413548 loss: 0.0001 (0.0001) time: 0.4908 data: 0.0088 max mem: 9056
2021-05-26 18:51:02,998 INFO torchdistill.misc.log Epoch: [2] [ 500/6547] eta: 0:55:13 lr: 6.156509342701492e-06 sample/s: 6.973630480902882 loss: 0.0000 (0.2118) time: 0.5519 data: 0.0025 max mem: 9056
2021-05-26 18:55:38,010 INFO torchdistill.misc.log Epoch: [2] [1000/6547] eta: 0:50:45 lr: 5.647370296828064e-06 sample/s: 7.982749071696792 loss: 0.0000 (0.2042) time: 0.5058 data: 0.0025 max mem: 9056
2021-05-26 19:00:14,646 INFO torchdistill.misc.log Epoch: [2] [1500/6547] eta: 0:46:18 lr: 5.1382312509546365e-06 sample/s: 7.044053211160169 loss: 0.0000 (0.1978) time: 0.5426 data: 0.0025 max mem: 9056
2021-05-26 19:04:48,464 INFO torchdistill.misc.log Epoch: [2] [2000/6547] eta: 0:41:39 lr: 4.629092205081208e-06 sample/s: 7.044916908254754 loss: 0.0000 (0.1990) time: 0.5594 data: 0.0026 max mem: 9056
2021-05-26 19:09:23,080 INFO torchdistill.misc.log Epoch: [2] [2500/6547] eta: 0:37:04 lr: 4.11995315920778e-06 sample/s: 7.037732786049721 loss: 0.0000 (0.1949) time: 0.5999 data: 0.0026 max mem: 9056
2021-05-26 19:13:56,849 INFO torchdistill.misc.log Epoch: [2] [3000/6547] eta: 0:32:28 lr: 3.6108141133343523e-06 sample/s: 6.294816645236476 loss: 0.0000 (0.1968) time: 0.5467 data: 0.0025 max mem: 9056
2021-05-26 19:18:30,816 INFO torchdistill.misc.log Epoch: [2] [3500/6547] eta: 0:27:53 lr: 3.1016750674609237e-06 sample/s: 7.4227914858222395 loss: 0.0000 (0.1972) time: 0.5436 data: 0.0025 max mem: 9056
2021-05-26 19:23:04,965 INFO torchdistill.misc.log Epoch: [2] [4000/6547] eta: 0:23:18 lr: 2.5925360215874956e-06 sample/s: 5.55695934201043 loss: 0.0000 (0.1956) time: 0.5532 data: 0.0025 max mem: 9056
2021-05-26 19:27:38,760 INFO torchdistill.misc.log Epoch: [2] [4500/6547] eta: 0:18:43 lr: 2.083396975714068e-06 sample/s: 6.64891320975452 loss: 0.0000 (0.1942) time: 0.5342 data: 0.0026 max mem: 9056
2021-05-26 19:32:17,803 INFO torchdistill.misc.log Epoch: [2] [5000/6547] eta: 0:14:10 lr: 1.5742579298406396e-06 sample/s: 7.037428722675103 loss: 0.0000 (0.1911) time: 0.5242 data: 0.0026 max mem: 9056
2021-05-26 19:36:50,820 INFO torchdistill.misc.log Epoch: [2] [5500/6547] eta: 0:09:35 lr: 1.0651188839672114e-06 sample/s: 7.036864946528987 loss: 0.0000 (0.1905) time: 0.5605 data: 0.0026 max mem: 9056
2021-05-26 19:41:22,137 INFO torchdistill.misc.log Epoch: [2] [6000/6547] eta: 0:05:00 lr: 5.559798380937835e-07 sample/s: 8.73211181456121 loss: 0.0000 (0.1899) time: 0.5563 data: 0.0025 max mem: 9056
2021-05-26 19:45:58,549 INFO torchdistill.misc.log Epoch: [2] [6500/6547] eta: 0:00:25 lr: 4.6840792220355385e-08 sample/s: 7.427628811940013 loss: 0.0000 (0.1888) time: 0.5204 data: 0.0025 max mem: 9056
2021-05-26 19:46:24,589 INFO torchdistill.misc.log Epoch: [2] Total time: 0:59:56
2021-05-26 19:47:24,703 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 19:47:24,703 INFO __main__ Validation: accuracy = 0.9222039172615779
2021-05-26 19:47:24,703 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
2021-05-26 19:47:35,803 INFO __main__ [Student: bert-large-uncased]
2021-05-26 19:48:35,910 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 19:48:35,910 INFO __main__ Test: accuracy = 0.9222039172615779
2021-05-26 19:48:35,910 INFO __main__ Start prediction for private dataset(s)
2021-05-26 19:48:35,912 INFO __main__ qnli/test: 5463 samples