2021-05-26 16:42:43,973	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_large_uncased.yaml', log='log/glue/qnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
2021-05-26 16:42:44,037	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-26 16:42:44,389	INFO	filelock	Lock 139623502170640 acquired on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
2021-05-26 16:42:44,742	INFO	filelock	Lock 139623502170640 released on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
2021-05-26 16:42:45,448	INFO	filelock	Lock 139623502137488 acquired on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-26 16:42:45,957	INFO	filelock	Lock 139623502137488 released on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-26 16:42:46,307	INFO	filelock	Lock 139623464315024 acquired on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-05-26 16:42:46,874	INFO	filelock	Lock 139623464315024 released on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-05-26 16:42:47,920	INFO	filelock	Lock 139623502137488 acquired on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-05-26 16:42:48,273	INFO	filelock	Lock 139623502137488 released on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-05-26 16:42:48,641	INFO	filelock	Lock 139623464420688 acquired on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
2021-05-26 16:43:11,143	INFO	filelock	Lock 139623464420688 released on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
2021-05-26 16:43:38,005	INFO	__main__	Start training
2021-05-26 16:43:38,006	INFO	torchdistill.models.util	[student model]
2021-05-26 16:43:38,006	INFO	torchdistill.models.util	Using the original student model
2021-05-26 16:43:38,006	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-26 16:43:44,804	INFO	torchdistill.misc.log	Epoch: [0]  [   0/6547]  eta: 1:18:28  lr: 1.9998981721908255e-05  sample/s: 5.754612514483244  loss: 0.7042 (0.7042)  time: 0.7192  data: 0.0241  max mem: 5376
2021-05-26 16:48:21,522	INFO	torchdistill.misc.log	Epoch: [0]  [ 500/6547]  eta: 0:55:48  lr: 1.9489842676034826e-05  sample/s: 8.624026680319357  loss: 0.3131 (0.4845)  time: 0.5568  data: 0.0025  max mem: 9056
2021-05-26 16:52:58,289	INFO	torchdistill.misc.log	Epoch: [0]  [1000/6547]  eta: 0:51:11  lr: 1.89807036301614e-05  sample/s: 5.549441556800583  loss: 0.3426 (0.4185)  time: 0.5480  data: 0.0025  max mem: 9056
2021-05-26 16:57:38,265	INFO	torchdistill.misc.log	Epoch: [0]  [1500/6547]  eta: 0:46:44  lr: 1.847156458428797e-05  sample/s: 8.61720964609647  loss: 0.2848 (0.3878)  time: 0.5259  data: 0.0025  max mem: 9056
2021-05-26 17:02:15,556	INFO	torchdistill.misc.log	Epoch: [0]  [2000/6547]  eta: 0:42:05  lr: 1.7962425538414542e-05  sample/s: 8.623813900317252  loss: 0.2981 (0.3675)  time: 0.5580  data: 0.0026  max mem: 9056
2021-05-26 17:06:49,604	INFO	torchdistill.misc.log	Epoch: [0]  [2500/6547]  eta: 0:37:21  lr: 1.7453286492541113e-05  sample/s: 7.9935431184565235  loss: 0.2886 (0.3520)  time: 0.5320  data: 0.0025  max mem: 9056
2021-05-26 17:11:23,224	INFO	torchdistill.misc.log	Epoch: [0]  [3000/6547]  eta: 0:32:40  lr: 1.6944147446667688e-05  sample/s: 7.997067573087834  loss: 0.2193 (0.3400)  time: 0.5656  data: 0.0025  max mem: 9056
2021-05-26 17:15:59,135	INFO	torchdistill.misc.log	Epoch: [0]  [3500/6547]  eta: 0:28:04  lr: 1.643500840079426e-05  sample/s: 7.416189779873241  loss: 0.2485 (0.3283)  time: 0.5845  data: 0.0026  max mem: 9056
2021-05-26 17:20:33,027	INFO	torchdistill.misc.log	Epoch: [0]  [4000/6547]  eta: 0:23:26  lr: 1.592586935492083e-05  sample/s: 6.24177459511022  loss: 0.2533 (0.3213)  time: 0.5377  data: 0.0025  max mem: 9056
2021-05-26 17:25:08,083	INFO	torchdistill.misc.log	Epoch: [0]  [4500/6547]  eta: 0:18:49  lr: 1.5416730309047404e-05  sample/s: 8.622683682589413  loss: 0.3024 (0.3135)  time: 0.5730  data: 0.0025  max mem: 9056
2021-05-26 17:29:46,131	INFO	torchdistill.misc.log	Epoch: [0]  [5000/6547]  eta: 0:14:14  lr: 1.4907591263173975e-05  sample/s: 6.9801876226312105  loss: 0.1747 (0.3079)  time: 0.5553  data: 0.0025  max mem: 9056
2021-05-26 17:34:22,397	INFO	torchdistill.misc.log	Epoch: [0]  [5500/6547]  eta: 0:09:38  lr: 1.4398452217300548e-05  sample/s: 6.5910168939530545  loss: 0.1467 (0.3024)  time: 0.5469  data: 0.0026  max mem: 9056
2021-05-26 17:38:57,431	INFO	torchdistill.misc.log	Epoch: [0]  [6000/6547]  eta: 0:05:02  lr: 1.3889313171427119e-05  sample/s: 6.97960684730679  loss: 0.2053 (0.2974)  time: 0.5522  data: 0.0026  max mem: 9056
2021-05-26 17:43:36,576	INFO	torchdistill.misc.log	Epoch: [0]  [6500/6547]  eta: 0:00:25  lr: 1.3380174125553688e-05  sample/s: 6.5825322864402445  loss: 0.2341 (0.2939)  time: 0.5723  data: 0.0026  max mem: 9056
2021-05-26 17:44:01,639	INFO	torchdistill.misc.log	Epoch: [0] Total time: 1:00:17
2021-05-26 17:45:01,865	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 17:45:01,866	INFO	__main__	Validation: accuracy = 0.9198242723778144
2021-05-26 17:45:01,866	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
2021-05-26 17:45:06,995	INFO	torchdistill.misc.log	Epoch: [1]  [   0/6547]  eta: 0:54:19  lr: 1.3332315055241587e-05  sample/s: 8.160307088582721  loss: 0.0503 (0.0503)  time: 0.4978  data: 0.0076  max mem: 9056
2021-05-26 17:49:46,069	INFO	torchdistill.misc.log	Epoch: [1]  [ 500/6547]  eta: 0:56:14  lr: 1.282317600936816e-05  sample/s: 6.217419742879262  loss: 0.0503 (0.2035)  time: 0.5897  data: 0.0025  max mem: 9056
2021-05-26 17:54:23,673	INFO	torchdistill.misc.log	Epoch: [1]  [1000/6547]  eta: 0:51:27  lr: 1.231403696349473e-05  sample/s: 6.589991684548619  loss: 0.3169 (0.2504)  time: 0.5670  data: 0.0026  max mem: 9056
2021-05-26 17:58:59,164	INFO	torchdistill.misc.log	Epoch: [1]  [1500/6547]  eta: 0:46:39  lr: 1.1804897917621303e-05  sample/s: 7.990056015872275  loss: 0.1724 (0.2513)  time: 0.5467  data: 0.0025  max mem: 9056
2021-05-26 18:03:36,412	INFO	torchdistill.misc.log	Epoch: [1]  [2000/6547]  eta: 0:42:02  lr: 1.1295758871747876e-05  sample/s: 5.847794095606037  loss: 0.2595 (0.2506)  time: 0.5593  data: 0.0025  max mem: 9056
2021-05-26 18:08:09,765	INFO	torchdistill.misc.log	Epoch: [1]  [2500/6547]  eta: 0:37:18  lr: 1.0786619825874447e-05  sample/s: 7.990375667126102  loss: 0.0139 (0.2572)  time: 0.5289  data: 0.0025  max mem: 9056
2021-05-26 18:12:47,367	INFO	torchdistill.misc.log	Epoch: [1]  [3000/6547]  eta: 0:32:43  lr: 1.027748078000102e-05  sample/s: 6.973642075580554  loss: 0.1740 (0.2673)  time: 0.5332  data: 0.0026  max mem: 9056
2021-05-26 18:17:24,171	INFO	torchdistill.misc.log	Epoch: [1]  [3500/6547]  eta: 0:28:06  lr: 9.76834173412759e-06  sample/s: 8.607342080623853  loss: 0.4704 (0.2760)  time: 0.5707  data: 0.0025  max mem: 9056
2021-05-26 18:22:01,335	INFO	torchdistill.misc.log	Epoch: [1]  [4000/6547]  eta: 0:23:29  lr: 9.259202688254163e-06  sample/s: 8.61629355725034  loss: 0.1867 (0.2859)  time: 0.5565  data: 0.0025  max mem: 9056
2021-05-26 18:26:39,703	INFO	torchdistill.misc.log	Epoch: [1]  [4500/6547]  eta: 0:18:53  lr: 8.750063642380736e-06  sample/s: 8.62179301385266  loss: 0.1528 (0.2933)  time: 0.5657  data: 0.0025  max mem: 9056
2021-05-26 18:31:13,854	INFO	torchdistill.misc.log	Epoch: [1]  [5000/6547]  eta: 0:14:16  lr: 8.240924596507307e-06  sample/s: 6.5886769838330705  loss: 0.1091 (0.2886)  time: 0.5920  data: 0.0026  max mem: 9056
2021-05-26 18:35:47,685	INFO	torchdistill.misc.log	Epoch: [1]  [5500/6547]  eta: 0:09:38  lr: 7.73178555063388e-06  sample/s: 6.590584506649424  loss: 0.0833 (0.2890)  time: 0.5445  data: 0.0025  max mem: 9056
2021-05-26 18:40:23,656	INFO	torchdistill.misc.log	Epoch: [1]  [6000/6547]  eta: 0:05:02  lr: 7.222646504760451e-06  sample/s: 8.620140668279657  loss: 0.5154 (0.2930)  time: 0.5567  data: 0.0026  max mem: 9056
2021-05-26 18:44:58,148	INFO	torchdistill.misc.log	Epoch: [1]  [6500/6547]  eta: 0:00:25  lr: 6.713507458887023e-06  sample/s: 5.843950385754279  loss: 0.0087 (0.3004)  time: 0.5434  data: 0.0025  max mem: 9056
2021-05-26 18:45:23,286	INFO	torchdistill.misc.log	Epoch: [1] Total time: 1:00:16
2021-05-26 18:46:23,512	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 18:46:23,512	INFO	__main__	Validation: accuracy = 0.9207395204100312
2021-05-26 18:46:23,513	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
2021-05-26 18:46:28,930	INFO	torchdistill.misc.log	Epoch: [2]  [   0/6547]  eta: 0:53:33  lr: 6.66564838857492e-06  sample/s: 8.299093972413548  loss: 0.0001 (0.0001)  time: 0.4908  data: 0.0088  max mem: 9056
2021-05-26 18:51:02,998	INFO	torchdistill.misc.log	Epoch: [2]  [ 500/6547]  eta: 0:55:13  lr: 6.156509342701492e-06  sample/s: 6.973630480902882  loss: 0.0000 (0.2118)  time: 0.5519  data: 0.0025  max mem: 9056
2021-05-26 18:55:38,010	INFO	torchdistill.misc.log	Epoch: [2]  [1000/6547]  eta: 0:50:45  lr: 5.647370296828064e-06  sample/s: 7.982749071696792  loss: 0.0000 (0.2042)  time: 0.5058  data: 0.0025  max mem: 9056
2021-05-26 19:00:14,646	INFO	torchdistill.misc.log	Epoch: [2]  [1500/6547]  eta: 0:46:18  lr: 5.1382312509546365e-06  sample/s: 7.044053211160169  loss: 0.0000 (0.1978)  time: 0.5426  data: 0.0025  max mem: 9056
2021-05-26 19:04:48,464	INFO	torchdistill.misc.log	Epoch: [2]  [2000/6547]  eta: 0:41:39  lr: 4.629092205081208e-06  sample/s: 7.044916908254754  loss: 0.0000 (0.1990)  time: 0.5594  data: 0.0026  max mem: 9056
2021-05-26 19:09:23,080	INFO	torchdistill.misc.log	Epoch: [2]  [2500/6547]  eta: 0:37:04  lr: 4.11995315920778e-06  sample/s: 7.037732786049721  loss: 0.0000 (0.1949)  time: 0.5999  data: 0.0026  max mem: 9056
2021-05-26 19:13:56,849	INFO	torchdistill.misc.log	Epoch: [2]  [3000/6547]  eta: 0:32:28  lr: 3.6108141133343523e-06  sample/s: 6.294816645236476  loss: 0.0000 (0.1968)  time: 0.5467  data: 0.0025  max mem: 9056
2021-05-26 19:18:30,816	INFO	torchdistill.misc.log	Epoch: [2]  [3500/6547]  eta: 0:27:53  lr: 3.1016750674609237e-06  sample/s: 7.4227914858222395  loss: 0.0000 (0.1972)  time: 0.5436  data: 0.0025  max mem: 9056
2021-05-26 19:23:04,965	INFO	torchdistill.misc.log	Epoch: [2]  [4000/6547]  eta: 0:23:18  lr: 2.5925360215874956e-06  sample/s: 5.55695934201043  loss: 0.0000 (0.1956)  time: 0.5532  data: 0.0025  max mem: 9056
2021-05-26 19:27:38,760	INFO	torchdistill.misc.log	Epoch: [2]  [4500/6547]  eta: 0:18:43  lr: 2.083396975714068e-06  sample/s: 6.64891320975452  loss: 0.0000 (0.1942)  time: 0.5342  data: 0.0026  max mem: 9056
2021-05-26 19:32:17,803	INFO	torchdistill.misc.log	Epoch: [2]  [5000/6547]  eta: 0:14:10  lr: 1.5742579298406396e-06  sample/s: 7.037428722675103  loss: 0.0000 (0.1911)  time: 0.5242  data: 0.0026  max mem: 9056
2021-05-26 19:36:50,820	INFO	torchdistill.misc.log	Epoch: [2]  [5500/6547]  eta: 0:09:35  lr: 1.0651188839672114e-06  sample/s: 7.036864946528987  loss: 0.0000 (0.1905)  time: 0.5605  data: 0.0026  max mem: 9056
2021-05-26 19:41:22,137	INFO	torchdistill.misc.log	Epoch: [2]  [6000/6547]  eta: 0:05:00  lr: 5.559798380937835e-07  sample/s: 8.73211181456121  loss: 0.0000 (0.1899)  time: 0.5563  data: 0.0025  max mem: 9056
2021-05-26 19:45:58,549	INFO	torchdistill.misc.log	Epoch: [2]  [6500/6547]  eta: 0:00:25  lr: 4.6840792220355385e-08  sample/s: 7.427628811940013  loss: 0.0000 (0.1888)  time: 0.5204  data: 0.0025  max mem: 9056
2021-05-26 19:46:24,589	INFO	torchdistill.misc.log	Epoch: [2] Total time: 0:59:56
2021-05-26 19:47:24,703	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 19:47:24,703	INFO	__main__	Validation: accuracy = 0.9222039172615779
2021-05-26 19:47:24,703	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
2021-05-26 19:47:35,803	INFO	__main__	[Student: bert-large-uncased]
2021-05-26 19:48:35,910	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
2021-05-26 19:48:35,910	INFO	__main__	Test: accuracy = 0.9222039172615779
2021-05-26 19:48:35,910	INFO	__main__	Start prediction for private dataset(s)
2021-05-26 19:48:35,912	INFO	__main__	qnli/test: 5463 samples