2021-05-26 19:11:02,756	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_large_uncased.yaml', log='log/glue/mnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021-05-26 19:11:02,808	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-26 19:11:33,729	WARNING	datasets.builder	Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-26 19:12:20,927	WARNING	datasets.builder	Reusing dataset glue (/root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-26 19:12:22,683	INFO	__main__	Start training
2021-05-26 19:12:22,684	INFO	torchdistill.models.util	[student model]
2021-05-26 19:12:22,684	INFO	torchdistill.models.util	Using the original student model
2021-05-26 19:12:22,684	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-26 19:12:26,973	INFO	torchdistill.misc.log	Epoch: [0]  [    0/24544]  eta: 2:31:32  lr: 1.999972837896567e-05  sample/s: 12.123115565180507  loss: 1.1665 (1.1665)  time: 0.3704  data: 0.0405  max mem: 5355
2021-05-26 19:15:03,514	INFO	torchdistill.misc.log	Epoch: [0]  [ 1000/24544]  eta: 1:01:30  lr: 1.972810734463277e-05  sample/s: 26.643611936032016  loss: 0.4966 (0.7978)  time: 0.1565  data: 0.0028  max mem: 9034
2021-05-26 19:17:40,135	INFO	torchdistill.misc.log	Epoch: [0]  [ 2000/24544]  eta: 0:58:52  lr: 1.9456486310299872e-05  sample/s: 27.679465456795217  loss: 0.5308 (0.6753)  time: 0.1576  data: 0.0028  max mem: 9034
2021-05-26 19:20:17,766	INFO	torchdistill.misc.log	Epoch: [0]  [ 3000/24544]  eta: 0:56:22  lr: 1.9184865275966974e-05  sample/s: 21.87914915370501  loss: 0.4498 (0.6280)  time: 0.1605  data: 0.0028  max mem: 9034
2021-05-26 19:22:54,312	INFO	torchdistill.misc.log	Epoch: [0]  [ 4000/24544]  eta: 0:53:43  lr: 1.8913244241634076e-05  sample/s: 24.011839028146316  loss: 0.3829 (0.5939)  time: 0.1587  data: 0.0027  max mem: 9034
2021-05-26 19:25:31,397	INFO	torchdistill.misc.log	Epoch: [0]  [ 5000/24544]  eta: 0:51:06  lr: 1.8641623207301177e-05  sample/s: 24.881084864807146  loss: 0.3964 (0.5686)  time: 0.1602  data: 0.0027  max mem: 9034
2021-05-26 19:28:08,506	INFO	torchdistill.misc.log	Epoch: [0]  [ 6000/24544]  eta: 0:48:30  lr: 1.8370002172968276e-05  sample/s: 27.22999367016701  loss: 0.3826 (0.5527)  time: 0.1524  data: 0.0028  max mem: 9034
2021-05-26 19:30:45,858	INFO	torchdistill.misc.log	Epoch: [0]  [ 7000/24544]  eta: 0:45:54  lr: 1.8098381138635377e-05  sample/s: 24.056011918161573  loss: 0.4032 (0.5388)  time: 0.1597  data: 0.0028  max mem: 9034
2021-05-26 19:33:22,940	INFO	torchdistill.misc.log	Epoch: [0]  [ 8000/24544]  eta: 0:43:17  lr: 1.782676010430248e-05  sample/s: 26.72731385749652  loss: 0.4228 (0.5270)  time: 0.1555  data: 0.0027  max mem: 9034
2021-05-26 19:36:00,181	INFO	torchdistill.misc.log	Epoch: [0]  [ 9000/24544]  eta: 0:40:41  lr: 1.755513906996958e-05  sample/s: 27.097306455494415  loss: 0.4902 (0.5164)  time: 0.1585  data: 0.0026  max mem: 9034
2021-05-26 19:38:37,048	INFO	torchdistill.misc.log	Epoch: [0]  [10000/24544]  eta: 0:38:03  lr: 1.7283518035636683e-05  sample/s: 24.899585185405055  loss: 0.3073 (0.5080)  time: 0.1533  data: 0.0026  max mem: 9034
2021-05-26 19:41:13,402	INFO	torchdistill.misc.log	Epoch: [0]  [11000/24544]  eta: 0:35:25  lr: 1.7011897001303784e-05  sample/s: 26.376325913302132  loss: 0.3966 (0.5002)  time: 0.1545  data: 0.0026  max mem: 9034
2021-05-26 19:43:50,316	INFO	torchdistill.misc.log	Epoch: [0]  [12000/24544]  eta: 0:32:48  lr: 1.6740275966970883e-05  sample/s: 27.519061506615184  loss: 0.4471 (0.4942)  time: 0.1535  data: 0.0026  max mem: 9034
2021-05-26 19:46:27,366	INFO	torchdistill.misc.log	Epoch: [0]  [13000/24544]  eta: 0:30:12  lr: 1.6468654932637984e-05  sample/s: 27.152036174196756  loss: 0.3334 (0.4879)  time: 0.1564  data: 0.0027  max mem: 9034
2021-05-26 19:49:04,444	INFO	torchdistill.misc.log	Epoch: [0]  [14000/24544]  eta: 0:27:35  lr: 1.6197033898305086e-05  sample/s: 26.7613347795572  loss: 0.3891 (0.4822)  time: 0.1609  data: 0.0029  max mem: 9034
2021-05-26 19:51:41,450	INFO	torchdistill.misc.log	Epoch: [0]  [15000/24544]  eta: 0:24:58  lr: 1.5925412863972188e-05  sample/s: 28.193590019359004  loss: 0.4292 (0.4766)  time: 0.1557  data: 0.0026  max mem: 9034
2021-05-26 19:54:18,472	INFO	torchdistill.misc.log	Epoch: [0]  [16000/24544]  eta: 0:22:21  lr: 1.565379182963929e-05  sample/s: 25.112961216638976  loss: 0.3714 (0.4715)  time: 0.1548  data: 0.0027  max mem: 9034
2021-05-26 19:56:55,577	INFO	torchdistill.misc.log	Epoch: [0]  [17000/24544]  eta: 0:19:44  lr: 1.538217079530639e-05  sample/s: 26.46332060948295  loss: 0.3461 (0.4665)  time: 0.1560  data: 0.0027  max mem: 9034
2021-05-26 19:59:31,974	INFO	torchdistill.misc.log	Epoch: [0]  [18000/24544]  eta: 0:17:07  lr: 1.5110549760973491e-05  sample/s: 27.044453435683003  loss: 0.2556 (0.4616)  time: 0.1563  data: 0.0028  max mem: 9034
2021-05-26 20:02:08,374	INFO	torchdistill.misc.log	Epoch: [0]  [19000/24544]  eta: 0:14:29  lr: 1.4838928726640591e-05  sample/s: 26.93417370769164  loss: 0.3405 (0.4576)  time: 0.1580  data: 0.0026  max mem: 9034
2021-05-26 20:04:45,080	INFO	torchdistill.misc.log	Epoch: [0]  [20000/24544]  eta: 0:11:53  lr: 1.4567307692307693e-05  sample/s: 27.49177977363858  loss: 0.3791 (0.4541)  time: 0.1547  data: 0.0027  max mem: 9034
2021-05-26 20:07:22,339	INFO	torchdistill.misc.log	Epoch: [0]  [21000/24544]  eta: 0:09:16  lr: 1.4295686657974795e-05  sample/s: 28.231116256451074  loss: 0.3786 (0.4511)  time: 0.1622  data: 0.0028  max mem: 9034
2021-05-26 20:09:58,871	INFO	torchdistill.misc.log	Epoch: [0]  [22000/24544]  eta: 0:06:39  lr: 1.4024065623641896e-05  sample/s: 27.339283357260534  loss: 0.4165 (0.4477)  time: 0.1622  data: 0.0028  max mem: 9034
2021-05-26 20:12:36,039	INFO	torchdistill.misc.log	Epoch: [0]  [23000/24544]  eta: 0:04:02  lr: 1.3752444589308998e-05  sample/s: 22.758457160065436  loss: 0.3802 (0.4446)  time: 0.1555  data: 0.0027  max mem: 9034
2021-05-26 20:15:12,598	INFO	torchdistill.misc.log	Epoch: [0]  [24000/24544]  eta: 0:01:25  lr: 1.34808235549761e-05  sample/s: 28.02765809938272  loss: 0.3125 (0.4414)  time: 0.1585  data: 0.0027  max mem: 9034
2021-05-26 20:16:38,263	INFO	torchdistill.misc.log	Epoch: [0] Total time: 1:04:11
2021-05-26 20:16:58,591	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 20:16:58,592	INFO	__main__	Validation: accuracy = 0.8665308201732043
2021-05-26 20:16:58,592	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-large-uncased
2021-05-26 20:17:03,627	INFO	torchdistill.misc.log	Epoch: [1]  [    0/24544]  eta: 1:33:58  lr: 1.3333061712299002e-05  sample/s: 19.747310188972392  loss: 0.4682 (0.4682)  time: 0.2297  data: 0.0271  max mem: 9034
2021-05-26 20:19:40,585	INFO	torchdistill.misc.log	Epoch: [1]  [ 1000/24544]  eta: 1:01:37  lr: 1.3061440677966102e-05  sample/s: 26.252342839259867  loss: 0.1850 (0.2208)  time: 0.1565  data: 0.0027  max mem: 9034
2021-05-26 20:22:17,591	INFO	torchdistill.misc.log	Epoch: [1]  [ 2000/24544]  eta: 0:58:59  lr: 1.2789819643633204e-05  sample/s: 21.8097504982106  loss: 0.2099 (0.2356)  time: 0.1602  data: 0.0027  max mem: 9034
2021-05-26 20:24:54,769	INFO	torchdistill.misc.log	Epoch: [1]  [ 3000/24544]  eta: 0:56:23  lr: 1.2518198609300306e-05  sample/s: 27.91364301876747  loss: 0.2192 (0.2369)  time: 0.1556  data: 0.0027  max mem: 9034
2021-05-26 20:27:32,375	INFO	torchdistill.misc.log	Epoch: [1]  [ 4000/24544]  eta: 0:53:49  lr: 1.2246577574967407e-05  sample/s: 27.884924849457917  loss: 0.1388 (0.2382)  time: 0.1618  data: 0.0028  max mem: 9034
2021-05-26 20:30:08,985	INFO	torchdistill.misc.log	Epoch: [1]  [ 5000/24544]  eta: 0:51:10  lr: 1.1974956540634507e-05  sample/s: 25.346978857897394  loss: 0.2185 (0.2378)  time: 0.1555  data: 0.0027  max mem: 9034
2021-05-26 20:32:46,374	INFO	torchdistill.misc.log	Epoch: [1]  [ 6000/24544]  eta: 0:48:33  lr: 1.1703335506301609e-05  sample/s: 27.4796219041395  loss: 0.2571 (0.2396)  time: 0.1560  data: 0.0027  max mem: 9034
2021-05-26 20:35:22,973	INFO	torchdistill.misc.log	Epoch: [1]  [ 7000/24544]  eta: 0:45:55  lr: 1.143171447196871e-05  sample/s: 28.076955002476804  loss: 0.1621 (0.2406)  time: 0.1579  data: 0.0027  max mem: 9034
2021-05-26 20:38:00,044	INFO	torchdistill.misc.log	Epoch: [1]  [ 8000/24544]  eta: 0:43:18  lr: 1.116009343763581e-05  sample/s: 25.54037357852913  loss: 0.1671 (0.2414)  time: 0.1615  data: 0.0027  max mem: 9034
2021-05-26 20:40:36,918	INFO	torchdistill.misc.log	Epoch: [1]  [ 9000/24544]  eta: 0:40:40  lr: 1.0888472403302913e-05  sample/s: 24.698818735351963  loss: 0.2244 (0.2411)  time: 0.1594  data: 0.0027  max mem: 9034
2021-05-26 20:43:13,900	INFO	torchdistill.misc.log	Epoch: [1]  [10000/24544]  eta: 0:38:03  lr: 1.0616851368970014e-05  sample/s: 24.829091856245597  loss: 0.1934 (0.2437)  time: 0.1562  data: 0.0027  max mem: 9034
2021-05-26 20:45:50,810	INFO	torchdistill.misc.log	Epoch: [1]  [11000/24544]  eta: 0:35:26  lr: 1.0345230334637116e-05  sample/s: 26.688114023924662  loss: 0.2520 (0.2445)  time: 0.1587  data: 0.0028  max mem: 9034
2021-05-26 20:48:27,672	INFO	torchdistill.misc.log	Epoch: [1]  [12000/24544]  eta: 0:32:49  lr: 1.0073609300304216e-05  sample/s: 27.18719170312753  loss: 0.1216 (0.2450)  time: 0.1556  data: 0.0026  max mem: 9034
2021-05-26 20:51:04,476	INFO	torchdistill.misc.log	Epoch: [1]  [13000/24544]  eta: 0:30:12  lr: 9.801988265971318e-06  sample/s: 28.178247754435702  loss: 0.2546 (0.2451)  time: 0.1543  data: 0.0026  max mem: 9034
2021-05-26 20:53:41,206	INFO	torchdistill.misc.log	Epoch: [1]  [14000/24544]  eta: 0:27:35  lr: 9.53036723163842e-06  sample/s: 21.816443589673572  loss: 0.2472 (0.2448)  time: 0.1608  data: 0.0028  max mem: 9034
2021-05-26 20:56:18,004	INFO	torchdistill.misc.log	Epoch: [1]  [15000/24544]  eta: 0:24:58  lr: 9.258746197305521e-06  sample/s: 27.04789787804823  loss: 0.1441 (0.2442)  time: 0.1567  data: 0.0027  max mem: 9034
2021-05-26 20:58:54,796	INFO	torchdistill.misc.log	Epoch: [1]  [16000/24544]  eta: 0:22:20  lr: 8.987125162972621e-06  sample/s: 25.65468688595532  loss: 0.2517 (0.2445)  time: 0.1564  data: 0.0027  max mem: 9034
2021-05-26 21:01:32,093	INFO	torchdistill.misc.log	Epoch: [1]  [17000/24544]  eta: 0:19:44  lr: 8.715504128639723e-06  sample/s: 22.732769752540246  loss: 0.3479 (0.2443)  time: 0.1581  data: 0.0027  max mem: 9034
2021-05-26 21:04:08,873	INFO	torchdistill.misc.log	Epoch: [1]  [18000/24544]  eta: 0:17:07  lr: 8.443883094306825e-06  sample/s: 26.943862990686924  loss: 0.2213 (0.2436)  time: 0.1575  data: 0.0028  max mem: 9034
2021-05-26 21:06:45,412	INFO	torchdistill.misc.log	Epoch: [1]  [19000/24544]  eta: 0:14:30  lr: 8.172262059973926e-06  sample/s: 26.424266478925592  loss: 0.1826 (0.2434)  time: 0.1576  data: 0.0027  max mem: 9034
2021-05-26 21:09:22,506	INFO	torchdistill.misc.log	Epoch: [1]  [20000/24544]  eta: 0:11:53  lr: 7.900641025641026e-06  sample/s: 25.074415219317977  loss: 0.2069 (0.2432)  time: 0.1558  data: 0.0027  max mem: 9034
2021-05-26 21:11:58,927	INFO	torchdistill.misc.log	Epoch: [1]  [21000/24544]  eta: 0:09:16  lr: 7.629019991308127e-06  sample/s: 27.590701804876044  loss: 0.2668 (0.2434)  time: 0.1545  data: 0.0027  max mem: 9034
2021-05-26 21:14:35,504	INFO	torchdistill.misc.log	Epoch: [1]  [22000/24544]  eta: 0:06:39  lr: 7.357398956975229e-06  sample/s: 27.259238076978693  loss: 0.1533 (0.2432)  time: 0.1531  data: 0.0027  max mem: 9034
2021-05-26 21:17:13,344	INFO	torchdistill.misc.log	Epoch: [1]  [23000/24544]  eta: 0:04:02  lr: 7.08577792264233e-06  sample/s: 28.61345976737047  loss: 0.1621 (0.2431)  time: 0.1588  data: 0.0026  max mem: 9034
2021-05-26 21:19:50,219	INFO	torchdistill.misc.log	Epoch: [1]  [24000/24544]  eta: 0:01:25  lr: 6.8141568883094315e-06  sample/s: 24.86183830306838  loss: 0.2084 (0.2426)  time: 0.1577  data: 0.0027  max mem: 9034
2021-05-26 21:21:15,455	INFO	torchdistill.misc.log	Epoch: [1] Total time: 1:04:12
2021-05-26 21:21:35,766	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 21:21:35,767	INFO	__main__	Validation: accuracy = 0.866225165562914
2021-05-26 21:21:35,944	INFO	torchdistill.misc.log	Epoch: [2]  [    0/24544]  eta: 1:11:55  lr: 6.666395045632335e-06  sample/s: 26.641665713886454  loss: 0.1711 (0.1711)  time: 0.1758  data: 0.0257  max mem: 9034
2021-05-26 21:24:12,602	INFO	torchdistill.misc.log	Epoch: [2]  [ 1000/24544]  eta: 1:01:28  lr: 6.394774011299436e-06  sample/s: 25.478160772915167  loss: 0.0000 (0.2154)  time: 0.1594  data: 0.0027  max mem: 9034
2021-05-26 21:26:48,122	INFO	torchdistill.misc.log	Epoch: [2]  [ 2000/24544]  eta: 0:58:39  lr: 6.123152976966536e-06  sample/s: 29.191880596183893  loss: 0.0000 (0.2594)  time: 0.1560  data: 0.0026  max mem: 9034
2021-05-26 21:29:23,722	INFO	torchdistill.misc.log	Epoch: [2]  [ 3000/24544]  eta: 0:55:59  lr: 5.851531942633638e-06  sample/s: 23.31094851566039  loss: 0.0000 (0.2732)  time: 0.1529  data: 0.0027  max mem: 9034
2021-05-26 21:31:59,493	INFO	torchdistill.misc.log	Epoch: [2]  [ 4000/24544]  eta: 0:53:22  lr: 5.57991090830074e-06  sample/s: 28.10545818828756  loss: 0.0268 (0.2814)  time: 0.1574  data: 0.0027  max mem: 9034
2021-05-26 21:34:34,884	INFO	torchdistill.misc.log	Epoch: [2]  [ 5000/24544]  eta: 0:50:44  lr: 5.30828987396784e-06  sample/s: 26.16505434282533  loss: 0.0000 (0.2821)  time: 0.1500  data: 0.0027  max mem: 9034
2021-05-26 21:37:11,222	INFO	torchdistill.misc.log	Epoch: [2]  [ 6000/24544]  eta: 0:48:10  lr: 5.0366688396349415e-06  sample/s: 24.967990130203333  loss: 0.0000 (0.2841)  time: 0.1545  data: 0.0027  max mem: 9034
2021-05-26 21:39:46,971	INFO	torchdistill.misc.log	Epoch: [2]  [ 7000/24544]  eta: 0:45:34  lr: 4.765047805302043e-06  sample/s: 24.54743723489709  loss: 0.0181 (0.2855)  time: 0.1525  data: 0.0027  max mem: 9034
2021-05-26 21:42:22,158	INFO	torchdistill.misc.log	Epoch: [2]  [ 8000/24544]  eta: 0:42:57  lr: 4.493426770969144e-06  sample/s: 22.231136175716074  loss: 0.0000 (0.2868)  time: 0.1544  data: 0.0027  max mem: 9034
2021-05-26 21:44:57,381	INFO	torchdistill.misc.log	Epoch: [2]  [ 9000/24544]  eta: 0:40:20  lr: 4.221805736636245e-06  sample/s: 26.316582354924982  loss: 0.0000 (0.2887)  time: 0.1524  data: 0.0026  max mem: 9034
2021-05-26 21:47:33,414	INFO	torchdistill.misc.log	Epoch: [2]  [10000/24544]  eta: 0:37:45  lr: 3.950184702303347e-06  sample/s: 23.86255843954503  loss: 0.0003 (0.2882)  time: 0.1609  data: 0.0028  max mem: 9034
2021-05-26 21:50:08,383	INFO	torchdistill.misc.log	Epoch: [2]  [11000/24544]  eta: 0:35:08  lr: 3.678563667970448e-06  sample/s: 26.68255374373182  loss: 0.3676 (0.2855)  time: 0.1545  data: 0.0027  max mem: 9034
2021-05-26 21:52:43,285	INFO	torchdistill.misc.log	Epoch: [2]  [12000/24544]  eta: 0:32:31  lr: 3.4069426336375493e-06  sample/s: 25.459447811611312  loss: 0.0000 (0.2846)  time: 0.1537  data: 0.0027  max mem: 9034
2021-05-26 21:55:18,524	INFO	torchdistill.misc.log	Epoch: [2]  [13000/24544]  eta: 0:29:56  lr: 3.1353215993046506e-06  sample/s: 24.929554150538273  loss: 0.0000 (0.2846)  time: 0.1595  data: 0.0027  max mem: 9034
2021-05-26 21:57:55,270	INFO	torchdistill.misc.log	Epoch: [2]  [14000/24544]  eta: 0:27:21  lr: 2.8637005649717515e-06  sample/s: 27.516714503616473  loss: 0.1809 (0.2848)  time: 0.1594  data: 0.0028  max mem: 9034
2021-05-26 22:00:31,919	INFO	torchdistill.misc.log	Epoch: [2]  [15000/24544]  eta: 0:24:46  lr: 2.5920795306388528e-06  sample/s: 28.056718090221164  loss: 0.0959 (0.2857)  time: 0.1554  data: 0.0027  max mem: 9034
2021-05-26 22:03:07,727	INFO	torchdistill.misc.log	Epoch: [2]  [16000/24544]  eta: 0:22:10  lr: 2.320458496305954e-06  sample/s: 25.48872795003502  loss: 0.0000 (0.2860)  time: 0.1578  data: 0.0028  max mem: 9034
2021-05-26 22:05:43,590	INFO	torchdistill.misc.log	Epoch: [2]  [17000/24544]  eta: 0:19:34  lr: 2.0488374619730554e-06  sample/s: 24.52734053486662  loss: 0.0000 (0.2869)  time: 0.1545  data: 0.0027  max mem: 9034
2021-05-26 22:08:19,209	INFO	torchdistill.misc.log	Epoch: [2]  [18000/24544]  eta: 0:16:59  lr: 1.7772164276401565e-06  sample/s: 27.08741905120178  loss: 0.0000 (0.2869)  time: 0.1550  data: 0.0027  max mem: 9034
2021-05-26 22:10:53,843	INFO	torchdistill.misc.log	Epoch: [2]  [19000/24544]  eta: 0:14:23  lr: 1.5055953933072578e-06  sample/s: 28.067091144517644  loss: 0.0000 (0.2887)  time: 0.1572  data: 0.0028  max mem: 9034
2021-05-26 22:13:29,898	INFO	torchdistill.misc.log	Epoch: [2]  [20000/24544]  eta: 0:11:47  lr: 1.233974358974359e-06  sample/s: 25.48025028856084  loss: 0.0000 (0.2887)  time: 0.1577  data: 0.0027  max mem: 9034
2021-05-26 22:16:06,871	INFO	torchdistill.misc.log	Epoch: [2]  [21000/24544]  eta: 0:09:12  lr: 9.623533246414604e-07  sample/s: 27.475571670244978  loss: 0.0000 (0.2893)  time: 0.1557  data: 0.0029  max mem: 9034
2021-05-26 22:18:43,970	INFO	torchdistill.misc.log	Epoch: [2]  [22000/24544]  eta: 0:06:36  lr: 6.907322903085615e-07  sample/s: 25.952121373370375  loss: 0.0000 (0.2901)  time: 0.1565  data: 0.0028  max mem: 9034
2021-05-26 22:21:20,398	INFO	torchdistill.misc.log	Epoch: [2]  [23000/24544]  eta: 0:04:00  lr: 4.191112559756628e-07  sample/s: 28.05516983050394  loss: 0.0000 (0.2902)  time: 0.1573  data: 0.0028  max mem: 9034
2021-05-26 22:23:56,531	INFO	torchdistill.misc.log	Epoch: [2]  [24000/24544]  eta: 0:01:24  lr: 1.4749022164276403e-07  sample/s: 25.47750303335859  loss: 0.0000 (0.2901)  time: 0.1553  data: 0.0027  max mem: 9034
2021-05-26 22:25:21,372	INFO	torchdistill.misc.log	Epoch: [2] Total time: 1:03:45
2021-05-26 22:25:41,707	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 22:25:41,708	INFO	__main__	Validation: accuracy = 0.8601120733571065
2021-05-26 22:25:51,217	INFO	__main__	[Student: bert-large-uncased]
2021-05-26 22:26:11,586	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 22:26:11,587	INFO	__main__	Test: accuracy = 0.8665308201732043
2021-05-26 22:26:11,587	INFO	__main__	Start prediction for private dataset(s)
2021-05-26 22:26:11,589	INFO	__main__	mnli/test_m: 9796 samples
2021-05-26 22:26:31,714	INFO	__main__	mnli/test_mm: 9847 samples
2021-05-26 22:26:51,852	INFO	__main__	ax/test_ax: 1104 samples