2021-05-25 19:49:54,507	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_large_uncased.yaml', log='log/glue/mrpc/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
2021-05-25 19:49:54,546	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-25 19:50:00,656	WARNING	datasets.builder	Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-25 19:50:02,769	INFO	__main__	Start training
2021-05-25 19:50:02,770	INFO	torchdistill.models.util	[student model]
2021-05-25 19:50:02,770	INFO	torchdistill.models.util	Using the original student model
2021-05-25 19:50:02,770	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-25 19:50:08,870	INFO	torchdistill.misc.log	Epoch: [0]  [  0/230]  eta: 0:02:28  lr: 2.997391304347826e-05  sample/s: 6.235015084699529  loss: 0.7412 (0.7412)  time: 0.6463  data: 0.0048  max mem: 5376
2021-05-25 19:50:34,421	INFO	torchdistill.misc.log	Epoch: [0]  [ 50/230]  eta: 0:01:32  lr: 2.8669565217391306e-05  sample/s: 8.613480125640086  loss: 0.6172 (0.6453)  time: 0.5105  data: 0.0027  max mem: 8071
2021-05-25 19:50:59,496	INFO	torchdistill.misc.log	Epoch: [0]  [100/230]  eta: 0:01:05  lr: 2.736521739130435e-05  sample/s: 8.610049872675337  loss: 0.5745 (0.6214)  time: 0.4995  data: 0.0027  max mem: 8340
2021-05-25 19:51:24,306	INFO	torchdistill.misc.log	Epoch: [0]  [150/230]  eta: 0:00:40  lr: 2.6060869565217393e-05  sample/s: 7.983387230932337  loss: 0.5508 (0.5978)  time: 0.4954  data: 0.0027  max mem: 8343
2021-05-25 19:51:49,480	INFO	torchdistill.misc.log	Epoch: [0]  [200/230]  eta: 0:00:15  lr: 2.4756521739130433e-05  sample/s: 7.338986194467289  loss: 0.5243 (0.5819)  time: 0.5120  data: 0.0026  max mem: 8343
2021-05-25 19:52:03,767	INFO	torchdistill.misc.log	Epoch: [0] Total time: 0:01:55
2021-05-25 19:52:07,585	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:52:07,585	INFO	__main__	Validation: accuracy = 0.8063725490196079, f1 = 0.8596802841918295
2021-05-25 19:52:07,585	INFO	__main__	Updating ckpt
2021-05-25 19:52:15,203	INFO	torchdistill.misc.log	Epoch: [1]  [  0/230]  eta: 0:01:52  lr: 2.3973913043478262e-05  sample/s: 8.210228471122164  loss: 0.2772 (0.2772)  time: 0.4906  data: 0.0034  max mem: 8343
2021-05-25 19:52:40,453	INFO	torchdistill.misc.log	Epoch: [1]  [ 50/230]  eta: 0:01:30  lr: 2.2669565217391306e-05  sample/s: 7.988404891733914  loss: 0.4216 (0.4209)  time: 0.5130  data: 0.0027  max mem: 8343
2021-05-25 19:53:05,607	INFO	torchdistill.misc.log	Epoch: [1]  [100/230]  eta: 0:01:05  lr: 2.1365217391304346e-05  sample/s: 8.616271431909633  loss: 0.4311 (0.4199)  time: 0.4990  data: 0.0027  max mem: 8343
2021-05-25 19:53:30,398	INFO	torchdistill.misc.log	Epoch: [1]  [150/230]  eta: 0:00:40  lr: 2.0060869565217393e-05  sample/s: 8.594665095668656  loss: 0.2910 (0.3994)  time: 0.4916  data: 0.0027  max mem: 8343
2021-05-25 19:53:55,443	INFO	torchdistill.misc.log	Epoch: [1]  [200/230]  eta: 0:00:15  lr: 1.8756521739130436e-05  sample/s: 7.977715750281739  loss: 0.3855 (0.3950)  time: 0.4897  data: 0.0027  max mem: 8343
2021-05-25 19:54:09,895	INFO	torchdistill.misc.log	Epoch: [1] Total time: 0:01:55
2021-05-25 19:54:13,711	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:54:13,712	INFO	__main__	Validation: accuracy = 0.8553921568627451, f1 = 0.899488926746167
2021-05-25 19:54:13,712	INFO	__main__	Updating ckpt
2021-05-25 19:54:21,849	INFO	torchdistill.misc.log	Epoch: [2]  [  0/230]  eta: 0:02:26  lr: 1.7973913043478262e-05  sample/s: 6.3179051200469365  loss: 0.0593 (0.0593)  time: 0.6368  data: 0.0037  max mem: 8343
2021-05-25 19:54:47,318	INFO	torchdistill.misc.log	Epoch: [2]  [ 50/230]  eta: 0:01:32  lr: 1.6669565217391305e-05  sample/s: 7.4142856069608625  loss: 0.1097 (0.1770)  time: 0.5018  data: 0.0029  max mem: 8343
2021-05-25 19:55:12,346	INFO	torchdistill.misc.log	Epoch: [2]  [100/230]  eta: 0:01:05  lr: 1.536521739130435e-05  sample/s: 6.963207990009169  loss: 0.1442 (0.1919)  time: 0.4986  data: 0.0026  max mem: 8343
2021-05-25 19:55:37,133	INFO	torchdistill.misc.log	Epoch: [2]  [150/230]  eta: 0:00:40  lr: 1.4060869565217393e-05  sample/s: 7.411881388451892  loss: 0.2391 (0.1978)  time: 0.5008  data: 0.0027  max mem: 8343
2021-05-25 19:56:02,055	INFO	torchdistill.misc.log	Epoch: [2]  [200/230]  eta: 0:00:15  lr: 1.2756521739130435e-05  sample/s: 7.392133605686278  loss: 0.0212 (0.1974)  time: 0.5063  data: 0.0027  max mem: 8343
2021-05-25 19:56:16,583	INFO	torchdistill.misc.log	Epoch: [2] Total time: 0:01:55
2021-05-25 19:56:20,398	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:56:20,399	INFO	__main__	Validation: accuracy = 0.8455882352941176, f1 = 0.8877005347593583
2021-05-25 19:56:20,866	INFO	torchdistill.misc.log	Epoch: [3]  [  0/230]  eta: 0:01:47  lr: 1.197391304347826e-05  sample/s: 8.617222924178455  loss: 0.2669 (0.2669)  time: 0.4671  data: 0.0029  max mem: 8343
2021-05-25 19:56:45,813	INFO	torchdistill.misc.log	Epoch: [3]  [ 50/230]  eta: 0:01:29  lr: 1.0669565217391305e-05  sample/s: 8.612507353673449  loss: 0.0047 (0.1619)  time: 0.5022  data: 0.0026  max mem: 8343
2021-05-25 19:57:10,910	INFO	torchdistill.misc.log	Epoch: [3]  [100/230]  eta: 0:01:05  lr: 9.365217391304347e-06  sample/s: 7.41079443382604  loss: 0.0007 (0.1258)  time: 0.5112  data: 0.0026  max mem: 8343
2021-05-25 19:57:35,937	INFO	torchdistill.misc.log	Epoch: [3]  [150/230]  eta: 0:00:40  lr: 8.060869565217392e-06  sample/s: 7.976221401119329  loss: 0.0001 (0.1121)  time: 0.5023  data: 0.0027  max mem: 8343
2021-05-25 19:58:00,921	INFO	torchdistill.misc.log	Epoch: [3]  [200/230]  eta: 0:00:15  lr: 6.756521739130434e-06  sample/s: 8.603982899951587  loss: 0.0000 (0.1496)  time: 0.4897  data: 0.0027  max mem: 8343
2021-05-25 19:58:15,689	INFO	torchdistill.misc.log	Epoch: [3] Total time: 0:01:55
2021-05-25 19:58:19,502	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:58:19,503	INFO	__main__	Validation: accuracy = 0.8504901960784313, f1 = 0.8884826325411335
2021-05-25 19:58:20,004	INFO	torchdistill.misc.log	Epoch: [4]  [  0/230]  eta: 0:01:55  lr: 5.973913043478261e-06  sample/s: 8.038922893074167  loss: 0.0000 (0.0000)  time: 0.5005  data: 0.0029  max mem: 8343
2021-05-25 19:58:45,232	INFO	torchdistill.misc.log	Epoch: [4]  [ 50/230]  eta: 0:01:30  lr: 4.669565217391304e-06  sample/s: 8.065907185329344  loss: 0.0000 (0.1483)  time: 0.5099  data: 0.0028  max mem: 8343
2021-05-25 19:59:10,190	INFO	torchdistill.misc.log	Epoch: [4]  [100/230]  eta: 0:01:05  lr: 3.365217391304348e-06  sample/s: 8.05736566867782  loss: 0.0000 (0.1296)  time: 0.5081  data: 0.0028  max mem: 8343
2021-05-25 19:59:35,163	INFO	torchdistill.misc.log	Epoch: [4]  [150/230]  eta: 0:00:40  lr: 2.0608695652173915e-06  sample/s: 7.483834821575934  loss: 0.0000 (0.1036)  time: 0.4983  data: 0.0026  max mem: 8343
2021-05-25 19:59:59,755	INFO	torchdistill.misc.log	Epoch: [4]  [200/230]  eta: 0:00:14  lr: 7.565217391304349e-07  sample/s: 8.709057781053877  loss: 0.0000 (0.1062)  time: 0.4910  data: 0.0027  max mem: 8343
2021-05-25 20:00:13,848	INFO	torchdistill.misc.log	Epoch: [4] Total time: 0:01:54
2021-05-25 20:00:17,666	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 20:00:17,666	INFO	__main__	Validation: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
2021-05-25 20:00:17,666	INFO	__main__	Updating ckpt
2021-05-25 20:00:31,000	INFO	__main__	[Student: bert-large-uncased]
2021-05-25 20:00:34,825	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 20:00:34,825	INFO	__main__	Test: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
2021-05-25 20:00:34,826	INFO	__main__	Start prediction for private dataset(s)
2021-05-25 20:00:34,827	INFO	__main__	mrpc/test: 1725 samples