2021-05-27 03:06:57,518	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/ce/bert_large_uncased.yaml', log='log/glue/rte/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)
2021-05-27 03:06:57,550	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-27 03:07:10,458	INFO	__main__	Start training
2021-05-27 03:07:10,459	INFO	torchdistill.models.util	[student model]
2021-05-27 03:07:10,459	INFO	torchdistill.models.util	Using the original student model
2021-05-27 03:07:10,459	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-27 03:07:13,632	INFO	torchdistill.misc.log	Epoch: [0]  [  0/312]  eta: 0:02:13  lr: 1.997863247863248e-05  sample/s: 9.410467095760941  loss: 0.6985 (0.6985)  time: 0.4283  data: 0.0032  max mem: 5355
2021-05-27 03:07:34,141	INFO	torchdistill.misc.log	Epoch: [0]  [ 50/312]  eta: 0:01:47  lr: 1.891025641025641e-05  sample/s: 9.260763560189773  loss: 0.6855 (0.7316)  time: 0.4141  data: 0.0017  max mem: 7365
2021-05-27 03:07:54,769	INFO	torchdistill.misc.log	Epoch: [0]  [100/312]  eta: 0:01:27  lr: 1.7841880341880344e-05  sample/s: 9.300060421620962  loss: 0.6784 (0.7141)  time: 0.4156  data: 0.0018  max mem: 7365
2021-05-27 03:08:15,006	INFO	torchdistill.misc.log	Epoch: [0]  [150/312]  eta: 0:01:06  lr: 1.6773504273504274e-05  sample/s: 9.304485750349947  loss: 0.6591 (0.7030)  time: 0.4016  data: 0.0017  max mem: 7365
2021-05-27 03:08:35,380	INFO	torchdistill.misc.log	Epoch: [0]  [200/312]  eta: 0:00:45  lr: 1.5705128205128205e-05  sample/s: 9.304217428726076  loss: 0.6689 (0.6939)  time: 0.4085  data: 0.0017  max mem: 7365
2021-05-27 03:08:55,663	INFO	torchdistill.misc.log	Epoch: [0]  [250/312]  eta: 0:00:25  lr: 1.4636752136752137e-05  sample/s: 9.296092555242803  loss: 0.6506 (0.6908)  time: 0.4118  data: 0.0018  max mem: 7365
2021-05-27 03:09:16,350	INFO	torchdistill.misc.log	Epoch: [0]  [300/312]  eta: 0:00:04  lr: 1.356837606837607e-05  sample/s: 9.244190153358904  loss: 0.6405 (0.6823)  time: 0.4173  data: 0.0017  max mem: 7365
2021-05-27 03:09:20,776	INFO	torchdistill.misc.log	Epoch: [0] Total time: 0:02:07
2021-05-27 03:09:24,609	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:09:24,610	INFO	__main__	Validation: accuracy = 0.6173285198555957
2021-05-27 03:09:24,610	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/rte/ce/rte-bert-large-uncased
2021-05-27 03:09:29,657	INFO	torchdistill.misc.log	Epoch: [1]  [  0/312]  eta: 0:02:19  lr: 1.3311965811965812e-05  sample/s: 9.012096369155467  loss: 0.6355 (0.6355)  time: 0.4460  data: 0.0021  max mem: 7365
2021-05-27 03:09:50,469	INFO	torchdistill.misc.log	Epoch: [1]  [ 50/312]  eta: 0:01:49  lr: 1.2243589743589746e-05  sample/s: 9.16842369299592  loss: 0.5310 (0.5850)  time: 0.4242  data: 0.0017  max mem: 7365
2021-05-27 03:10:10,381	INFO	torchdistill.misc.log	Epoch: [1]  [100/312]  eta: 0:01:26  lr: 1.1175213675213676e-05  sample/s: 11.36395200609068  loss: 0.4777 (0.5638)  time: 0.4081  data: 0.0017  max mem: 7365
2021-05-27 03:10:30,677	INFO	torchdistill.misc.log	Epoch: [1]  [150/312]  eta: 0:01:05  lr: 1.0106837606837608e-05  sample/s: 9.29443941850809  loss: 0.5052 (0.5646)  time: 0.4139  data: 0.0017  max mem: 7365
2021-05-27 03:10:51,658	INFO	torchdistill.misc.log	Epoch: [1]  [200/312]  eta: 0:00:45  lr: 9.03846153846154e-06  sample/s: 9.300792527973533  loss: 0.4903 (0.5516)  time: 0.4198  data: 0.0017  max mem: 7365
2021-05-27 03:11:12,170	INFO	torchdistill.misc.log	Epoch: [1]  [250/312]  eta: 0:00:25  lr: 7.970085470085472e-06  sample/s: 9.282976357127444  loss: 0.4948 (0.5433)  time: 0.4087  data: 0.0017  max mem: 7365
2021-05-27 03:11:32,902	INFO	torchdistill.misc.log	Epoch: [1]  [300/312]  eta: 0:00:04  lr: 6.901709401709402e-06  sample/s: 9.302071412730095  loss: 0.4903 (0.5442)  time: 0.4143  data: 0.0017  max mem: 7365
2021-05-27 03:11:36,971	INFO	torchdistill.misc.log	Epoch: [1] Total time: 0:02:07
2021-05-27 03:11:40,801	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:11:40,801	INFO	__main__	Validation: accuracy = 0.740072202166065
2021-05-27 03:11:40,802	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/rte/ce/rte-bert-large-uncased
2021-05-27 03:11:46,120	INFO	torchdistill.misc.log	Epoch: [2]  [  0/312]  eta: 0:02:19  lr: 6.645299145299145e-06  sample/s: 9.060380879964056  loss: 0.7969 (0.7969)  time: 0.4475  data: 0.0060  max mem: 7365
2021-05-27 03:12:06,781	INFO	torchdistill.misc.log	Epoch: [2]  [ 50/312]  eta: 0:01:48  lr: 5.576923076923077e-06  sample/s: 10.741735558355614  loss: 0.3631 (0.3967)  time: 0.4164  data: 0.0017  max mem: 7365
2021-05-27 03:12:27,317	INFO	torchdistill.misc.log	Epoch: [2]  [100/312]  eta: 0:01:27  lr: 4.508547008547009e-06  sample/s: 9.16765717108441  loss: 0.3599 (0.3882)  time: 0.4054  data: 0.0017  max mem: 7365
2021-05-27 03:12:47,698	INFO	torchdistill.misc.log	Epoch: [2]  [150/312]  eta: 0:01:06  lr: 3.4401709401709403e-06  sample/s: 13.347942659561435  loss: 0.2746 (0.3688)  time: 0.3992  data: 0.0017  max mem: 7365
2021-05-27 03:13:08,144	INFO	torchdistill.misc.log	Epoch: [2]  [200/312]  eta: 0:00:45  lr: 2.371794871794872e-06  sample/s: 9.294048107726917  loss: 0.2737 (0.3552)  time: 0.4062  data: 0.0017  max mem: 7365
2021-05-27 03:13:29,127	INFO	torchdistill.misc.log	Epoch: [2]  [250/312]  eta: 0:00:25  lr: 1.3034188034188036e-06  sample/s: 9.276037203192194  loss: 0.2493 (0.3468)  time: 0.4134  data: 0.0017  max mem: 7365
2021-05-27 03:13:49,840	INFO	torchdistill.misc.log	Epoch: [2]  [300/312]  eta: 0:00:04  lr: 2.3504273504273505e-07  sample/s: 9.257023235231964  loss: 0.1451 (0.3412)  time: 0.4061  data: 0.0018  max mem: 7365
2021-05-27 03:13:54,183	INFO	torchdistill.misc.log	Epoch: [2] Total time: 0:02:08
2021-05-27 03:13:58,014	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:13:58,014	INFO	__main__	Validation: accuracy = 0.7220216606498195
2021-05-27 03:14:04,515	INFO	__main__	[Student: bert-large-uncased]
2021-05-27 03:14:08,368	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow
2021-05-27 03:14:08,368	INFO	__main__	Test: accuracy = 0.740072202166065
2021-05-27 03:14:08,368	INFO	__main__	Start prediction for private dataset(s)
2021-05-27 03:14:08,369	INFO	__main__	rte/test: 3000 samples