2021-05-25 20:17:27,706	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/mse/bert_large_uncased.yaml', log='log/glue/stsb/mse/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)
2021-05-25 20:17:27,744	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-25 20:17:37,444	INFO	__main__	Start training
2021-05-25 20:17:37,444	INFO	torchdistill.models.util	[student model]
2021-05-25 20:17:37,445	INFO	torchdistill.models.util	Using the original student model
2021-05-25 20:17:37,445	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-25 20:17:44,656	INFO	torchdistill.misc.log	Epoch: [0]  [  0/180]  eta: 0:02:54  lr: 2.9944444444444443e-05  sample/s: 4.157108273720882  loss: 12.1362 (12.1362)  time: 0.9706  data: 0.0084  max mem: 6148
2021-05-25 20:18:24,842	INFO	torchdistill.misc.log	Epoch: [0]  [ 50/180]  eta: 0:01:44  lr: 2.716666666666667e-05  sample/s: 5.079776319530379  loss: 7.2100 (10.7919)  time: 0.7976  data: 0.0035  max mem: 10858
2021-05-25 20:19:03,758	INFO	torchdistill.misc.log	Epoch: [0]  [100/180]  eta: 0:01:03  lr: 2.438888888888889e-05  sample/s: 5.071751190230392  loss: 0.7149 (6.2958)  time: 0.7898  data: 0.0039  max mem: 10858
2021-05-25 20:19:42,944	INFO	torchdistill.misc.log	Epoch: [0]  [150/180]  eta: 0:00:23  lr: 2.161111111111111e-05  sample/s: 4.658769325438214  loss: 0.5977 (4.4207)  time: 0.7743  data: 0.0036  max mem: 10861
2021-05-25 20:20:07,011	INFO	torchdistill.misc.log	Epoch: [0] Total time: 0:02:23
2021-05-25 20:20:15,900	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-25 20:20:15,900	INFO	__main__	Validation: pearson = 0.8737826052062094, spearmanr = 0.8735916113223366
2021-05-25 20:20:15,900	INFO	__main__	Updating ckpt
2021-05-25 20:20:21,438	INFO	torchdistill.misc.log	Epoch: [1]  [  0/180]  eta: 0:01:58  lr: 1.9944444444444447e-05  sample/s: 6.128770842374588  loss: 0.4325 (0.4325)  time: 0.6593  data: 0.0067  max mem: 10861
2021-05-25 20:21:00,988	INFO	torchdistill.misc.log	Epoch: [1]  [ 50/180]  eta: 0:01:42  lr: 1.7166666666666666e-05  sample/s: 4.297354562406508  loss: 0.3274 (0.3845)  time: 0.8074  data: 0.0038  max mem: 10861
2021-05-25 20:21:41,487	INFO	torchdistill.misc.log	Epoch: [1]  [100/180]  eta: 0:01:03  lr: 1.438888888888889e-05  sample/s: 7.083348708467611  loss: 0.3129 (0.3857)  time: 0.8153  data: 0.0037  max mem: 12382
2021-05-25 20:22:21,324	INFO	torchdistill.misc.log	Epoch: [1]  [150/180]  eta: 0:00:23  lr: 1.161111111111111e-05  sample/s: 6.354031294441572  loss: 0.2832 (0.3657)  time: 0.8022  data: 0.0038  max mem: 12382
2021-05-25 20:22:43,475	INFO	torchdistill.misc.log	Epoch: [1] Total time: 0:02:22
2021-05-25 20:22:52,330	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-25 20:22:52,330	INFO	__main__	Validation: pearson = 0.8993742147980403, spearmanr = 0.8971063152009764
2021-05-25 20:22:52,331	INFO	__main__	Updating ckpt
2021-05-25 20:22:58,258	INFO	torchdistill.misc.log	Epoch: [2]  [  0/180]  eta: 0:02:25  lr: 9.944444444444445e-06  sample/s: 4.98979598816883  loss: 0.1504 (0.1504)  time: 0.8066  data: 0.0050  max mem: 12382
2021-05-25 20:23:37,330	INFO	torchdistill.misc.log	Epoch: [2]  [ 50/180]  eta: 0:01:41  lr: 7.166666666666667e-06  sample/s: 5.076305784711369  loss: 0.1316 (0.1620)  time: 0.7998  data: 0.0037  max mem: 12382
2021-05-25 20:24:18,428	INFO	torchdistill.misc.log	Epoch: [2]  [100/180]  eta: 0:01:04  lr: 4.388888888888889e-06  sample/s: 4.601912661611403  loss: 0.1340 (0.1553)  time: 0.8041  data: 0.0037  max mem: 12382
2021-05-25 20:24:57,662	INFO	torchdistill.misc.log	Epoch: [2]  [150/180]  eta: 0:00:23  lr: 1.6111111111111111e-06  sample/s: 5.072415148418462  loss: 0.1245 (0.1507)  time: 0.7877  data: 0.0038  max mem: 12382
2021-05-25 20:25:20,536	INFO	torchdistill.misc.log	Epoch: [2] Total time: 0:02:23
2021-05-25 20:25:29,396	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-25 20:25:29,397	INFO	__main__	Validation: pearson = 0.9034122016001204, spearmanr = 0.9010440275420903
2021-05-25 20:25:29,397	INFO	__main__	Updating ckpt
2021-05-25 20:25:40,164	INFO	__main__	[Student: bert-large-uncased]
2021-05-25 20:25:49,037	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow
2021-05-25 20:25:49,038	INFO	__main__	Test: pearson = 0.9034122016001204, spearmanr = 0.9010440275420903
2021-05-25 20:25:49,038	INFO	__main__	Start prediction for private dataset(s)
2021-05-25 20:25:49,039	INFO	__main__	stsb/test: 1379 samples