yoshitomo-matsubara committed on
Commit
22de1d3
1 Parent(s): 50aa989

tuned hyperparameters

Files changed (3)
  1. pytorch_model.bin +1 -1
  2. tokenizer.json +0 -0
  3. training.log +100 -72
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:60e65b82b971489a2d598bc76b7fd2dbc2666b6e36945ec825c85bcb31a7cd4f
+ oid sha256:90b34b2184a9e60bd96809521f7e5564bf576888c715b96418a8c51223f83078
  size 1340750921
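The pytorch_model.bin change above only swaps the Git LFS pointer's `oid`; the 1,340,750,921-byte weight file itself lives in LFS storage. As a minimal sketch (not part of this repo; `parse_lfs_pointer` and `verify_blob` are hypothetical helper names) of how downloaded bytes can be checked against such a pointer:

```python
import hashlib

def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_blob(pointer_text, blob):
    """Return True if blob matches the oid and size recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        return False
    return (len(blob) == int(fields["size"])
            and hashlib.sha256(blob).hexdigest() == digest)

# Demonstrate with a tiny stand-in blob; the real pointer above
# references the 1,340,750,921-byte pytorch_model.bin.
blob = b"example weights"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(blob).hexdigest()}\n"
    f"size {len(blob)}\n"
)
print(verify_blob(pointer, blob))          # True
print(verify_blob(pointer, blob + b"x"))   # False
```

Comparing both the size and the digest catches truncated downloads as well as corrupted ones.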
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
training.log CHANGED
@@ -1,78 +1,106 @@
- 2021-05-22 16:51:27,390 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_large_uncased.yaml', log='log/glue/mnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
- 2021-05-22 16:51:27,468 INFO __main__ Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
  Use FP16 precision: True
 
- 2021-05-22 16:51:28,560 INFO filelock Lock 140388380183568 acquired on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
- 2021-05-22 16:51:29,127 INFO filelock Lock 140388380183568 released on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
- 2021-05-22 16:51:30,244 INFO filelock Lock 140388420260240 acquired on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
- 2021-05-22 16:51:31,503 INFO filelock Lock 140388420260240 released on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
- 2021-05-22 16:51:32,063 INFO filelock Lock 140388380324432 acquired on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
- 2021-05-22 16:51:33,505 INFO filelock Lock 140388380324432 released on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
- 2021-05-22 16:51:35,451 INFO filelock Lock 140388380350864 acquired on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
- 2021-05-22 16:51:36,009 INFO filelock Lock 140388380350864 released on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
- 2021-05-22 16:51:36,585 INFO filelock Lock 140388380327440 acquired on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
- 2021-05-22 16:51:59,770 INFO filelock Lock 140388380327440 released on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
- 2021-05-22 16:53:50,809 INFO __main__ Start training
- 2021-05-22 16:53:50,810 INFO torchdistill.models.util [student model]
- 2021-05-22 16:53:50,810 INFO torchdistill.models.util Using the original student model
- 2021-05-22 16:53:50,810 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
- 2021-05-22 16:53:58,061 INFO torchdistill.misc.log Epoch: [0] [ 0/12272] eta: 1:16:07 lr: 1.9999456757931336e-05 sample/s: 12.183084487517528 loss: 1.1712 (1.1712) time: 0.3722 data: 0.0438 max mem: 6528
- 2021-05-22 16:57:39,884 INFO torchdistill.misc.log Epoch: [0] [ 1000/12272] eta: 0:41:42 lr: 1.945621468926554e-05 sample/s: 20.478750076289288 loss: 0.5207 (0.7022) time: 0.2147 data: 0.0042 max mem: 12387
- 2021-05-22 17:01:20,141 INFO torchdistill.misc.log Epoch: [0] [ 2000/12272] eta: 0:37:51 lr: 1.891297262059974e-05 sample/s: 15.393974983851077 loss: 0.4313 (0.6009) time: 0.2211 data: 0.0043 max mem: 12387
- 2021-05-22 17:05:02,731 INFO torchdistill.misc.log Epoch: [0] [ 3000/12272] eta: 0:34:14 lr: 1.8369730551933943e-05 sample/s: 18.06303300441961 loss: 0.4815 (0.5557) time: 0.2245 data: 0.0044 max mem: 12387
- 2021-05-22 17:08:46,475 INFO torchdistill.misc.log Epoch: [0] [ 4000/12272] eta: 0:30:37 lr: 1.7826488483268146e-05 sample/s: 21.499356065605703 loss: 0.3788 (0.5245) time: 0.2279 data: 0.0044 max mem: 12387
- 2021-05-22 17:12:27,148 INFO torchdistill.misc.log Epoch: [0] [ 5000/12272] eta: 0:26:53 lr: 1.728324641460235e-05 sample/s: 18.00671446357275 loss: 0.4535 (0.5039) time: 0.2255 data: 0.0045 max mem: 12387
- 2021-05-22 17:16:11,174 INFO torchdistill.misc.log Epoch: [0] [ 6000/12272] eta: 0:23:13 lr: 1.674000434593655e-05 sample/s: 18.482076681398965 loss: 0.3041 (0.4878) time: 0.2298 data: 0.0048 max mem: 12387
- 2021-05-22 17:19:54,854 INFO torchdistill.misc.log Epoch: [0] [ 7000/12272] eta: 0:19:32 lr: 1.6196762277270753e-05 sample/s: 17.582475809604055 loss: 0.4186 (0.4757) time: 0.2196 data: 0.0043 max mem: 12387
- 2021-05-22 17:23:37,409 INFO torchdistill.misc.log Epoch: [0] [ 8000/12272] eta: 0:15:50 lr: 1.5653520208604957e-05 sample/s: 18.052984294095115 loss: 0.3516 (0.4650) time: 0.2145 data: 0.0044 max mem: 12387
- 2021-05-22 17:27:17,991 INFO torchdistill.misc.log Epoch: [0] [ 9000/12272] eta: 0:12:07 lr: 1.5110278139939158e-05 sample/s: 22.68604630722653 loss: 0.3512 (0.4559) time: 0.2295 data: 0.0043 max mem: 12387
- 2021-05-22 17:31:01,464 INFO torchdistill.misc.log Epoch: [0] [10000/12272] eta: 0:08:25 lr: 1.4567036071273362e-05 sample/s: 16.33036813667613 loss: 0.2831 (0.4491) time: 0.2283 data: 0.0043 max mem: 12387
- 2021-05-22 17:34:44,016 INFO torchdistill.misc.log Epoch: [0] [11000/12272] eta: 0:04:42 lr: 1.4023794002607562e-05 sample/s: 20.46775732289036 loss: 0.3536 (0.4437) time: 0.2124 data: 0.0044 max mem: 12387
- 2021-05-22 17:38:24,552 INFO torchdistill.misc.log Epoch: [0] [12000/12272] eta: 0:01:00 lr: 1.3480551933941765e-05 sample/s: 18.079364121485902 loss: 0.4479 (0.4387) time: 0.2239 data: 0.0043 max mem: 12387
- 2021-05-22 17:39:24,832 INFO torchdistill.misc.log Epoch: [0] Total time: 0:45:27
- 2021-05-22 17:39:43,192 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
- 2021-05-22 17:39:43,193 INFO __main__ Validation: accuracy = 0.8611309220580744
- 2021-05-22 17:39:43,193 INFO __main__ Updating ckpt
- 2021-05-22 17:39:48,350 INFO torchdistill.misc.log Epoch: [1] [ 0/12272] eta: 0:46:53 lr: 1.333279009126467e-05 sample/s: 20.607311527884058 loss: 0.2311 (0.2311) time: 0.2293 data: 0.0352 max mem: 12387
- 2021-05-22 17:43:29,458 INFO torchdistill.misc.log Epoch: [1] [ 1000/12272] eta: 0:41:32 lr: 1.2789548022598873e-05 sample/s: 20.56273287053379 loss: 0.1454 (0.2223) time: 0.2243 data: 0.0042 max mem: 12387
- 2021-05-22 17:47:11,803 INFO torchdistill.misc.log Epoch: [1] [ 2000/12272] eta: 0:37:57 lr: 1.2246305953933073e-05 sample/s: 18.59424568869974 loss: 0.2245 (0.2225) time: 0.2319 data: 0.0048 max mem: 12387
- 2021-05-22 17:50:54,138 INFO torchdistill.misc.log Epoch: [1] [ 3000/12272] eta: 0:34:17 lr: 1.1703063885267276e-05 sample/s: 18.018356427434547 loss: 0.2228 (0.2222) time: 0.2174 data: 0.0043 max mem: 12387
- 2021-05-22 17:54:36,556 INFO torchdistill.misc.log Epoch: [1] [ 4000/12272] eta: 0:30:36 lr: 1.115982181660148e-05 sample/s: 20.332667584495162 loss: 0.1867 (0.2216) time: 0.2261 data: 0.0043 max mem: 12387
- 2021-05-22 17:58:20,409 INFO torchdistill.misc.log Epoch: [1] [ 5000/12272] eta: 0:26:57 lr: 1.061657974793568e-05 sample/s: 17.845956815829588 loss: 0.1976 (0.2198) time: 0.2294 data: 0.0042 max mem: 12387
- 2021-05-22 18:02:02,847 INFO torchdistill.misc.log Epoch: [1] [ 6000/12272] eta: 0:23:14 lr: 1.0073337679269883e-05 sample/s: 21.467115188009178 loss: 0.1879 (0.2201) time: 0.2190 data: 0.0042 max mem: 12387
- 2021-05-22 18:05:42,715 INFO torchdistill.misc.log Epoch: [1] [ 7000/12272] eta: 0:19:30 lr: 9.530095610604087e-06 sample/s: 17.160899388528726 loss: 0.2007 (0.2208) time: 0.2206 data: 0.0043 max mem: 12387
- 2021-05-22 18:09:23,036 INFO torchdistill.misc.log Epoch: [1] [ 8000/12272] eta: 0:15:47 lr: 8.986853541938288e-06 sample/s: 18.76536935211822 loss: 0.1739 (0.2222) time: 0.2313 data: 0.0043 max mem: 12387
- 2021-05-22 18:13:05,941 INFO torchdistill.misc.log Epoch: [1] [ 9000/12272] eta: 0:12:06 lr: 8.44361147327249e-06 sample/s: 21.49795107699799 loss: 0.1878 (0.2220) time: 0.2245 data: 0.0043 max mem: 12387
- 2021-05-22 18:16:47,980 INFO torchdistill.misc.log Epoch: [1] [10000/12272] eta: 0:08:24 lr: 7.900369404606693e-06 sample/s: 14.830137577212604 loss: 0.1875 (0.2221) time: 0.2259 data: 0.0043 max mem: 12387
- 2021-05-22 18:20:30,312 INFO torchdistill.misc.log Epoch: [1] [11000/12272] eta: 0:04:42 lr: 7.357127335940896e-06 sample/s: 20.469380509379288 loss: 0.2438 (0.2223) time: 0.2209 data: 0.0042 max mem: 12387
- 2021-05-22 18:24:12,649 INFO torchdistill.misc.log Epoch: [1] [12000/12272] eta: 0:01:00 lr: 6.813885267275099e-06 sample/s: 22.500544501606683 loss: 0.1945 (0.2219) time: 0.2258 data: 0.0043 max mem: 12387
- 2021-05-22 18:25:12,909 INFO torchdistill.misc.log Epoch: [1] Total time: 0:45:24
- 2021-05-22 18:25:31,246 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
- 2021-05-22 18:25:31,247 INFO __main__ Validation: accuracy = 0.8580743759551707
- 2021-05-22 18:25:31,544 INFO torchdistill.misc.log Epoch: [2] [ 0/12272] eta: 1:00:43 lr: 6.666123424598001e-06 sample/s: 14.723753056256177 loss: 0.2830 (0.2830) time: 0.2969 data: 0.0253 max mem: 12387
- 2021-05-22 18:29:12,317 INFO torchdistill.misc.log Epoch: [2] [ 1000/12272] eta: 0:41:29 lr: 6.122881355932204e-06 sample/s: 21.425308023947203 loss: 0.0852 (0.1137) time: 0.2224 data: 0.0042 max mem: 12387
- 2021-05-22 18:32:53,024 INFO torchdistill.misc.log Epoch: [2] [ 2000/12272] eta: 0:37:47 lr: 5.579639287266406e-06 sample/s: 16.40405652223226 loss: 0.0455 (0.1167) time: 0.2192 data: 0.0041 max mem: 12387
- 2021-05-22 18:36:32,375 INFO torchdistill.misc.log Epoch: [2] [ 3000/12272] eta: 0:34:02 lr: 5.0363972186006095e-06 sample/s: 18.737662697375622 loss: 0.1362 (0.1172) time: 0.2145 data: 0.0043 max mem: 12387
- 2021-05-22 18:40:14,688 INFO torchdistill.misc.log Epoch: [2] [ 4000/12272] eta: 0:30:26 lr: 4.493155149934811e-06 sample/s: 18.113464380838863 loss: 0.0532 (0.1177) time: 0.2220 data: 0.0043 max mem: 12387
- 2021-05-22 18:43:57,248 INFO torchdistill.misc.log Epoch: [2] [ 5000/12272] eta: 0:26:48 lr: 3.949913081269014e-06 sample/s: 14.875937326267126 loss: 0.1144 (0.1180) time: 0.2276 data: 0.0043 max mem: 12387
- 2021-05-22 18:47:39,622 INFO torchdistill.misc.log Epoch: [2] [ 6000/12272] eta: 0:23:08 lr: 3.4066710126032164e-06 sample/s: 16.430757056702372 loss: 0.1580 (0.1191) time: 0.2224 data: 0.0044 max mem: 12387
- 2021-05-22 18:51:21,874 INFO torchdistill.misc.log Epoch: [2] [ 7000/12272] eta: 0:19:27 lr: 2.8634289439374186e-06 sample/s: 18.773201563424898 loss: 0.0951 (0.1196) time: 0.2272 data: 0.0042 max mem: 12387
- 2021-05-22 18:55:04,803 INFO torchdistill.misc.log Epoch: [2] [ 8000/12272] eta: 0:15:46 lr: 2.320186875271621e-06 sample/s: 14.910151872387736 loss: 0.0195 (0.1192) time: 0.2168 data: 0.0042 max mem: 12387
- 2021-05-22 18:58:47,324 INFO torchdistill.misc.log Epoch: [2] [ 9000/12272] eta: 0:12:05 lr: 1.7769448066058238e-06 sample/s: 17.93071695225555 loss: 0.0373 (0.1192) time: 0.2179 data: 0.0042 max mem: 12387
- 2021-05-22 19:02:29,181 INFO torchdistill.misc.log Epoch: [2] [10000/12272] eta: 0:08:23 lr: 1.2337027379400262e-06 sample/s: 14.847908469314245 loss: 0.0573 (0.1188) time: 0.2322 data: 0.0042 max mem: 12387
- 2021-05-22 19:06:11,866 INFO torchdistill.misc.log Epoch: [2] [11000/12272] eta: 0:04:42 lr: 6.904606692742287e-07 sample/s: 14.793588146235695 loss: 0.0291 (0.1186) time: 0.2280 data: 0.0043 max mem: 12387
- 2021-05-22 19:09:52,653 INFO torchdistill.misc.log Epoch: [2] [12000/12272] eta: 0:01:00 lr: 1.4721860060843112e-07 sample/s: 20.479150035276856 loss: 0.0900 (0.1190) time: 0.2209 data: 0.0042 max mem: 12387
- 2021-05-22 19:10:51,967 INFO torchdistill.misc.log Epoch: [2] Total time: 0:45:20
- 2021-05-22 19:11:10,326 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
- 2021-05-22 19:11:10,326 INFO __main__ Validation: accuracy = 0.8541008660213958
- 2021-05-22 19:11:18,288 INFO __main__ [Student: bert-large-uncased]
- 2021-05-22 19:11:36,643 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
- 2021-05-22 19:11:36,643 INFO __main__ Test: accuracy = 0.8611309220580744
- 2021-05-22 19:11:36,644 INFO __main__ Start prediction for private dataset(s)
- 2021-05-22 19:11:36,645 INFO __main__ mnli/test_m: 9796 samples
- 2021-05-22 19:11:54,962 INFO __main__ mnli/test_mm: 9847 samples
- 2021-05-22 19:12:13,295 INFO __main__ ax/test_ax: 1104 samples
+ 2021-05-26 19:11:02,756 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_large_uncased.yaml', log='log/glue/mnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
+ 2021-05-26 19:11:02,808 INFO __main__ Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
  Use FP16 precision: True
 
+ 2021-05-26 19:11:33,729 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+ 2021-05-26 19:12:20,927 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+ 2021-05-26 19:12:22,683 INFO __main__ Start training
+ 2021-05-26 19:12:22,684 INFO torchdistill.models.util [student model]
+ 2021-05-26 19:12:22,684 INFO torchdistill.models.util Using the original student model
+ 2021-05-26 19:12:22,684 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+ 2021-05-26 19:12:26,973 INFO torchdistill.misc.log Epoch: [0] [ 0/24544] eta: 2:31:32 lr: 1.999972837896567e-05 sample/s: 12.123115565180507 loss: 1.1665 (1.1665) time: 0.3704 data: 0.0405 max mem: 5355
+ 2021-05-26 19:15:03,514 INFO torchdistill.misc.log Epoch: [0] [ 1000/24544] eta: 1:01:30 lr: 1.972810734463277e-05 sample/s: 26.643611936032016 loss: 0.4966 (0.7978) time: 0.1565 data: 0.0028 max mem: 9034
+ 2021-05-26 19:17:40,135 INFO torchdistill.misc.log Epoch: [0] [ 2000/24544] eta: 0:58:52 lr: 1.9456486310299872e-05 sample/s: 27.679465456795217 loss: 0.5308 (0.6753) time: 0.1576 data: 0.0028 max mem: 9034
+ 2021-05-26 19:20:17,766 INFO torchdistill.misc.log Epoch: [0] [ 3000/24544] eta: 0:56:22 lr: 1.9184865275966974e-05 sample/s: 21.87914915370501 loss: 0.4498 (0.6280) time: 0.1605 data: 0.0028 max mem: 9034
+ 2021-05-26 19:22:54,312 INFO torchdistill.misc.log Epoch: [0] [ 4000/24544] eta: 0:53:43 lr: 1.8913244241634076e-05 sample/s: 24.011839028146316 loss: 0.3829 (0.5939) time: 0.1587 data: 0.0027 max mem: 9034
+ 2021-05-26 19:25:31,397 INFO torchdistill.misc.log Epoch: [0] [ 5000/24544] eta: 0:51:06 lr: 1.8641623207301177e-05 sample/s: 24.881084864807146 loss: 0.3964 (0.5686) time: 0.1602 data: 0.0027 max mem: 9034
+ 2021-05-26 19:28:08,506 INFO torchdistill.misc.log Epoch: [0] [ 6000/24544] eta: 0:48:30 lr: 1.8370002172968276e-05 sample/s: 27.22999367016701 loss: 0.3826 (0.5527) time: 0.1524 data: 0.0028 max mem: 9034
+ 2021-05-26 19:30:45,858 INFO torchdistill.misc.log Epoch: [0] [ 7000/24544] eta: 0:45:54 lr: 1.8098381138635377e-05 sample/s: 24.056011918161573 loss: 0.4032 (0.5388) time: 0.1597 data: 0.0028 max mem: 9034
+ 2021-05-26 19:33:22,940 INFO torchdistill.misc.log Epoch: [0] [ 8000/24544] eta: 0:43:17 lr: 1.782676010430248e-05 sample/s: 26.72731385749652 loss: 0.4228 (0.5270) time: 0.1555 data: 0.0027 max mem: 9034
+ 2021-05-26 19:36:00,181 INFO torchdistill.misc.log Epoch: [0] [ 9000/24544] eta: 0:40:41 lr: 1.755513906996958e-05 sample/s: 27.097306455494415 loss: 0.4902 (0.5164) time: 0.1585 data: 0.0026 max mem: 9034
+ 2021-05-26 19:38:37,048 INFO torchdistill.misc.log Epoch: [0] [10000/24544] eta: 0:38:03 lr: 1.7283518035636683e-05 sample/s: 24.899585185405055 loss: 0.3073 (0.5080) time: 0.1533 data: 0.0026 max mem: 9034
+ 2021-05-26 19:41:13,402 INFO torchdistill.misc.log Epoch: [0] [11000/24544] eta: 0:35:25 lr: 1.7011897001303784e-05 sample/s: 26.376325913302132 loss: 0.3966 (0.5002) time: 0.1545 data: 0.0026 max mem: 9034
+ 2021-05-26 19:43:50,316 INFO torchdistill.misc.log Epoch: [0] [12000/24544] eta: 0:32:48 lr: 1.6740275966970883e-05 sample/s: 27.519061506615184 loss: 0.4471 (0.4942) time: 0.1535 data: 0.0026 max mem: 9034
+ 2021-05-26 19:46:27,366 INFO torchdistill.misc.log Epoch: [0] [13000/24544] eta: 0:30:12 lr: 1.6468654932637984e-05 sample/s: 27.152036174196756 loss: 0.3334 (0.4879) time: 0.1564 data: 0.0027 max mem: 9034
+ 2021-05-26 19:49:04,444 INFO torchdistill.misc.log Epoch: [0] [14000/24544] eta: 0:27:35 lr: 1.6197033898305086e-05 sample/s: 26.7613347795572 loss: 0.3891 (0.4822) time: 0.1609 data: 0.0029 max mem: 9034
+ 2021-05-26 19:51:41,450 INFO torchdistill.misc.log Epoch: [0] [15000/24544] eta: 0:24:58 lr: 1.5925412863972188e-05 sample/s: 28.193590019359004 loss: 0.4292 (0.4766) time: 0.1557 data: 0.0026 max mem: 9034
+ 2021-05-26 19:54:18,472 INFO torchdistill.misc.log Epoch: [0] [16000/24544] eta: 0:22:21 lr: 1.565379182963929e-05 sample/s: 25.112961216638976 loss: 0.3714 (0.4715) time: 0.1548 data: 0.0027 max mem: 9034
+ 2021-05-26 19:56:55,577 INFO torchdistill.misc.log Epoch: [0] [17000/24544] eta: 0:19:44 lr: 1.538217079530639e-05 sample/s: 26.46332060948295 loss: 0.3461 (0.4665) time: 0.1560 data: 0.0027 max mem: 9034
+ 2021-05-26 19:59:31,974 INFO torchdistill.misc.log Epoch: [0] [18000/24544] eta: 0:17:07 lr: 1.5110549760973491e-05 sample/s: 27.044453435683003 loss: 0.2556 (0.4616) time: 0.1563 data: 0.0028 max mem: 9034
+ 2021-05-26 20:02:08,374 INFO torchdistill.misc.log Epoch: [0] [19000/24544] eta: 0:14:29 lr: 1.4838928726640591e-05 sample/s: 26.93417370769164 loss: 0.3405 (0.4576) time: 0.1580 data: 0.0026 max mem: 9034
+ 2021-05-26 20:04:45,080 INFO torchdistill.misc.log Epoch: [0] [20000/24544] eta: 0:11:53 lr: 1.4567307692307693e-05 sample/s: 27.49177977363858 loss: 0.3791 (0.4541) time: 0.1547 data: 0.0027 max mem: 9034
+ 2021-05-26 20:07:22,339 INFO torchdistill.misc.log Epoch: [0] [21000/24544] eta: 0:09:16 lr: 1.4295686657974795e-05 sample/s: 28.231116256451074 loss: 0.3786 (0.4511) time: 0.1622 data: 0.0028 max mem: 9034
+ 2021-05-26 20:09:58,871 INFO torchdistill.misc.log Epoch: [0] [22000/24544] eta: 0:06:39 lr: 1.4024065623641896e-05 sample/s: 27.339283357260534 loss: 0.4165 (0.4477) time: 0.1622 data: 0.0028 max mem: 9034
+ 2021-05-26 20:12:36,039 INFO torchdistill.misc.log Epoch: [0] [23000/24544] eta: 0:04:02 lr: 1.3752444589308998e-05 sample/s: 22.758457160065436 loss: 0.3802 (0.4446) time: 0.1555 data: 0.0027 max mem: 9034
+ 2021-05-26 20:15:12,598 INFO torchdistill.misc.log Epoch: [0] [24000/24544] eta: 0:01:25 lr: 1.34808235549761e-05 sample/s: 28.02765809938272 loss: 0.3125 (0.4414) time: 0.1585 data: 0.0027 max mem: 9034
+ 2021-05-26 20:16:38,263 INFO torchdistill.misc.log Epoch: [0] Total time: 1:04:11
+ 2021-05-26 20:16:58,591 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-26 20:16:58,592 INFO __main__ Validation: accuracy = 0.8665308201732043
+ 2021-05-26 20:16:58,592 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-large-uncased
+ 2021-05-26 20:17:03,627 INFO torchdistill.misc.log Epoch: [1] [ 0/24544] eta: 1:33:58 lr: 1.3333061712299002e-05 sample/s: 19.747310188972392 loss: 0.4682 (0.4682) time: 0.2297 data: 0.0271 max mem: 9034
+ 2021-05-26 20:19:40,585 INFO torchdistill.misc.log Epoch: [1] [ 1000/24544] eta: 1:01:37 lr: 1.3061440677966102e-05 sample/s: 26.252342839259867 loss: 0.1850 (0.2208) time: 0.1565 data: 0.0027 max mem: 9034
+ 2021-05-26 20:22:17,591 INFO torchdistill.misc.log Epoch: [1] [ 2000/24544] eta: 0:58:59 lr: 1.2789819643633204e-05 sample/s: 21.8097504982106 loss: 0.2099 (0.2356) time: 0.1602 data: 0.0027 max mem: 9034
+ 2021-05-26 20:24:54,769 INFO torchdistill.misc.log Epoch: [1] [ 3000/24544] eta: 0:56:23 lr: 1.2518198609300306e-05 sample/s: 27.91364301876747 loss: 0.2192 (0.2369) time: 0.1556 data: 0.0027 max mem: 9034
+ 2021-05-26 20:27:32,375 INFO torchdistill.misc.log Epoch: [1] [ 4000/24544] eta: 0:53:49 lr: 1.2246577574967407e-05 sample/s: 27.884924849457917 loss: 0.1388 (0.2382) time: 0.1618 data: 0.0028 max mem: 9034
+ 2021-05-26 20:30:08,985 INFO torchdistill.misc.log Epoch: [1] [ 5000/24544] eta: 0:51:10 lr: 1.1974956540634507e-05 sample/s: 25.346978857897394 loss: 0.2185 (0.2378) time: 0.1555 data: 0.0027 max mem: 9034
+ 2021-05-26 20:32:46,374 INFO torchdistill.misc.log Epoch: [1] [ 6000/24544] eta: 0:48:33 lr: 1.1703335506301609e-05 sample/s: 27.4796219041395 loss: 0.2571 (0.2396) time: 0.1560 data: 0.0027 max mem: 9034
+ 2021-05-26 20:35:22,973 INFO torchdistill.misc.log Epoch: [1] [ 7000/24544] eta: 0:45:55 lr: 1.143171447196871e-05 sample/s: 28.076955002476804 loss: 0.1621 (0.2406) time: 0.1579 data: 0.0027 max mem: 9034
+ 2021-05-26 20:38:00,044 INFO torchdistill.misc.log Epoch: [1] [ 8000/24544] eta: 0:43:18 lr: 1.116009343763581e-05 sample/s: 25.54037357852913 loss: 0.1671 (0.2414) time: 0.1615 data: 0.0027 max mem: 9034
+ 2021-05-26 20:40:36,918 INFO torchdistill.misc.log Epoch: [1] [ 9000/24544] eta: 0:40:40 lr: 1.0888472403302913e-05 sample/s: 24.698818735351963 loss: 0.2244 (0.2411) time: 0.1594 data: 0.0027 max mem: 9034
+ 2021-05-26 20:43:13,900 INFO torchdistill.misc.log Epoch: [1] [10000/24544] eta: 0:38:03 lr: 1.0616851368970014e-05 sample/s: 24.829091856245597 loss: 0.1934 (0.2437) time: 0.1562 data: 0.0027 max mem: 9034
+ 2021-05-26 20:45:50,810 INFO torchdistill.misc.log Epoch: [1] [11000/24544] eta: 0:35:26 lr: 1.0345230334637116e-05 sample/s: 26.688114023924662 loss: 0.2520 (0.2445) time: 0.1587 data: 0.0028 max mem: 9034
+ 2021-05-26 20:48:27,672 INFO torchdistill.misc.log Epoch: [1] [12000/24544] eta: 0:32:49 lr: 1.0073609300304216e-05 sample/s: 27.18719170312753 loss: 0.1216 (0.2450) time: 0.1556 data: 0.0026 max mem: 9034
+ 2021-05-26 20:51:04,476 INFO torchdistill.misc.log Epoch: [1] [13000/24544] eta: 0:30:12 lr: 9.801988265971318e-06 sample/s: 28.178247754435702 loss: 0.2546 (0.2451) time: 0.1543 data: 0.0026 max mem: 9034
+ 2021-05-26 20:53:41,206 INFO torchdistill.misc.log Epoch: [1] [14000/24544] eta: 0:27:35 lr: 9.53036723163842e-06 sample/s: 21.816443589673572 loss: 0.2472 (0.2448) time: 0.1608 data: 0.0028 max mem: 9034
+ 2021-05-26 20:56:18,004 INFO torchdistill.misc.log Epoch: [1] [15000/24544] eta: 0:24:58 lr: 9.258746197305521e-06 sample/s: 27.04789787804823 loss: 0.1441 (0.2442) time: 0.1567 data: 0.0027 max mem: 9034
+ 2021-05-26 20:58:54,796 INFO torchdistill.misc.log Epoch: [1] [16000/24544] eta: 0:22:20 lr: 8.987125162972621e-06 sample/s: 25.65468688595532 loss: 0.2517 (0.2445) time: 0.1564 data: 0.0027 max mem: 9034
+ 2021-05-26 21:01:32,093 INFO torchdistill.misc.log Epoch: [1] [17000/24544] eta: 0:19:44 lr: 8.715504128639723e-06 sample/s: 22.732769752540246 loss: 0.3479 (0.2443) time: 0.1581 data: 0.0027 max mem: 9034
+ 2021-05-26 21:04:08,873 INFO torchdistill.misc.log Epoch: [1] [18000/24544] eta: 0:17:07 lr: 8.443883094306825e-06 sample/s: 26.943862990686924 loss: 0.2213 (0.2436) time: 0.1575 data: 0.0028 max mem: 9034
+ 2021-05-26 21:06:45,412 INFO torchdistill.misc.log Epoch: [1] [19000/24544] eta: 0:14:30 lr: 8.172262059973926e-06 sample/s: 26.424266478925592 loss: 0.1826 (0.2434) time: 0.1576 data: 0.0027 max mem: 9034
+ 2021-05-26 21:09:22,506 INFO torchdistill.misc.log Epoch: [1] [20000/24544] eta: 0:11:53 lr: 7.900641025641026e-06 sample/s: 25.074415219317977 loss: 0.2069 (0.2432) time: 0.1558 data: 0.0027 max mem: 9034
+ 2021-05-26 21:11:58,927 INFO torchdistill.misc.log Epoch: [1] [21000/24544] eta: 0:09:16 lr: 7.629019991308127e-06 sample/s: 27.590701804876044 loss: 0.2668 (0.2434) time: 0.1545 data: 0.0027 max mem: 9034
+ 2021-05-26 21:14:35,504 INFO torchdistill.misc.log Epoch: [1] [22000/24544] eta: 0:06:39 lr: 7.357398956975229e-06 sample/s: 27.259238076978693 loss: 0.1533 (0.2432) time: 0.1531 data: 0.0027 max mem: 9034
+ 2021-05-26 21:17:13,344 INFO torchdistill.misc.log Epoch: [1] [23000/24544] eta: 0:04:02 lr: 7.08577792264233e-06 sample/s: 28.61345976737047 loss: 0.1621 (0.2431) time: 0.1588 data: 0.0026 max mem: 9034
+ 2021-05-26 21:19:50,219 INFO torchdistill.misc.log Epoch: [1] [24000/24544] eta: 0:01:25 lr: 6.8141568883094315e-06 sample/s: 24.86183830306838 loss: 0.2084 (0.2426) time: 0.1577 data: 0.0027 max mem: 9034
+ 2021-05-26 21:21:15,455 INFO torchdistill.misc.log Epoch: [1] Total time: 1:04:12
+ 2021-05-26 21:21:35,766 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-26 21:21:35,767 INFO __main__ Validation: accuracy = 0.866225165562914
+ 2021-05-26 21:21:35,944 INFO torchdistill.misc.log Epoch: [2] [ 0/24544] eta: 1:11:55 lr: 6.666395045632335e-06 sample/s: 26.641665713886454 loss: 0.1711 (0.1711) time: 0.1758 data: 0.0257 max mem: 9034
+ 2021-05-26 21:24:12,602 INFO torchdistill.misc.log Epoch: [2] [ 1000/24544] eta: 1:01:28 lr: 6.394774011299436e-06 sample/s: 25.478160772915167 loss: 0.0000 (0.2154) time: 0.1594 data: 0.0027 max mem: 9034
+ 2021-05-26 21:26:48,122 INFO torchdistill.misc.log Epoch: [2] [ 2000/24544] eta: 0:58:39 lr: 6.123152976966536e-06 sample/s: 29.191880596183893 loss: 0.0000 (0.2594) time: 0.1560 data: 0.0026 max mem: 9034
+ 2021-05-26 21:29:23,722 INFO torchdistill.misc.log Epoch: [2] [ 3000/24544] eta: 0:55:59 lr: 5.851531942633638e-06 sample/s: 23.31094851566039 loss: 0.0000 (0.2732) time: 0.1529 data: 0.0027 max mem: 9034
+ 2021-05-26 21:31:59,493 INFO torchdistill.misc.log Epoch: [2] [ 4000/24544] eta: 0:53:22 lr: 5.57991090830074e-06 sample/s: 28.10545818828756 loss: 0.0268 (0.2814) time: 0.1574 data: 0.0027 max mem: 9034
+ 2021-05-26 21:34:34,884 INFO torchdistill.misc.log Epoch: [2] [ 5000/24544] eta: 0:50:44 lr: 5.30828987396784e-06 sample/s: 26.16505434282533 loss: 0.0000 (0.2821) time: 0.1500 data: 0.0027 max mem: 9034
+ 2021-05-26 21:37:11,222 INFO torchdistill.misc.log Epoch: [2] [ 6000/24544] eta: 0:48:10 lr: 5.0366688396349415e-06 sample/s: 24.967990130203333 loss: 0.0000 (0.2841) time: 0.1545 data: 0.0027 max mem: 9034
+ 2021-05-26 21:39:46,971 INFO torchdistill.misc.log Epoch: [2] [ 7000/24544] eta: 0:45:34 lr: 4.765047805302043e-06 sample/s: 24.54743723489709 loss: 0.0181 (0.2855) time: 0.1525 data: 0.0027 max mem: 9034
+ 2021-05-26 21:42:22,158 INFO torchdistill.misc.log Epoch: [2] [ 8000/24544] eta: 0:42:57 lr: 4.493426770969144e-06 sample/s: 22.231136175716074 loss: 0.0000 (0.2868) time: 0.1544 data: 0.0027 max mem: 9034
+ 2021-05-26 21:44:57,381 INFO torchdistill.misc.log Epoch: [2] [ 9000/24544] eta: 0:40:20 lr: 4.221805736636245e-06 sample/s: 26.316582354924982 loss: 0.0000 (0.2887) time: 0.1524 data: 0.0026 max mem: 9034
+ 2021-05-26 21:47:33,414 INFO torchdistill.misc.log Epoch: [2] [10000/24544] eta: 0:37:45 lr: 3.950184702303347e-06 sample/s: 23.86255843954503 loss: 0.0003 (0.2882) time: 0.1609 data: 0.0028 max mem: 9034
+ 2021-05-26 21:50:08,383 INFO torchdistill.misc.log Epoch: [2] [11000/24544] eta: 0:35:08 lr: 3.678563667970448e-06 sample/s: 26.68255374373182 loss: 0.3676 (0.2855) time: 0.1545 data: 0.0027 max mem: 9034
+ 2021-05-26 21:52:43,285 INFO torchdistill.misc.log Epoch: [2] [12000/24544] eta: 0:32:31 lr: 3.4069426336375493e-06 sample/s: 25.459447811611312 loss: 0.0000 (0.2846) time: 0.1537 data: 0.0027 max mem: 9034
+ 2021-05-26 21:55:18,524 INFO torchdistill.misc.log Epoch: [2] [13000/24544] eta: 0:29:56 lr: 3.1353215993046506e-06 sample/s: 24.929554150538273 loss: 0.0000 (0.2846) time: 0.1595 data: 0.0027 max mem: 9034
+ 2021-05-26 21:57:55,270 INFO torchdistill.misc.log Epoch: [2] [14000/24544] eta: 0:27:21 lr: 2.8637005649717515e-06 sample/s: 27.516714503616473 loss: 0.1809 (0.2848) time: 0.1594 data: 0.0028 max mem: 9034
+ 2021-05-26 22:00:31,919 INFO torchdistill.misc.log Epoch: [2] [15000/24544] eta: 0:24:46 lr: 2.5920795306388528e-06 sample/s: 28.056718090221164 loss: 0.0959 (0.2857) time: 0.1554 data: 0.0027 max mem: 9034
+ 2021-05-26 22:03:07,727 INFO torchdistill.misc.log Epoch: [2] [16000/24544] eta: 0:22:10 lr: 2.320458496305954e-06 sample/s: 25.48872795003502 loss: 0.0000 (0.2860) time: 0.1578 data: 0.0028 max mem: 9034
+ 2021-05-26 22:05:43,590 INFO torchdistill.misc.log Epoch: [2] [17000/24544] eta: 0:19:34 lr: 2.0488374619730554e-06 sample/s: 24.52734053486662 loss: 0.0000 (0.2869) time: 0.1545 data: 0.0027 max mem: 9034
+ 2021-05-26 22:08:19,209 INFO torchdistill.misc.log Epoch: [2] [18000/24544] eta: 0:16:59 lr: 1.7772164276401565e-06 sample/s: 27.08741905120178 loss: 0.0000 (0.2869) time: 0.1550 data: 0.0027 max mem: 9034
+ 2021-05-26 22:10:53,843 INFO torchdistill.misc.log Epoch: [2] [19000/24544] eta: 0:14:23 lr: 1.5055953933072578e-06 sample/s: 28.067091144517644 loss: 0.0000 (0.2887) time: 0.1572 data: 0.0028 max mem: 9034
+ 2021-05-26 22:13:29,898 INFO torchdistill.misc.log Epoch: [2] [20000/24544] eta: 0:11:47 lr: 1.233974358974359e-06 sample/s: 25.48025028856084 loss: 0.0000 (0.2887) time: 0.1577 data: 0.0027 max mem: 9034
+ 2021-05-26 22:16:06,871 INFO torchdistill.misc.log Epoch: [2] [21000/24544] eta: 0:09:12 lr: 9.623533246414604e-07 sample/s: 27.475571670244978 loss: 0.0000 (0.2893) time: 0.1557 data: 0.0029 max mem: 9034
+ 2021-05-26 22:18:43,970 INFO torchdistill.misc.log Epoch: [2] [22000/24544] eta: 0:06:36 lr: 6.907322903085615e-07 sample/s: 25.952121373370375 loss: 0.0000 (0.2901) time: 0.1565 data: 0.0028 max mem: 9034
+ 2021-05-26 22:21:20,398 INFO torchdistill.misc.log Epoch: [2] [23000/24544] eta: 0:04:00 lr: 4.191112559756628e-07 sample/s: 28.05516983050394 loss: 0.0000 (0.2902) time: 0.1573 data: 0.0028 max mem: 9034
+ 2021-05-26 22:23:56,531 INFO torchdistill.misc.log Epoch: [2] [24000/24544] eta: 0:01:24 lr: 1.4749022164276403e-07 sample/s: 25.47750303335859 loss: 0.0000 (0.2901) time: 0.1553 data: 0.0027 max mem: 9034
+ 2021-05-26 22:25:21,372 INFO torchdistill.misc.log Epoch: [2] Total time: 1:03:45
+ 2021-05-26 22:25:41,707 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-26 22:25:41,708 INFO __main__ Validation: accuracy = 0.8601120733571065
+ 2021-05-26 22:25:51,217 INFO __main__ [Student: bert-large-uncased]
+ 2021-05-26 22:26:11,586 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-26 22:26:11,587 INFO __main__ Test: accuracy = 0.8665308201732043
+ 2021-05-26 22:26:11,587 INFO __main__ Start prediction for private dataset(s)
+ 2021-05-26 22:26:11,589 INFO __main__ mnli/test_m: 9796 samples
+ 2021-05-26 22:26:31,714 INFO __main__ mnli/test_mm: 9847 samples
+ 2021-05-26 22:26:51,852 INFO __main__ ax/test_ax: 1104 samples
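In the retrained log above, validation accuracy peaks at epoch 0 (0.8665 vs. 0.8662 and 0.8601), the checkpoint is only updated after that epoch, and the final "Test: accuracy" line reports the same 0.8665308201732043. A minimal sketch of extracting those per-epoch numbers from such a training.log (the three lines are copied verbatim from it):

```python
import re

# The three "Validation" lines from the new training.log.
log_lines = [
    "2021-05-26 20:16:58,592 INFO __main__ Validation: accuracy = 0.8665308201732043",
    "2021-05-26 21:21:35,767 INFO __main__ Validation: accuracy = 0.866225165562914",
    "2021-05-26 22:25:41,708 INFO __main__ Validation: accuracy = 0.8601120733571065",
]

# One entry per epoch, in log order.
pattern = re.compile(r"Validation: accuracy = ([0-9.]+)")
accuracies = [float(m.group(1)) for line in log_lines
              if (m := pattern.search(line))]
best_epoch = max(range(len(accuracies)), key=accuracies.__getitem__)
print(best_epoch, accuracies[best_epoch])  # 0 0.8665308201732043
```

The same regex works when iterating over the full log file, since only validation summary lines contain the "Validation: accuracy =" prefix.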