yoshitomo-matsubara committed
Commit 38c02eb
1 Parent(s): 448e665

initial commit

README.md ADDED
@@ -0,0 +1,19 @@
+ ---
+ language: en
+ tags:
+ - bert
+ - mnli
+ - ax
+ - glue
+ - torchdistill
+ license: apache-2.0
+ datasets:
+ - mnli
+ - ax
+ metrics:
+ - accuracy
+ ---
+
+ `bert-base-uncased` fine-tuned on the MNLI dataset, using [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) and [Google Colab](https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/master/demo/glue_finetuning_and_submission.ipynb).
+ The hyperparameters are the same as those in Hugging Face's example and/or the BERT paper, and the training configuration (including hyperparameters) is available [here](https://github.com/yoshitomo-matsubara/torchdistill/blob/main/configs/sample/glue/mnli/ce/bert_base_uncased.yaml).
+ I submitted prediction files to [the GLUE leaderboard](https://gluebenchmark.com/leaderboard), and the overall GLUE score was **77.9**.
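For a quick sanity check, here is a minimal inference sketch. Two things are assumptions not stated in this commit: that the model is published on the Hub as `yoshitomo-matsubara/bert-base-uncased-mnli`, and that `LABEL_0/1/2` follow the usual GLUE MNLI label order (entailment, neutral, contradiction).

```python
# Minimal inference sketch for the fine-tuned MNLI classifier.
# Assumptions: Hub id and label order as noted in the lead-in above.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "yoshitomo-matsubara/bert-base-uncased-mnli"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

labels = ["entailment", "neutral", "contradiction"]  # assumed MNLI order
print(labels[logits.argmax(dim=-1).item()])
```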
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+ "_name_or_path": "bert-base-uncased",
+ "architectures": [
+ "BertForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "finetuning_task": "mnli",
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1",
+ "2": "LABEL_2"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1,
+ "LABEL_2": 2
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "transformers_version": "4.6.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
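Note that `id2label` keeps the generic `LABEL_n` names. If you want human-readable MNLI class names, they can be overridden at load time; the sketch below assumes the standard GLUE MNLI ordering, which this config does not itself confirm.

```python
# Sketch: override the generic LABEL_n names with MNLI class names at load
# time. Both the Hub id and the id -> class ordering are assumptions here.
from transformers import AutoModelForSequenceClassification

id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}  # assumed order
model = AutoModelForSequenceClassification.from_pretrained(
    "yoshitomo-matsubara/bert-base-uncased-mnli",  # assumed Hub id
    id2label=id2label,
    label2id={name: i for i, name in id2label.items()},
)
print(model.config.id2label)  # {0: 'entailment', 1: 'neutral', 2: 'contradiction'}
```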
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:536af63a9861b7821c240f73171a5e316ca9d52cbfc6131a5a10a0f852906826
+ size 438027529
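The weights are stored via Git LFS, so the commit records only this pointer (blob SHA-256 and byte size). A small sketch for verifying a downloaded copy against the pointer, assuming the file has already been fetched (e.g. via `git lfs pull`) to the local path shown:

```python
# Sketch: check a downloaded pytorch_model.bin against the LFS pointer above.
import hashlib

expected_oid = "536af63a9861b7821c240f73171a5e316ca9d52cbfc6131a5a10a0f852906826"
expected_size = 438027529
path = "pytorch_model.bin"  # local path after download (assumed)

digest, size = hashlib.sha256(), 0
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
        digest.update(chunk)
        size += len(chunk)

assert size == expected_size, f"size mismatch: {size} != {expected_size}"
assert digest.hexdigest() == expected_oid, "sha256 mismatch"
print("pytorch_model.bin matches the LFS pointer")
```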
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased"}
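Per this config, the tokenizer lowercases input (`do_lower_case: true`) and caps sequences at `model_max_length` = 512. A short sketch encoding a premise/hypothesis pair the way BERT expects, again under the assumed Hub id:

```python
# Sketch: encode an MNLI-style sentence pair with this tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yoshitomo-matsubara/bert-base-uncased-mnli")
enc = tokenizer(
    "The committee approved the budget.",   # premise (illustrative)
    "The budget was approved.",             # hypothesis (illustrative)
    truncation=True,             # respects model_max_length = 512
    return_token_type_ids=True,  # segment ids separate premise from hypothesis
)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))  # [CLS] ... [SEP] ... [SEP]
```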
training.log ADDED
@@ -0,0 +1,115 @@
+ 2021-05-29 15:29:04,310 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml', log='log/glue/mnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
+ 2021-05-29 15:29:04,374 INFO __main__ Distributed environment: NO
+ Num processes: 1
+ Process index: 0
+ Local process index: 0
+ Device: cuda
+ Use FP16 precision: True
+
+ 2021-05-29 15:29:04,728 INFO filelock Lock 139977050547728 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
+ 2021-05-29 15:29:05,085 INFO filelock Lock 139977050547728 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
+ 2021-05-29 15:29:05,785 INFO filelock Lock 139977045762832 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+ 2021-05-29 15:29:06,321 INFO filelock Lock 139977045762832 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+ 2021-05-29 15:29:06,668 INFO filelock Lock 139977045762832 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
+ 2021-05-29 15:29:07,193 INFO filelock Lock 139977045762832 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
+ 2021-05-29 15:29:08,239 INFO filelock Lock 139977012340816 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
+ 2021-05-29 15:29:08,584 INFO filelock Lock 139977012340816 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
+ 2021-05-29 15:29:08,962 INFO filelock Lock 139977044338768 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock
+ 2021-05-29 15:29:16,242 INFO filelock Lock 139977044338768 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock
+ 2021-05-29 15:30:57,737 INFO __main__ Start training
+ 2021-05-29 15:30:57,738 INFO torchdistill.models.util [student model]
+ 2021-05-29 15:30:57,738 INFO torchdistill.models.util Using the original student model
+ 2021-05-29 15:30:57,738 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+ 2021-05-29 15:31:03,590 INFO torchdistill.misc.log Epoch: [0] [ 0/24544] eta: 2:05:27 lr: 1.999972837896567e-05 sample/s: 14.823441888738587 loss: 1.1098 (1.1098) time: 0.3067 data: 0.0369 max mem: 1893
+ 2021-05-29 15:33:47,307 INFO torchdistill.misc.log Epoch: [0] [ 1000/24544] eta: 1:04:17 lr: 1.972810734463277e-05 sample/s: 27.213607114992513 loss: 0.7205 (0.9372) time: 0.1595 data: 0.0022 max mem: 3188
+ 2021-05-29 15:36:31,223 INFO torchdistill.misc.log Epoch: [0] [ 2000/24544] eta: 1:01:34 lr: 1.9456486310299872e-05 sample/s: 23.541286875308696 loss: 0.5824 (0.7945) time: 0.1536 data: 0.0022 max mem: 3188
+ 2021-05-29 15:39:15,330 INFO torchdistill.misc.log Epoch: [0] [ 3000/24544] eta: 0:58:52 lr: 1.9184865275966974e-05 sample/s: 23.52890622918265 loss: 0.5424 (0.7283) time: 0.1764 data: 0.0023 max mem: 3188
+ 2021-05-29 15:41:58,885 INFO torchdistill.misc.log Epoch: [0] [ 4000/24544] eta: 0:56:06 lr: 1.8913244241634076e-05 sample/s: 17.631738191448555 loss: 0.5007 (0.6846) time: 0.1664 data: 0.0022 max mem: 3188
+ 2021-05-29 15:44:43,385 INFO torchdistill.misc.log Epoch: [0] [ 5000/24544] eta: 0:53:24 lr: 1.8641623207301177e-05 sample/s: 23.523034105646886 loss: 0.4332 (0.6568) time: 0.1620 data: 0.0022 max mem: 3188
+ 2021-05-29 15:47:29,027 INFO torchdistill.misc.log Epoch: [0] [ 6000/24544] eta: 0:50:46 lr: 1.8370002172968276e-05 sample/s: 25.141637744489802 loss: 0.4319 (0.6330) time: 0.1530 data: 0.0022 max mem: 3188
+ 2021-05-29 15:50:15,261 INFO torchdistill.misc.log Epoch: [0] [ 7000/24544] eta: 0:48:06 lr: 1.8098381138635377e-05 sample/s: 27.210473391590924 loss: 0.4381 (0.6162) time: 0.1500 data: 0.0023 max mem: 3188
+ 2021-05-29 15:53:00,429 INFO torchdistill.misc.log Epoch: [0] [ 8000/24544] eta: 0:45:23 lr: 1.782676010430248e-05 sample/s: 27.11552516360098 loss: 0.3911 (0.6018) time: 0.1653 data: 0.0023 max mem: 3188
+ 2021-05-29 15:55:45,430 INFO torchdistill.misc.log Epoch: [0] [ 9000/24544] eta: 0:42:39 lr: 1.755513906996958e-05 sample/s: 27.143426172352147 loss: 0.4399 (0.5899) time: 0.1523 data: 0.0022 max mem: 3188
+ 2021-05-29 15:58:29,704 INFO torchdistill.misc.log Epoch: [0] [10000/24544] eta: 0:39:54 lr: 1.7283518035636683e-05 sample/s: 23.528708244688683 loss: 0.5770 (0.5795) time: 0.1712 data: 0.0023 max mem: 3188
+ 2021-05-29 16:01:14,615 INFO torchdistill.misc.log Epoch: [0] [11000/24544] eta: 0:37:10 lr: 1.7011897001303784e-05 sample/s: 23.51004667756884 loss: 0.4501 (0.5715) time: 0.1655 data: 0.0024 max mem: 3188
+ 2021-05-29 16:03:59,094 INFO torchdistill.misc.log Epoch: [0] [12000/24544] eta: 0:34:25 lr: 1.6740275966970883e-05 sample/s: 27.174949544687372 loss: 0.4899 (0.5636) time: 0.1644 data: 0.0022 max mem: 3188
+ 2021-05-29 16:06:43,208 INFO torchdistill.misc.log Epoch: [0] [13000/24544] eta: 0:31:40 lr: 1.6468654932637984e-05 sample/s: 25.122738503466554 loss: 0.4647 (0.5556) time: 0.1629 data: 0.0022 max mem: 3188
+ 2021-05-29 16:09:28,363 INFO torchdistill.misc.log Epoch: [0] [14000/24544] eta: 0:28:55 lr: 1.6197033898305086e-05 sample/s: 29.783168119976143 loss: 0.4150 (0.5499) time: 0.1592 data: 0.0022 max mem: 3188
+ 2021-05-29 16:12:13,821 INFO torchdistill.misc.log Epoch: [0] [15000/24544] eta: 0:26:11 lr: 1.5925412863972188e-05 sample/s: 25.158528026872208 loss: 0.4610 (0.5446) time: 0.1599 data: 0.0024 max mem: 3188
+ 2021-05-29 16:14:58,895 INFO torchdistill.misc.log Epoch: [0] [16000/24544] eta: 0:23:27 lr: 1.565379182963929e-05 sample/s: 20.605590470236685 loss: 0.4390 (0.5395) time: 0.1720 data: 0.0024 max mem: 3188
+ 2021-05-29 16:17:44,108 INFO torchdistill.misc.log Epoch: [0] [17000/24544] eta: 0:20:42 lr: 1.538217079530639e-05 sample/s: 23.374767502288403 loss: 0.5434 (0.5354) time: 0.1773 data: 0.0023 max mem: 3188
+ 2021-05-29 16:20:29,013 INFO torchdistill.misc.log Epoch: [0] [18000/24544] eta: 0:17:58 lr: 1.5110549760973491e-05 sample/s: 25.141336338406393 loss: 0.4033 (0.5311) time: 0.1754 data: 0.0023 max mem: 3188
+ 2021-05-29 16:23:12,810 INFO torchdistill.misc.log Epoch: [0] [19000/24544] eta: 0:15:13 lr: 1.4838928726640591e-05 sample/s: 27.220053378329048 loss: 0.3449 (0.5261) time: 0.1665 data: 0.0022 max mem: 3188
+ 2021-05-29 16:25:57,100 INFO torchdistill.misc.log Epoch: [0] [20000/24544] eta: 0:12:28 lr: 1.4567307692307693e-05 sample/s: 27.172836906771014 loss: 0.3282 (0.5225) time: 0.1644 data: 0.0022 max mem: 3188
+ 2021-05-29 16:28:40,766 INFO torchdistill.misc.log Epoch: [0] [21000/24544] eta: 0:09:43 lr: 1.4295686657974795e-05 sample/s: 29.776137866162625 loss: 0.4302 (0.5190) time: 0.1636 data: 0.0023 max mem: 3188
+ 2021-05-29 16:31:25,444 INFO torchdistill.misc.log Epoch: [0] [22000/24544] eta: 0:06:58 lr: 1.4024065623641896e-05 sample/s: 25.15667954698467 loss: 0.3418 (0.5156) time: 0.1578 data: 0.0021 max mem: 3188
+ 2021-05-29 16:34:10,137 INFO torchdistill.misc.log Epoch: [0] [23000/24544] eta: 0:04:14 lr: 1.3752444589308998e-05 sample/s: 32.650026272258444 loss: 0.3675 (0.5123) time: 0.1637 data: 0.0022 max mem: 3188
+ 2021-05-29 16:36:54,197 INFO torchdistill.misc.log Epoch: [0] [24000/24544] eta: 0:01:29 lr: 1.34808235549761e-05 sample/s: 25.185947572110116 loss: 0.4129 (0.5095) time: 0.1622 data: 0.0023 max mem: 3188
+ 2021-05-29 16:38:24,444 INFO torchdistill.misc.log Epoch: [0] Total time: 1:07:21
+ 2021-05-29 16:38:55,360 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-29 16:38:55,361 INFO __main__ Validation: accuracy = 0.8346408558329088
+ 2021-05-29 16:38:55,361 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased
+ 2021-05-29 16:38:56,551 INFO torchdistill.misc.log Epoch: [1] [ 0/24544] eta: 1:16:42 lr: 1.3333061712299002e-05 sample/s: 24.581569261785873 loss: 0.2703 (0.2703) time: 0.1875 data: 0.0248 max mem: 3188
+ 2021-05-29 16:41:41,363 INFO torchdistill.misc.log Epoch: [1] [ 1000/24544] eta: 1:04:40 lr: 1.3061440677966102e-05 sample/s: 25.180428648616196 loss: 0.2192 (0.3276) time: 0.1543 data: 0.0022 max mem: 3188
+ 2021-05-29 16:44:26,252 INFO torchdistill.misc.log Epoch: [1] [ 2000/24544] eta: 1:01:56 lr: 1.2789819643633204e-05 sample/s: 20.661468833319994 loss: 0.2764 (0.3302) time: 0.1713 data: 0.0023 max mem: 3188
+ 2021-05-29 16:47:11,576 INFO torchdistill.misc.log Epoch: [1] [ 3000/24544] eta: 0:59:15 lr: 1.2518198609300306e-05 sample/s: 20.64476498224354 loss: 0.3651 (0.3332) time: 0.1719 data: 0.0022 max mem: 3188
+ 2021-05-29 16:49:56,040 INFO torchdistill.misc.log Epoch: [1] [ 4000/24544] eta: 0:56:27 lr: 1.2246577574967407e-05 sample/s: 20.68329823903899 loss: 0.3764 (0.3347) time: 0.1674 data: 0.0022 max mem: 3188
+ 2021-05-29 16:52:40,153 INFO torchdistill.misc.log Epoch: [1] [ 5000/24544] eta: 0:53:39 lr: 1.1974956540634507e-05 sample/s: 25.14510443394643 loss: 0.3584 (0.3372) time: 0.1650 data: 0.0023 max mem: 3188
+ 2021-05-29 16:55:25,903 INFO torchdistill.misc.log Epoch: [1] [ 6000/24544] eta: 0:50:57 lr: 1.1703335506301609e-05 sample/s: 27.211797412017347 loss: 0.2562 (0.3376) time: 0.1659 data: 0.0022 max mem: 3188
+ 2021-05-29 16:58:10,023 INFO torchdistill.misc.log Epoch: [1] [ 7000/24544] eta: 0:48:10 lr: 1.143171447196871e-05 sample/s: 29.83841542037708 loss: 0.2930 (0.3370) time: 0.1702 data: 0.0024 max mem: 3188
+ 2021-05-29 17:00:54,079 INFO torchdistill.misc.log Epoch: [1] [ 8000/24544] eta: 0:45:24 lr: 1.116009343763581e-05 sample/s: 27.1218811935608 loss: 0.2333 (0.3370) time: 0.1612 data: 0.0022 max mem: 3188
+ 2021-05-29 17:03:39,537 INFO torchdistill.misc.log Epoch: [1] [ 9000/24544] eta: 0:42:41 lr: 1.0888472403302913e-05 sample/s: 20.654448399014626 loss: 0.2945 (0.3364) time: 0.1665 data: 0.0022 max mem: 3188
+ 2021-05-29 17:06:23,515 INFO torchdistill.misc.log Epoch: [1] [10000/24544] eta: 0:39:55 lr: 1.0616851368970014e-05 sample/s: 32.66127672663348 loss: 0.2922 (0.3369) time: 0.1590 data: 0.0022 max mem: 3188
+ 2021-05-29 17:09:08,492 INFO torchdistill.misc.log Epoch: [1] [11000/24544] eta: 0:37:10 lr: 1.0345230334637116e-05 sample/s: 25.208653502300407 loss: 0.2785 (0.3363) time: 0.1561 data: 0.0021 max mem: 3188
+ 2021-05-29 17:11:51,342 INFO torchdistill.misc.log Epoch: [1] [12000/24544] eta: 0:34:24 lr: 1.0073609300304216e-05 sample/s: 27.221422498555956 loss: 0.2517 (0.3361) time: 0.1569 data: 0.0022 max mem: 3188
+ 2021-05-29 17:14:35,232 INFO torchdistill.misc.log Epoch: [1] [13000/24544] eta: 0:31:39 lr: 9.801988265971318e-06 sample/s: 29.87231117940427 loss: 0.2914 (0.3361) time: 0.1607 data: 0.0022 max mem: 3188
+ 2021-05-29 17:17:19,532 INFO torchdistill.misc.log Epoch: [1] [14000/24544] eta: 0:28:54 lr: 9.53036723163842e-06 sample/s: 27.194418870028656 loss: 0.2757 (0.3361) time: 0.1520 data: 0.0022 max mem: 3188
+ 2021-05-29 17:20:04,510 INFO torchdistill.misc.log Epoch: [1] [15000/24544] eta: 0:26:10 lr: 9.258746197305521e-06 sample/s: 25.20952471108667 loss: 0.2290 (0.3357) time: 0.1727 data: 0.0023 max mem: 3188
+ 2021-05-29 17:22:48,546 INFO torchdistill.misc.log Epoch: [1] [16000/24544] eta: 0:23:25 lr: 8.987125162972621e-06 sample/s: 25.115630412620376 loss: 0.2612 (0.3352) time: 0.1640 data: 0.0022 max mem: 3188
+ 2021-05-29 17:25:32,713 INFO torchdistill.misc.log Epoch: [1] [17000/24544] eta: 0:20:40 lr: 8.715504128639723e-06 sample/s: 27.229242372355948 loss: 0.2846 (0.3349) time: 0.1657 data: 0.0023 max mem: 3188
+ 2021-05-29 17:28:16,077 INFO torchdistill.misc.log Epoch: [1] [18000/24544] eta: 0:17:55 lr: 8.443883094306825e-06 sample/s: 27.18763227406051 loss: 0.2757 (0.3350) time: 0.1631 data: 0.0022 max mem: 3188
+ 2021-05-29 17:31:01,304 INFO torchdistill.misc.log Epoch: [1] [19000/24544] eta: 0:15:11 lr: 8.172262059973926e-06 sample/s: 25.188254140311948 loss: 0.2174 (0.3342) time: 0.1696 data: 0.0022 max mem: 3188
+ 2021-05-29 17:33:45,948 INFO torchdistill.misc.log Epoch: [1] [20000/24544] eta: 0:12:27 lr: 7.900641025641026e-06 sample/s: 20.58210654418404 loss: 0.3514 (0.3342) time: 0.1772 data: 0.0025 max mem: 3188
+ 2021-05-29 17:36:30,052 INFO torchdistill.misc.log Epoch: [1] [21000/24544] eta: 0:09:42 lr: 7.629019991308127e-06 sample/s: 25.193322511521327 loss: 0.2484 (0.3337) time: 0.1620 data: 0.0021 max mem: 3188
+ 2021-05-29 17:39:12,159 INFO torchdistill.misc.log Epoch: [1] [22000/24544] eta: 0:06:58 lr: 7.357398956975229e-06 sample/s: 17.588355792435543 loss: 0.3947 (0.3335) time: 0.1706 data: 0.0022 max mem: 3188
+ 2021-05-29 17:41:56,330 INFO torchdistill.misc.log Epoch: [1] [23000/24544] eta: 0:04:13 lr: 7.08577792264233e-06 sample/s: 25.18526702614418 loss: 0.2381 (0.3328) time: 0.1535 data: 0.0021 max mem: 3188
+ 2021-05-29 17:44:38,298 INFO torchdistill.misc.log Epoch: [1] [24000/24544] eta: 0:01:29 lr: 6.8141568883094315e-06 sample/s: 23.51261665412359 loss: 0.2045 (0.3325) time: 0.1640 data: 0.0022 max mem: 3188
+ 2021-05-29 17:46:07,356 INFO torchdistill.misc.log Epoch: [1] Total time: 1:07:10
+ 2021-05-29 17:46:38,288 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-29 17:46:38,288 INFO __main__ Validation: accuracy = 0.8419765664798777
+ 2021-05-29 17:46:38,288 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased
+ 2021-05-29 17:46:39,758 INFO torchdistill.misc.log Epoch: [2] [ 0/24544] eta: 1:16:03 lr: 6.666395045632335e-06 sample/s: 25.03165433277632 loss: 0.1728 (0.1728) time: 0.1859 data: 0.0261 max mem: 3188
+ 2021-05-29 17:49:25,362 INFO torchdistill.misc.log Epoch: [2] [ 1000/24544] eta: 1:04:59 lr: 6.394774011299436e-06 sample/s: 23.490921310557265 loss: 0.1058 (0.2018) time: 0.1615 data: 0.0023 max mem: 3188
+ 2021-05-29 17:52:10,171 INFO torchdistill.misc.log Epoch: [2] [ 2000/24544] eta: 1:02:04 lr: 6.123152976966536e-06 sample/s: 25.18507799212498 loss: 0.1258 (0.2032) time: 0.1576 data: 0.0024 max mem: 3188
+ 2021-05-29 17:54:53,689 INFO torchdistill.misc.log Epoch: [2] [ 3000/24544] eta: 0:59:07 lr: 5.851531942633638e-06 sample/s: 22.06454226231966 loss: 0.1610 (0.2044) time: 0.1685 data: 0.0022 max mem: 3188
+ 2021-05-29 17:57:38,820 INFO torchdistill.misc.log Epoch: [2] [ 4000/24544] eta: 0:56:25 lr: 5.57991090830074e-06 sample/s: 25.261603369174225 loss: 0.1056 (0.2069) time: 0.1639 data: 0.0025 max mem: 3188
+ 2021-05-29 18:00:23,587 INFO torchdistill.misc.log Epoch: [2] [ 5000/24544] eta: 0:53:40 lr: 5.30828987396784e-06 sample/s: 29.636331197679574 loss: 0.0981 (0.2070) time: 0.1733 data: 0.0025 max mem: 3188
+ 2021-05-29 18:03:08,981 INFO torchdistill.misc.log Epoch: [2] [ 6000/24544] eta: 0:50:57 lr: 5.0366688396349415e-06 sample/s: 25.148458989068047 loss: 0.1072 (0.2070) time: 0.1682 data: 0.0024 max mem: 3188
+ 2021-05-29 18:05:55,250 INFO torchdistill.misc.log Epoch: [2] [ 7000/24544] eta: 0:48:16 lr: 4.765047805302043e-06 sample/s: 29.670923505994416 loss: 0.2394 (0.2083) time: 0.1682 data: 0.0025 max mem: 3188
+ 2021-05-29 18:08:39,572 INFO torchdistill.misc.log Epoch: [2] [ 8000/24544] eta: 0:45:29 lr: 4.493426770969144e-06 sample/s: 27.09389317291776 loss: 0.2440 (0.2084) time: 0.1591 data: 0.0023 max mem: 3188
+ 2021-05-29 18:11:24,579 INFO torchdistill.misc.log Epoch: [2] [ 9000/24544] eta: 0:42:44 lr: 4.221805736636245e-06 sample/s: 20.676721641200583 loss: 0.1225 (0.2089) time: 0.1663 data: 0.0022 max mem: 3188
+ 2021-05-29 18:14:09,996 INFO torchdistill.misc.log Epoch: [2] [10000/24544] eta: 0:40:00 lr: 3.950184702303347e-06 sample/s: 25.05105260250164 loss: 0.1564 (0.2103) time: 0.1630 data: 0.0024 max mem: 3188
+ 2021-05-29 18:16:54,735 INFO torchdistill.misc.log Epoch: [2] [11000/24544] eta: 0:37:14 lr: 3.678563667970448e-06 sample/s: 19.599001897143072 loss: 0.1674 (0.2102) time: 0.1689 data: 0.0024 max mem: 3188
+ 2021-05-29 18:19:39,315 INFO torchdistill.misc.log Epoch: [2] [12000/24544] eta: 0:34:29 lr: 3.4069426336375493e-06 sample/s: 17.593557046979864 loss: 0.1158 (0.2100) time: 0.1817 data: 0.0025 max mem: 3188
+ 2021-05-29 18:22:23,805 INFO torchdistill.misc.log Epoch: [2] [13000/24544] eta: 0:31:43 lr: 3.1353215993046506e-06 sample/s: 23.394160807859958 loss: 0.1052 (0.2098) time: 0.1561 data: 0.0023 max mem: 3188
+ 2021-05-29 18:25:09,919 INFO torchdistill.misc.log Epoch: [2] [14000/24544] eta: 0:28:59 lr: 2.8637005649717515e-06 sample/s: 17.603229962773167 loss: 0.2102 (0.2101) time: 0.1706 data: 0.0024 max mem: 3188
+ 2021-05-29 18:27:55,592 INFO torchdistill.misc.log Epoch: [2] [15000/24544] eta: 0:26:15 lr: 2.5920795306388528e-06 sample/s: 23.41979622233793 loss: 0.1816 (0.2104) time: 0.1779 data: 0.0024 max mem: 3188
+ 2021-05-29 18:30:40,619 INFO torchdistill.misc.log Epoch: [2] [16000/24544] eta: 0:23:30 lr: 2.320458496305954e-06 sample/s: 25.04933208115658 loss: 0.0908 (0.2108) time: 0.1628 data: 0.0024 max mem: 3188
+ 2021-05-29 18:33:26,626 INFO torchdistill.misc.log Epoch: [2] [17000/24544] eta: 0:20:45 lr: 2.0488374619730554e-06 sample/s: 17.581259752249633 loss: 0.1275 (0.2112) time: 0.1736 data: 0.0025 max mem: 3188
+ 2021-05-29 18:36:10,854 INFO torchdistill.misc.log Epoch: [2] [18000/24544] eta: 0:18:00 lr: 1.7772164276401565e-06 sample/s: 23.390051304929734 loss: 0.0697 (0.2113) time: 0.1610 data: 0.0023 max mem: 3188
+ 2021-05-29 18:38:57,109 INFO torchdistill.misc.log Epoch: [2] [19000/24544] eta: 0:15:15 lr: 1.5055953933072578e-06 sample/s: 27.079592996229533 loss: 0.1900 (0.2115) time: 0.1657 data: 0.0025 max mem: 3188
+ 2021-05-29 18:41:43,430 INFO torchdistill.misc.log Epoch: [2] [20000/24544] eta: 0:12:30 lr: 1.233974358974359e-06 sample/s: 25.050229714980222 loss: 0.1741 (0.2123) time: 0.1638 data: 0.0025 max mem: 3188
+ 2021-05-29 18:44:29,021 INFO torchdistill.misc.log Epoch: [2] [21000/24544] eta: 0:09:45 lr: 9.623533246414604e-07 sample/s: 27.045979346210515 loss: 0.2618 (0.2119) time: 0.1560 data: 0.0024 max mem: 3188
+ 2021-05-29 18:47:12,810 INFO torchdistill.misc.log Epoch: [2] [22000/24544] eta: 0:07:00 lr: 6.907322903085615e-07 sample/s: 29.739929200841647 loss: 0.0538 (0.2119) time: 0.1495 data: 0.0024 max mem: 3188
+ 2021-05-29 18:49:58,325 INFO torchdistill.misc.log Epoch: [2] [23000/24544] eta: 0:04:14 lr: 4.191112559756628e-07 sample/s: 25.124770125614745 loss: 0.0919 (0.2117) time: 0.1682 data: 0.0023 max mem: 3188
+ 2021-05-29 18:52:43,792 INFO torchdistill.misc.log Epoch: [2] [24000/24544] eta: 0:01:29 lr: 1.4749022164276403e-07 sample/s: 27.05199747171807 loss: 0.0465 (0.2116) time: 0.1664 data: 0.0025 max mem: 3188
+ 2021-05-29 18:54:13,503 INFO torchdistill.misc.log Epoch: [2] Total time: 1:07:33
+ 2021-05-29 18:54:44,621 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-29 18:54:44,622 INFO __main__ Validation: accuracy = 0.839429444727458
+ 2021-05-29 18:54:48,420 INFO __main__ [Student: bert-base-uncased]
+ 2021-05-29 18:55:19,412 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+ 2021-05-29 18:55:19,413 INFO __main__ Test: accuracy = 0.8419765664798777
+ 2021-05-29 18:55:19,413 INFO __main__ Start prediction for private dataset(s)
+ 2021-05-29 18:55:19,414 INFO __main__ mnli/test_m: 9796 samples
+ 2021-05-29 18:55:50,074 INFO __main__ mnli/test_mm: 9847 samples
+ 2021-05-29 18:56:20,968 INFO __main__ ax/test_ax: 1104 samples
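The log records one validation pass per epoch (0.8346 → 0.8420 → 0.8394), and the checkpoint is updated only when accuracy improves, so the final test score reuses the epoch-1 checkpoint. A throwaway sketch for pulling those numbers out of a local copy of the log:

```python
# Sketch: extract the per-epoch validation accuracies from training.log.
# The regex targets the "Validation: accuracy = ..." lines shown above.
import re

pattern = re.compile(r"Validation: accuracy = ([0-9.]+)")
with open("training.log") as f:  # assumed local path
    accuracies = [float(m.group(1)) for m in pattern.finditer(f.read())]

print(accuracies)       # [0.8346..., 0.8419..., 0.8394...]
print(max(accuracies))  # the epoch-1 checkpoint, which the test score reuses
```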
vocab.txt ADDED
The diff for this file is too large to render. See raw diff