yoshitomo-matsubara committed
Commit • 38c02eb
1 Parent(s): 448e665
initial commit

Files changed:
- README.md +19 -0
- config.json +36 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- training.log +115 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,19 @@
+---
+language: en
+tags:
+- bert
+- mnli
+- ax
+- glue
+- torchdistill
+license: apache-2.0
+datasets:
+- mnli
+- ax
+metrics:
+- accuracy
+---
+
+`bert-base-uncased` fine-tuned on the MNLI dataset, using [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) and [Google Colab](https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/master/demo/glue_finetuning_and_submission.ipynb).
+The hyperparameters are the same as those in Hugging Face's example and/or the BERT paper, and the training configuration (including hyperparameters) is available [here](https://github.com/yoshitomo-matsubara/torchdistill/blob/main/configs/sample/glue/mnli/ce/bert_base_uncased.yaml).
+I submitted prediction files to [the GLUE leaderboard](https://gluebenchmark.com/leaderboard), and the overall GLUE score was **77.9**.
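
For context, here is a minimal inference sketch against this checkpoint. The repo id below is inferred from the commit author and the task name, so treat it as an assumption rather than a documented identifier; `transformers` and `torch` are required:

```python
# Hedged sketch: loading this checkpoint for MNLI-style inference.
# The repo id is an assumption inferred from the commit author and task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "yoshitomo-matsubara/bert-base-uncased-mnli"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
# config.json uses generic LABEL_0/1/2; the GLUE MNLI convention is
# 0=entailment, 1=neutral, 2=contradiction (verify before relying on it).
print(probs)
```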
config.json
ADDED
@@ -0,0 +1,36 @@
+{
+  "_name_or_path": "bert-base-uncased",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "finetuning_task": "mnli",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "transformers_version": "4.6.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
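
The config pins a 3-way classification head on the standard `bert-base` architecture. As a hedged sketch, an equivalent configuration could be built programmatically; fields not passed here fall back to `bert-base-uncased` defaults:

```python
from transformers import BertConfig

# Hypothetical reconstruction of the checked-in config.json; the generic
# labels LABEL_0/1/2 are generated automatically from num_labels=3.
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # MNLI is 3-way: entailment / neutral / contradiction
    finetuning_task="mnli",
    problem_type="single_label_classification",
)
print(config.id2label)  # {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}
```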
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:536af63a9861b7821c240f73171a5e316ca9d52cbfc6131a5a10a0f852906826
+size 438027529
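
The binary itself lives in Git LFS; this pointer records only the object's SHA-256 and byte size. A small sketch (assuming the weights have already been fetched, e.g. with `git lfs pull`) for checking a download against the pointer:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks to avoid loading ~438 MB into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

expected = "536af63a9861b7821c240f73171a5e316ca9d52cbfc6131a5a10a0f852906826"
print(sha256_of("pytorch_model.bin") == expected)  # True for an intact download
```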
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased"}
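
Together with `tokenizer.json`, `special_tokens_map.json`, and `vocab.txt`, this file is enough to load the tokenizer from a local clone; a hedged sketch, assuming the current directory is the cloned repo:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".")  # path to the cloned repo
print(tokenizer.do_lower_case)                  # True, per tokenizer_config.json
print(tokenizer("Hello, MNLI!")["input_ids"])   # [CLS] ... [SEP] token ids
```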
training.log
ADDED
@@ -0,0 +1,115 @@
+2021-05-29 15:29:04,310 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml', log='log/glue/mnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
+2021-05-29 15:29:04,374 INFO __main__ Distributed environment: NO
+Num processes: 1
+Process index: 0
+Local process index: 0
+Device: cuda
+Use FP16 precision: True
+
+2021-05-29 15:29:04,728 INFO filelock Lock 139977050547728 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
+2021-05-29 15:29:05,085 INFO filelock Lock 139977050547728 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
+2021-05-29 15:29:05,785 INFO filelock Lock 139977045762832 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+2021-05-29 15:29:06,321 INFO filelock Lock 139977045762832 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+2021-05-29 15:29:06,668 INFO filelock Lock 139977045762832 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
+2021-05-29 15:29:07,193 INFO filelock Lock 139977045762832 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
+2021-05-29 15:29:08,239 INFO filelock Lock 139977012340816 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
+2021-05-29 15:29:08,584 INFO filelock Lock 139977012340816 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
+2021-05-29 15:29:08,962 INFO filelock Lock 139977044338768 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock
+2021-05-29 15:29:16,242 INFO filelock Lock 139977044338768 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock
+2021-05-29 15:30:57,737 INFO __main__ Start training
+2021-05-29 15:30:57,738 INFO torchdistill.models.util [student model]
+2021-05-29 15:30:57,738 INFO torchdistill.models.util Using the original student model
+2021-05-29 15:30:57,738 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+2021-05-29 15:31:03,590 INFO torchdistill.misc.log Epoch: [0] [    0/24544] eta: 2:05:27 lr: 1.999972837896567e-05 sample/s: 14.823441888738587 loss: 1.1098 (1.1098) time: 0.3067 data: 0.0369 max mem: 1893
+2021-05-29 15:33:47,307 INFO torchdistill.misc.log Epoch: [0] [ 1000/24544] eta: 1:04:17 lr: 1.972810734463277e-05 sample/s: 27.213607114992513 loss: 0.7205 (0.9372) time: 0.1595 data: 0.0022 max mem: 3188
+2021-05-29 15:36:31,223 INFO torchdistill.misc.log Epoch: [0] [ 2000/24544] eta: 1:01:34 lr: 1.9456486310299872e-05 sample/s: 23.541286875308696 loss: 0.5824 (0.7945) time: 0.1536 data: 0.0022 max mem: 3188
+2021-05-29 15:39:15,330 INFO torchdistill.misc.log Epoch: [0] [ 3000/24544] eta: 0:58:52 lr: 1.9184865275966974e-05 sample/s: 23.52890622918265 loss: 0.5424 (0.7283) time: 0.1764 data: 0.0023 max mem: 3188
+2021-05-29 15:41:58,885 INFO torchdistill.misc.log Epoch: [0] [ 4000/24544] eta: 0:56:06 lr: 1.8913244241634076e-05 sample/s: 17.631738191448555 loss: 0.5007 (0.6846) time: 0.1664 data: 0.0022 max mem: 3188
+2021-05-29 15:44:43,385 INFO torchdistill.misc.log Epoch: [0] [ 5000/24544] eta: 0:53:24 lr: 1.8641623207301177e-05 sample/s: 23.523034105646886 loss: 0.4332 (0.6568) time: 0.1620 data: 0.0022 max mem: 3188
+2021-05-29 15:47:29,027 INFO torchdistill.misc.log Epoch: [0] [ 6000/24544] eta: 0:50:46 lr: 1.8370002172968276e-05 sample/s: 25.141637744489802 loss: 0.4319 (0.6330) time: 0.1530 data: 0.0022 max mem: 3188
+2021-05-29 15:50:15,261 INFO torchdistill.misc.log Epoch: [0] [ 7000/24544] eta: 0:48:06 lr: 1.8098381138635377e-05 sample/s: 27.210473391590924 loss: 0.4381 (0.6162) time: 0.1500 data: 0.0023 max mem: 3188
+2021-05-29 15:53:00,429 INFO torchdistill.misc.log Epoch: [0] [ 8000/24544] eta: 0:45:23 lr: 1.782676010430248e-05 sample/s: 27.11552516360098 loss: 0.3911 (0.6018) time: 0.1653 data: 0.0023 max mem: 3188
+2021-05-29 15:55:45,430 INFO torchdistill.misc.log Epoch: [0] [ 9000/24544] eta: 0:42:39 lr: 1.755513906996958e-05 sample/s: 27.143426172352147 loss: 0.4399 (0.5899) time: 0.1523 data: 0.0022 max mem: 3188
+2021-05-29 15:58:29,704 INFO torchdistill.misc.log Epoch: [0] [10000/24544] eta: 0:39:54 lr: 1.7283518035636683e-05 sample/s: 23.528708244688683 loss: 0.5770 (0.5795) time: 0.1712 data: 0.0023 max mem: 3188
+2021-05-29 16:01:14,615 INFO torchdistill.misc.log Epoch: [0] [11000/24544] eta: 0:37:10 lr: 1.7011897001303784e-05 sample/s: 23.51004667756884 loss: 0.4501 (0.5715) time: 0.1655 data: 0.0024 max mem: 3188
+2021-05-29 16:03:59,094 INFO torchdistill.misc.log Epoch: [0] [12000/24544] eta: 0:34:25 lr: 1.6740275966970883e-05 sample/s: 27.174949544687372 loss: 0.4899 (0.5636) time: 0.1644 data: 0.0022 max mem: 3188
+2021-05-29 16:06:43,208 INFO torchdistill.misc.log Epoch: [0] [13000/24544] eta: 0:31:40 lr: 1.6468654932637984e-05 sample/s: 25.122738503466554 loss: 0.4647 (0.5556) time: 0.1629 data: 0.0022 max mem: 3188
+2021-05-29 16:09:28,363 INFO torchdistill.misc.log Epoch: [0] [14000/24544] eta: 0:28:55 lr: 1.6197033898305086e-05 sample/s: 29.783168119976143 loss: 0.4150 (0.5499) time: 0.1592 data: 0.0022 max mem: 3188
+2021-05-29 16:12:13,821 INFO torchdistill.misc.log Epoch: [0] [15000/24544] eta: 0:26:11 lr: 1.5925412863972188e-05 sample/s: 25.158528026872208 loss: 0.4610 (0.5446) time: 0.1599 data: 0.0024 max mem: 3188
+2021-05-29 16:14:58,895 INFO torchdistill.misc.log Epoch: [0] [16000/24544] eta: 0:23:27 lr: 1.565379182963929e-05 sample/s: 20.605590470236685 loss: 0.4390 (0.5395) time: 0.1720 data: 0.0024 max mem: 3188
+2021-05-29 16:17:44,108 INFO torchdistill.misc.log Epoch: [0] [17000/24544] eta: 0:20:42 lr: 1.538217079530639e-05 sample/s: 23.374767502288403 loss: 0.5434 (0.5354) time: 0.1773 data: 0.0023 max mem: 3188
+2021-05-29 16:20:29,013 INFO torchdistill.misc.log Epoch: [0] [18000/24544] eta: 0:17:58 lr: 1.5110549760973491e-05 sample/s: 25.141336338406393 loss: 0.4033 (0.5311) time: 0.1754 data: 0.0023 max mem: 3188
+2021-05-29 16:23:12,810 INFO torchdistill.misc.log Epoch: [0] [19000/24544] eta: 0:15:13 lr: 1.4838928726640591e-05 sample/s: 27.220053378329048 loss: 0.3449 (0.5261) time: 0.1665 data: 0.0022 max mem: 3188
+2021-05-29 16:25:57,100 INFO torchdistill.misc.log Epoch: [0] [20000/24544] eta: 0:12:28 lr: 1.4567307692307693e-05 sample/s: 27.172836906771014 loss: 0.3282 (0.5225) time: 0.1644 data: 0.0022 max mem: 3188
+2021-05-29 16:28:40,766 INFO torchdistill.misc.log Epoch: [0] [21000/24544] eta: 0:09:43 lr: 1.4295686657974795e-05 sample/s: 29.776137866162625 loss: 0.4302 (0.5190) time: 0.1636 data: 0.0023 max mem: 3188
+2021-05-29 16:31:25,444 INFO torchdistill.misc.log Epoch: [0] [22000/24544] eta: 0:06:58 lr: 1.4024065623641896e-05 sample/s: 25.15667954698467 loss: 0.3418 (0.5156) time: 0.1578 data: 0.0021 max mem: 3188
+2021-05-29 16:34:10,137 INFO torchdistill.misc.log Epoch: [0] [23000/24544] eta: 0:04:14 lr: 1.3752444589308998e-05 sample/s: 32.650026272258444 loss: 0.3675 (0.5123) time: 0.1637 data: 0.0022 max mem: 3188
+2021-05-29 16:36:54,197 INFO torchdistill.misc.log Epoch: [0] [24000/24544] eta: 0:01:29 lr: 1.34808235549761e-05 sample/s: 25.185947572110116 loss: 0.4129 (0.5095) time: 0.1622 data: 0.0023 max mem: 3188
+2021-05-29 16:38:24,444 INFO torchdistill.misc.log Epoch: [0] Total time: 1:07:21
+2021-05-29 16:38:55,360 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-29 16:38:55,361 INFO __main__ Validation: accuracy = 0.8346408558329088
+2021-05-29 16:38:55,361 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased
+2021-05-29 16:38:56,551 INFO torchdistill.misc.log Epoch: [1] [    0/24544] eta: 1:16:42 lr: 1.3333061712299002e-05 sample/s: 24.581569261785873 loss: 0.2703 (0.2703) time: 0.1875 data: 0.0248 max mem: 3188
+2021-05-29 16:41:41,363 INFO torchdistill.misc.log Epoch: [1] [ 1000/24544] eta: 1:04:40 lr: 1.3061440677966102e-05 sample/s: 25.180428648616196 loss: 0.2192 (0.3276) time: 0.1543 data: 0.0022 max mem: 3188
+2021-05-29 16:44:26,252 INFO torchdistill.misc.log Epoch: [1] [ 2000/24544] eta: 1:01:56 lr: 1.2789819643633204e-05 sample/s: 20.661468833319994 loss: 0.2764 (0.3302) time: 0.1713 data: 0.0023 max mem: 3188
+2021-05-29 16:47:11,576 INFO torchdistill.misc.log Epoch: [1] [ 3000/24544] eta: 0:59:15 lr: 1.2518198609300306e-05 sample/s: 20.64476498224354 loss: 0.3651 (0.3332) time: 0.1719 data: 0.0022 max mem: 3188
+2021-05-29 16:49:56,040 INFO torchdistill.misc.log Epoch: [1] [ 4000/24544] eta: 0:56:27 lr: 1.2246577574967407e-05 sample/s: 20.68329823903899 loss: 0.3764 (0.3347) time: 0.1674 data: 0.0022 max mem: 3188
+2021-05-29 16:52:40,153 INFO torchdistill.misc.log Epoch: [1] [ 5000/24544] eta: 0:53:39 lr: 1.1974956540634507e-05 sample/s: 25.14510443394643 loss: 0.3584 (0.3372) time: 0.1650 data: 0.0023 max mem: 3188
+2021-05-29 16:55:25,903 INFO torchdistill.misc.log Epoch: [1] [ 6000/24544] eta: 0:50:57 lr: 1.1703335506301609e-05 sample/s: 27.211797412017347 loss: 0.2562 (0.3376) time: 0.1659 data: 0.0022 max mem: 3188
+2021-05-29 16:58:10,023 INFO torchdistill.misc.log Epoch: [1] [ 7000/24544] eta: 0:48:10 lr: 1.143171447196871e-05 sample/s: 29.83841542037708 loss: 0.2930 (0.3370) time: 0.1702 data: 0.0024 max mem: 3188
+2021-05-29 17:00:54,079 INFO torchdistill.misc.log Epoch: [1] [ 8000/24544] eta: 0:45:24 lr: 1.116009343763581e-05 sample/s: 27.1218811935608 loss: 0.2333 (0.3370) time: 0.1612 data: 0.0022 max mem: 3188
+2021-05-29 17:03:39,537 INFO torchdistill.misc.log Epoch: [1] [ 9000/24544] eta: 0:42:41 lr: 1.0888472403302913e-05 sample/s: 20.654448399014626 loss: 0.2945 (0.3364) time: 0.1665 data: 0.0022 max mem: 3188
+2021-05-29 17:06:23,515 INFO torchdistill.misc.log Epoch: [1] [10000/24544] eta: 0:39:55 lr: 1.0616851368970014e-05 sample/s: 32.66127672663348 loss: 0.2922 (0.3369) time: 0.1590 data: 0.0022 max mem: 3188
+2021-05-29 17:09:08,492 INFO torchdistill.misc.log Epoch: [1] [11000/24544] eta: 0:37:10 lr: 1.0345230334637116e-05 sample/s: 25.208653502300407 loss: 0.2785 (0.3363) time: 0.1561 data: 0.0021 max mem: 3188
+2021-05-29 17:11:51,342 INFO torchdistill.misc.log Epoch: [1] [12000/24544] eta: 0:34:24 lr: 1.0073609300304216e-05 sample/s: 27.221422498555956 loss: 0.2517 (0.3361) time: 0.1569 data: 0.0022 max mem: 3188
+2021-05-29 17:14:35,232 INFO torchdistill.misc.log Epoch: [1] [13000/24544] eta: 0:31:39 lr: 9.801988265971318e-06 sample/s: 29.87231117940427 loss: 0.2914 (0.3361) time: 0.1607 data: 0.0022 max mem: 3188
+2021-05-29 17:17:19,532 INFO torchdistill.misc.log Epoch: [1] [14000/24544] eta: 0:28:54 lr: 9.53036723163842e-06 sample/s: 27.194418870028656 loss: 0.2757 (0.3361) time: 0.1520 data: 0.0022 max mem: 3188
+2021-05-29 17:20:04,510 INFO torchdistill.misc.log Epoch: [1] [15000/24544] eta: 0:26:10 lr: 9.258746197305521e-06 sample/s: 25.20952471108667 loss: 0.2290 (0.3357) time: 0.1727 data: 0.0023 max mem: 3188
+2021-05-29 17:22:48,546 INFO torchdistill.misc.log Epoch: [1] [16000/24544] eta: 0:23:25 lr: 8.987125162972621e-06 sample/s: 25.115630412620376 loss: 0.2612 (0.3352) time: 0.1640 data: 0.0022 max mem: 3188
+2021-05-29 17:25:32,713 INFO torchdistill.misc.log Epoch: [1] [17000/24544] eta: 0:20:40 lr: 8.715504128639723e-06 sample/s: 27.229242372355948 loss: 0.2846 (0.3349) time: 0.1657 data: 0.0023 max mem: 3188
+2021-05-29 17:28:16,077 INFO torchdistill.misc.log Epoch: [1] [18000/24544] eta: 0:17:55 lr: 8.443883094306825e-06 sample/s: 27.18763227406051 loss: 0.2757 (0.3350) time: 0.1631 data: 0.0022 max mem: 3188
+2021-05-29 17:31:01,304 INFO torchdistill.misc.log Epoch: [1] [19000/24544] eta: 0:15:11 lr: 8.172262059973926e-06 sample/s: 25.188254140311948 loss: 0.2174 (0.3342) time: 0.1696 data: 0.0022 max mem: 3188
+2021-05-29 17:33:45,948 INFO torchdistill.misc.log Epoch: [1] [20000/24544] eta: 0:12:27 lr: 7.900641025641026e-06 sample/s: 20.58210654418404 loss: 0.3514 (0.3342) time: 0.1772 data: 0.0025 max mem: 3188
+2021-05-29 17:36:30,052 INFO torchdistill.misc.log Epoch: [1] [21000/24544] eta: 0:09:42 lr: 7.629019991308127e-06 sample/s: 25.193322511521327 loss: 0.2484 (0.3337) time: 0.1620 data: 0.0021 max mem: 3188
+2021-05-29 17:39:12,159 INFO torchdistill.misc.log Epoch: [1] [22000/24544] eta: 0:06:58 lr: 7.357398956975229e-06 sample/s: 17.588355792435543 loss: 0.3947 (0.3335) time: 0.1706 data: 0.0022 max mem: 3188
+2021-05-29 17:41:56,330 INFO torchdistill.misc.log Epoch: [1] [23000/24544] eta: 0:04:13 lr: 7.08577792264233e-06 sample/s: 25.18526702614418 loss: 0.2381 (0.3328) time: 0.1535 data: 0.0021 max mem: 3188
+2021-05-29 17:44:38,298 INFO torchdistill.misc.log Epoch: [1] [24000/24544] eta: 0:01:29 lr: 6.8141568883094315e-06 sample/s: 23.51261665412359 loss: 0.2045 (0.3325) time: 0.1640 data: 0.0022 max mem: 3188
+2021-05-29 17:46:07,356 INFO torchdistill.misc.log Epoch: [1] Total time: 1:07:10
+2021-05-29 17:46:38,288 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-29 17:46:38,288 INFO __main__ Validation: accuracy = 0.8419765664798777
+2021-05-29 17:46:38,288 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased
+2021-05-29 17:46:39,758 INFO torchdistill.misc.log Epoch: [2] [    0/24544] eta: 1:16:03 lr: 6.666395045632335e-06 sample/s: 25.03165433277632 loss: 0.1728 (0.1728) time: 0.1859 data: 0.0261 max mem: 3188
+2021-05-29 17:49:25,362 INFO torchdistill.misc.log Epoch: [2] [ 1000/24544] eta: 1:04:59 lr: 6.394774011299436e-06 sample/s: 23.490921310557265 loss: 0.1058 (0.2018) time: 0.1615 data: 0.0023 max mem: 3188
+2021-05-29 17:52:10,171 INFO torchdistill.misc.log Epoch: [2] [ 2000/24544] eta: 1:02:04 lr: 6.123152976966536e-06 sample/s: 25.18507799212498 loss: 0.1258 (0.2032) time: 0.1576 data: 0.0024 max mem: 3188
+2021-05-29 17:54:53,689 INFO torchdistill.misc.log Epoch: [2] [ 3000/24544] eta: 0:59:07 lr: 5.851531942633638e-06 sample/s: 22.06454226231966 loss: 0.1610 (0.2044) time: 0.1685 data: 0.0022 max mem: 3188
+2021-05-29 17:57:38,820 INFO torchdistill.misc.log Epoch: [2] [ 4000/24544] eta: 0:56:25 lr: 5.57991090830074e-06 sample/s: 25.261603369174225 loss: 0.1056 (0.2069) time: 0.1639 data: 0.0025 max mem: 3188
+2021-05-29 18:00:23,587 INFO torchdistill.misc.log Epoch: [2] [ 5000/24544] eta: 0:53:40 lr: 5.30828987396784e-06 sample/s: 29.636331197679574 loss: 0.0981 (0.2070) time: 0.1733 data: 0.0025 max mem: 3188
+2021-05-29 18:03:08,981 INFO torchdistill.misc.log Epoch: [2] [ 6000/24544] eta: 0:50:57 lr: 5.0366688396349415e-06 sample/s: 25.148458989068047 loss: 0.1072 (0.2070) time: 0.1682 data: 0.0024 max mem: 3188
+2021-05-29 18:05:55,250 INFO torchdistill.misc.log Epoch: [2] [ 7000/24544] eta: 0:48:16 lr: 4.765047805302043e-06 sample/s: 29.670923505994416 loss: 0.2394 (0.2083) time: 0.1682 data: 0.0025 max mem: 3188
+2021-05-29 18:08:39,572 INFO torchdistill.misc.log Epoch: [2] [ 8000/24544] eta: 0:45:29 lr: 4.493426770969144e-06 sample/s: 27.09389317291776 loss: 0.2440 (0.2084) time: 0.1591 data: 0.0023 max mem: 3188
+2021-05-29 18:11:24,579 INFO torchdistill.misc.log Epoch: [2] [ 9000/24544] eta: 0:42:44 lr: 4.221805736636245e-06 sample/s: 20.676721641200583 loss: 0.1225 (0.2089) time: 0.1663 data: 0.0022 max mem: 3188
+2021-05-29 18:14:09,996 INFO torchdistill.misc.log Epoch: [2] [10000/24544] eta: 0:40:00 lr: 3.950184702303347e-06 sample/s: 25.05105260250164 loss: 0.1564 (0.2103) time: 0.1630 data: 0.0024 max mem: 3188
+2021-05-29 18:16:54,735 INFO torchdistill.misc.log Epoch: [2] [11000/24544] eta: 0:37:14 lr: 3.678563667970448e-06 sample/s: 19.599001897143072 loss: 0.1674 (0.2102) time: 0.1689 data: 0.0024 max mem: 3188
+2021-05-29 18:19:39,315 INFO torchdistill.misc.log Epoch: [2] [12000/24544] eta: 0:34:29 lr: 3.4069426336375493e-06 sample/s: 17.593557046979864 loss: 0.1158 (0.2100) time: 0.1817 data: 0.0025 max mem: 3188
+2021-05-29 18:22:23,805 INFO torchdistill.misc.log Epoch: [2] [13000/24544] eta: 0:31:43 lr: 3.1353215993046506e-06 sample/s: 23.394160807859958 loss: 0.1052 (0.2098) time: 0.1561 data: 0.0023 max mem: 3188
+2021-05-29 18:25:09,919 INFO torchdistill.misc.log Epoch: [2] [14000/24544] eta: 0:28:59 lr: 2.8637005649717515e-06 sample/s: 17.603229962773167 loss: 0.2102 (0.2101) time: 0.1706 data: 0.0024 max mem: 3188
+2021-05-29 18:27:55,592 INFO torchdistill.misc.log Epoch: [2] [15000/24544] eta: 0:26:15 lr: 2.5920795306388528e-06 sample/s: 23.41979622233793 loss: 0.1816 (0.2104) time: 0.1779 data: 0.0024 max mem: 3188
+2021-05-29 18:30:40,619 INFO torchdistill.misc.log Epoch: [2] [16000/24544] eta: 0:23:30 lr: 2.320458496305954e-06 sample/s: 25.04933208115658 loss: 0.0908 (0.2108) time: 0.1628 data: 0.0024 max mem: 3188
+2021-05-29 18:33:26,626 INFO torchdistill.misc.log Epoch: [2] [17000/24544] eta: 0:20:45 lr: 2.0488374619730554e-06 sample/s: 17.581259752249633 loss: 0.1275 (0.2112) time: 0.1736 data: 0.0025 max mem: 3188
+2021-05-29 18:36:10,854 INFO torchdistill.misc.log Epoch: [2] [18000/24544] eta: 0:18:00 lr: 1.7772164276401565e-06 sample/s: 23.390051304929734 loss: 0.0697 (0.2113) time: 0.1610 data: 0.0023 max mem: 3188
+2021-05-29 18:38:57,109 INFO torchdistill.misc.log Epoch: [2] [19000/24544] eta: 0:15:15 lr: 1.5055953933072578e-06 sample/s: 27.079592996229533 loss: 0.1900 (0.2115) time: 0.1657 data: 0.0025 max mem: 3188
+2021-05-29 18:41:43,430 INFO torchdistill.misc.log Epoch: [2] [20000/24544] eta: 0:12:30 lr: 1.233974358974359e-06 sample/s: 25.050229714980222 loss: 0.1741 (0.2123) time: 0.1638 data: 0.0025 max mem: 3188
+2021-05-29 18:44:29,021 INFO torchdistill.misc.log Epoch: [2] [21000/24544] eta: 0:09:45 lr: 9.623533246414604e-07 sample/s: 27.045979346210515 loss: 0.2618 (0.2119) time: 0.1560 data: 0.0024 max mem: 3188
+2021-05-29 18:47:12,810 INFO torchdistill.misc.log Epoch: [2] [22000/24544] eta: 0:07:00 lr: 6.907322903085615e-07 sample/s: 29.739929200841647 loss: 0.0538 (0.2119) time: 0.1495 data: 0.0024 max mem: 3188
+2021-05-29 18:49:58,325 INFO torchdistill.misc.log Epoch: [2] [23000/24544] eta: 0:04:14 lr: 4.191112559756628e-07 sample/s: 25.124770125614745 loss: 0.0919 (0.2117) time: 0.1682 data: 0.0023 max mem: 3188
+2021-05-29 18:52:43,792 INFO torchdistill.misc.log Epoch: [2] [24000/24544] eta: 0:01:29 lr: 1.4749022164276403e-07 sample/s: 27.05199747171807 loss: 0.0465 (0.2116) time: 0.1664 data: 0.0025 max mem: 3188
+2021-05-29 18:54:13,503 INFO torchdistill.misc.log Epoch: [2] Total time: 1:07:33
+2021-05-29 18:54:44,621 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-29 18:54:44,622 INFO __main__ Validation: accuracy = 0.839429444727458
+2021-05-29 18:54:48,420 INFO __main__ [Student: bert-base-uncased]
+2021-05-29 18:55:19,412 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-29 18:55:19,413 INFO __main__ Test: accuracy = 0.8419765664798777
+2021-05-29 18:55:19,413 INFO __main__ Start prediction for private dataset(s)
+2021-05-29 18:55:19,414 INFO __main__ mnli/test_m: 9796 samples
+2021-05-29 18:55:50,074 INFO __main__ mnli/test_mm: 9847 samples
+2021-05-29 18:56:20,968 INFO __main__ ax/test_ax: 1104 samples
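
Two things are worth noting in the log. First, validation accuracy peaks after epoch 1 (0.8420) and the checkpoint is only updated on improvement, so the final "Test" line reuses the epoch-1 weights. Second, the `lr` column decays linearly from 2e-5 with no warmup; the schedule can be reproduced from the logged values (this sketch is an observation about the log, not code taken from the repo):

```python
# Linear decay with no warmup, consistent with transformers'
# get_linear_schedule_with_warmup(num_warmup_steps=0, num_training_steps=73632).
total_steps = 3 * 24544  # 3 epochs x 24544 iterations per epoch
base_lr = 2e-5

def lr_after(step):
    # Learning rate after `step` optimizer steps have been taken.
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(lr_after(1))     # 1.99997e-05 -> matches the first logged lr
print(lr_after(1001))  # 1.97281e-05 -> matches the [ 1000/24544] log line
```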
vocab.txt
ADDED
The diff for this file is too large to render.