yoshitomo-matsubara committed on
Commit
6ac7d79
1 Parent(s): 53800ce

initial commit

README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ language: en
+ tags:
+ - bert
+ - qqp
+ - glue
+ - torchdistill
+ license: apache-2.0
+ datasets:
+ - qqp
+ metrics:
+ - f1
+ - accuracy
+ ---
+
+ `bert-base-uncased` fine-tuned on the QQP dataset, using [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) and [Google Colab](https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/master/demo/glue_finetuning_and_submission.ipynb).
+ The hyperparameters are the same as those used in Hugging Face's example and/or the original BERT paper, and the training configuration (including the hyperparameters) is available [here](https://github.com/yoshitomo-matsubara/torchdistill/blob/main/configs/sample/glue/qqp/ce/bert_base_uncased.yaml).
+ I submitted prediction files to [the GLUE leaderboard](https://gluebenchmark.com/leaderboard), and the overall GLUE score was **77.9**.
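
A minimal usage sketch for the fine-tuned checkpoint described in the card above (not part of the original commit). The repo id `yoshitomo-matsubara/bert-base-uncased-qqp` is an assumption inferred from the committer and task name; substitute the actual Hub id if it differs.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repo id; replace with the actual Hub id if different.
model_id = "yoshitomo-matsubara/bert-base-uncased-qqp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# QQP is a sentence-pair task: are the two questions duplicates?
inputs = tokenizer(
    "How do I learn Python quickly?",
    "What is the fastest way to learn Python?",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
# GLUE QQP labels: 0 = not duplicate, 1 = duplicate.
print("duplicate" if logits.argmax(dim=-1).item() == 1 else "not duplicate")
```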
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "_name_or_path": "bert-base-uncased",
+ "architectures": [
+ "BertForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "finetuning_task": "qqp",
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "transformers_version": "4.6.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
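
As a quick sanity check, the fields in the config above can be read back through `transformers`' `AutoConfig`; a sketch under the same assumed repo id as the earlier snippet:

```python
from transformers import AutoConfig

# Assumed repo id, as in the earlier sketch.
config = AutoConfig.from_pretrained("yoshitomo-matsubara/bert-base-uncased-qqp")

# These values mirror the config.json above: standard BERT-base geometry.
assert config.model_type == "bert"
assert (config.num_hidden_layers, config.hidden_size, config.num_attention_heads) == (12, 768, 12)
print(config.finetuning_task)  # "qqp"
print(config.architectures)    # ["BertForSequenceClassification"]
```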
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c4136b838ef446e2dfd4feba43e2a42849a0243c7115f5182046fd2dea581f6
+ size 438024457
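
The three lines above are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 of the real file and `size` its byte count. A small sketch (hypothetical local path) for verifying a downloaded `pytorch_model.bin` against this pointer:

```python
import hashlib
import os

path = "pytorch_model.bin"  # hypothetical local path to the downloaded weights

# Values copied from the LFS pointer above.
expected_oid = "9c4136b838ef446e2dfd4feba43e2a42849a0243c7115f5182046fd2dea581f6"
expected_size = 438024457

assert os.path.getsize(path) == expected_size, "size mismatch"
sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        sha256.update(chunk)
assert sha256.hexdigest() == expected_oid, "checksum mismatch"
print("pytorch_model.bin matches the LFS pointer")
```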
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased"}
training.log ADDED
@@ -0,0 +1,99 @@
+ 2021-05-28 21:41:37,358 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml', log='log/glue/qqp/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)
+ 2021-05-28 21:41:37,386 INFO __main__ Distributed environment: NO
+ Num processes: 1
+ Process index: 0
+ Local process index: 0
+ Device: cuda
+ Use FP16 precision: True
+
+ 2021-05-28 21:41:42,076 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/qqp/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+ 2021-05-28 21:42:41,913 INFO __main__ Start training
+ 2021-05-28 21:42:41,913 INFO torchdistill.models.util [student model]
+ 2021-05-28 21:42:41,914 INFO torchdistill.models.util Using the original student model
+ 2021-05-28 21:42:41,914 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+ 2021-05-28 21:42:44,608 INFO torchdistill.misc.log Epoch: [0] [ 0/22741] eta: 1:02:12 lr: 4.999926710933263e-05 sample/s: 28.09482152306569 loss: 0.5684 (0.5684) time: 0.1641 data: 0.0218 max mem: 1891
+ 2021-05-28 21:45:02,079 INFO torchdistill.misc.log Epoch: [0] [ 1000/22741] eta: 0:49:49 lr: 4.926637644196239e-05 sample/s: 22.046275952693826 loss: 0.3256 (0.4391) time: 0.1462 data: 0.0020 max mem: 3206
+ 2021-05-28 21:47:18,645 INFO torchdistill.misc.log Epoch: [0] [ 2000/22741] eta: 0:47:22 lr: 4.8533485774592146e-05 sample/s: 23.55741774597576 loss: 0.3501 (0.3987) time: 0.1345 data: 0.0020 max mem: 3206
+ 2021-05-28 21:49:34,213 INFO torchdistill.misc.log Epoch: [0] [ 3000/22741] eta: 0:44:55 lr: 4.780059510722191e-05 sample/s: 32.71120261889025 loss: 0.2823 (0.3815) time: 0.1361 data: 0.0019 max mem: 3206
+ 2021-05-28 21:51:49,521 INFO torchdistill.misc.log Epoch: [0] [ 4000/22741] eta: 0:42:33 lr: 4.706770443985167e-05 sample/s: 25.242523407373305 loss: 0.3737 (0.3665) time: 0.1319 data: 0.0019 max mem: 3206
+ 2021-05-28 21:54:05,811 INFO torchdistill.misc.log Epoch: [0] [ 5000/22741] eta: 0:40:17 lr: 4.633481377248142e-05 sample/s: 32.662548451970494 loss: 0.2040 (0.3526) time: 0.1379 data: 0.0021 max mem: 3206
+ 2021-05-28 21:56:21,633 INFO torchdistill.misc.log Epoch: [0] [ 6000/22741] eta: 0:37:59 lr: 4.560192310511118e-05 sample/s: 17.64687147306407 loss: 0.2610 (0.3427) time: 0.1330 data: 0.0019 max mem: 3206
+ 2021-05-28 21:58:37,550 INFO torchdistill.misc.log Epoch: [0] [ 7000/22741] eta: 0:35:42 lr: 4.4869032437740936e-05 sample/s: 39.795854661726544 loss: 0.2279 (0.3345) time: 0.1368 data: 0.0019 max mem: 3206
+ 2021-05-28 22:00:54,589 INFO torchdistill.misc.log Epoch: [0] [ 8000/22741] eta: 0:33:28 lr: 4.41361417703707e-05 sample/s: 29.80756185917765 loss: 0.3037 (0.3276) time: 0.1373 data: 0.0020 max mem: 3206
+ 2021-05-28 22:03:09,969 INFO torchdistill.misc.log Epoch: [0] [ 9000/22741] eta: 0:31:10 lr: 4.340325110300046e-05 sample/s: 31.53214914663003 loss: 0.1971 (0.3208) time: 0.1426 data: 0.0020 max mem: 3206
+ 2021-05-28 22:05:26,598 INFO torchdistill.misc.log Epoch: [0] [10000/22741] eta: 0:28:55 lr: 4.267036043563021e-05 sample/s: 32.66242127498029 loss: 0.2713 (0.3154) time: 0.1431 data: 0.0019 max mem: 3206
+ 2021-05-28 22:07:42,509 INFO torchdistill.misc.log Epoch: [0] [11000/22741] eta: 0:26:38 lr: 4.193746976825997e-05 sample/s: 39.87350538666844 loss: 0.2716 (0.3113) time: 0.1377 data: 0.0020 max mem: 3206
+ 2021-05-28 22:09:59,798 INFO torchdistill.misc.log Epoch: [0] [12000/22741] eta: 0:24:23 lr: 4.1204579100889726e-05 sample/s: 29.81990622411654 loss: 0.1884 (0.3089) time: 0.1430 data: 0.0020 max mem: 3206
+ 2021-05-28 22:12:15,437 INFO torchdistill.misc.log Epoch: [0] [13000/22741] eta: 0:22:06 lr: 4.047168843351949e-05 sample/s: 27.23596133085496 loss: 0.2587 (0.3061) time: 0.1314 data: 0.0020 max mem: 3206
+ 2021-05-28 22:14:30,715 INFO torchdistill.misc.log Epoch: [0] [14000/22741] eta: 0:19:50 lr: 3.973879776614925e-05 sample/s: 27.213695399343713 loss: 0.2133 (0.3033) time: 0.1350 data: 0.0019 max mem: 3206
+ 2021-05-28 22:16:46,940 INFO torchdistill.misc.log Epoch: [0] [15000/22741] eta: 0:17:33 lr: 3.9005907098779e-05 sample/s: 32.61448261114675 loss: 0.1779 (0.3003) time: 0.1352 data: 0.0019 max mem: 3206
+ 2021-05-28 22:19:01,833 INFO torchdistill.misc.log Epoch: [0] [16000/22741] eta: 0:15:17 lr: 3.827301643140876e-05 sample/s: 29.83268519160634 loss: 0.2660 (0.2970) time: 0.1310 data: 0.0020 max mem: 3206
+ 2021-05-28 22:21:16,379 INFO torchdistill.misc.log Epoch: [0] [17000/22741] eta: 0:13:00 lr: 3.754012576403852e-05 sample/s: 32.618984755190645 loss: 0.2141 (0.2938) time: 0.1298 data: 0.0019 max mem: 3206
+ 2021-05-28 22:23:31,336 INFO torchdistill.misc.log Epoch: [0] [18000/22741] eta: 0:10:44 lr: 3.680723509666828e-05 sample/s: 26.997843676178093 loss: 0.2866 (0.2914) time: 0.1386 data: 0.0020 max mem: 3206
+ 2021-05-28 22:25:46,116 INFO torchdistill.misc.log Epoch: [0] [19000/22741] eta: 0:08:28 lr: 3.607434442929804e-05 sample/s: 44.86558415163768 loss: 0.2532 (0.2893) time: 0.1327 data: 0.0020 max mem: 3206
+ 2021-05-28 22:28:02,199 INFO torchdistill.misc.log Epoch: [0] [20000/22741] eta: 0:06:12 lr: 3.53414537619278e-05 sample/s: 23.53072124231408 loss: 0.1963 (0.2868) time: 0.1398 data: 0.0019 max mem: 3206
+ 2021-05-28 22:30:17,943 INFO torchdistill.misc.log Epoch: [0] [21000/22741] eta: 0:03:56 lr: 3.460856309455756e-05 sample/s: 29.843086123508265 loss: 0.1552 (0.2842) time: 0.1418 data: 0.0019 max mem: 3206
+ 2021-05-28 22:32:32,428 INFO torchdistill.misc.log Epoch: [0] [22000/22741] eta: 0:01:40 lr: 3.387567242718731e-05 sample/s: 29.88486894254491 loss: 0.1958 (0.2824) time: 0.1364 data: 0.0019 max mem: 3206
+ 2021-05-28 22:34:12,553 INFO torchdistill.misc.log Epoch: [0] Total time: 0:51:28
+ 2021-05-28 22:35:52,473 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
+ 2021-05-28 22:35:52,475 INFO __main__ Validation: accuracy = 0.9028938906752412, f1 = 0.8703006276841757
+ 2021-05-28 22:35:52,475 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased
+ 2021-05-28 22:35:53,740 INFO torchdistill.misc.log Epoch: [1] [ 0/22741] eta: 1:01:53 lr: 3.3332600442665964e-05 sample/s: 28.324724767312098 loss: 0.5533 (0.5533) time: 0.1633 data: 0.0221 max mem: 3206
+ 2021-05-28 22:38:09,983 INFO torchdistill.misc.log Epoch: [1] [ 1000/22741] eta: 0:49:22 lr: 3.2599709775295725e-05 sample/s: 39.881372451138404 loss: 0.0934 (0.1804) time: 0.1297 data: 0.0019 max mem: 3206
+ 2021-05-28 22:40:25,602 INFO torchdistill.misc.log Epoch: [1] [ 2000/22741] eta: 0:46:59 lr: 3.1866819107925486e-05 sample/s: 39.8993930861285 loss: 0.1000 (0.1880) time: 0.1340 data: 0.0019 max mem: 3206
+ 2021-05-28 22:42:40,851 INFO torchdistill.misc.log Epoch: [1] [ 3000/22741] eta: 0:44:39 lr: 3.113392844055524e-05 sample/s: 39.91201722353725 loss: 0.0397 (0.1897) time: 0.1337 data: 0.0020 max mem: 3206
+ 2021-05-28 22:44:56,820 INFO torchdistill.misc.log Epoch: [1] [ 4000/22741] eta: 0:42:24 lr: 3.0401037773184997e-05 sample/s: 35.83940938473304 loss: 0.2073 (0.1937) time: 0.1387 data: 0.0020 max mem: 3206
+ 2021-05-28 22:47:12,892 INFO torchdistill.misc.log Epoch: [1] [ 5000/22741] eta: 0:40:09 lr: 2.9668147105814757e-05 sample/s: 23.52620072946129 loss: 0.2317 (0.1947) time: 0.1377 data: 0.0020 max mem: 3206
+ 2021-05-28 22:49:29,022 INFO torchdistill.misc.log Epoch: [1] [ 6000/22741] eta: 0:37:54 lr: 2.8935256438444515e-05 sample/s: 35.71216075267673 loss: 0.1682 (0.1950) time: 0.1327 data: 0.0020 max mem: 3206
+ 2021-05-28 22:51:44,757 INFO torchdistill.misc.log Epoch: [1] [ 7000/22741] eta: 0:35:38 lr: 2.8202365771074275e-05 sample/s: 35.8185971639261 loss: 0.1936 (0.1957) time: 0.1277 data: 0.0020 max mem: 3206
+ 2021-05-28 22:54:00,779 INFO torchdistill.misc.log Epoch: [1] [ 8000/22741] eta: 0:33:23 lr: 2.746947510370403e-05 sample/s: 35.78253914764559 loss: 0.1607 (0.2021) time: 0.1355 data: 0.0020 max mem: 3206
+ 2021-05-28 22:56:16,445 INFO torchdistill.misc.log Epoch: [1] [ 9000/22741] eta: 0:31:06 lr: 2.673658443633379e-05 sample/s: 29.859710821758846 loss: 0.0592 (0.2015) time: 0.1354 data: 0.0020 max mem: 3206
+ 2021-05-28 22:58:32,285 INFO torchdistill.misc.log Epoch: [1] [10000/22741] eta: 0:28:50 lr: 2.600369376896355e-05 sample/s: 23.53702646320642 loss: 0.0289 (0.2013) time: 0.1486 data: 0.0020 max mem: 3206
+ 2021-05-28 23:00:47,277 INFO torchdistill.misc.log Epoch: [1] [11000/22741] eta: 0:26:34 lr: 2.5270803101593305e-05 sample/s: 32.66890856445197 loss: 0.0945 (0.2022) time: 0.1271 data: 0.0019 max mem: 3206
+ 2021-05-28 23:03:03,760 INFO torchdistill.misc.log Epoch: [1] [12000/22741] eta: 0:24:19 lr: 2.4537912434223065e-05 sample/s: 32.55891074505907 loss: 0.1768 (0.2016) time: 0.1379 data: 0.0020 max mem: 3206
+ 2021-05-28 23:05:18,319 INFO torchdistill.misc.log Epoch: [1] [13000/22741] eta: 0:22:02 lr: 2.3805021766852823e-05 sample/s: 32.3476698293464 loss: 0.0365 (0.2002) time: 0.1354 data: 0.0019 max mem: 3206
+ 2021-05-28 23:07:33,589 INFO torchdistill.misc.log Epoch: [1] [14000/22741] eta: 0:19:46 lr: 2.307213109948258e-05 sample/s: 29.607043339692197 loss: 0.0187 (0.2008) time: 0.1379 data: 0.0019 max mem: 3206
+ 2021-05-28 23:09:49,171 INFO torchdistill.misc.log Epoch: [1] [15000/22741] eta: 0:17:30 lr: 2.2339240432112337e-05 sample/s: 32.65828856186249 loss: 0.3250 (0.2023) time: 0.1350 data: 0.0019 max mem: 3206
+ 2021-05-28 23:12:04,128 INFO torchdistill.misc.log Epoch: [1] [16000/22741] eta: 0:15:14 lr: 2.1606349764742098e-05 sample/s: 39.902145036733664 loss: 0.1157 (0.2023) time: 0.1332 data: 0.0019 max mem: 3206
+ 2021-05-28 23:14:19,963 INFO torchdistill.misc.log Epoch: [1] [17000/22741] eta: 0:12:58 lr: 2.0873459097371855e-05 sample/s: 29.81842224062732 loss: 0.0740 (0.2035) time: 0.1371 data: 0.0019 max mem: 3206
+ 2021-05-28 23:16:35,045 INFO torchdistill.misc.log Epoch: [1] [18000/22741] eta: 0:10:43 lr: 2.0140568430001613e-05 sample/s: 27.17503757839631 loss: 0.1538 (0.2039) time: 0.1394 data: 0.0022 max mem: 3206
+ 2021-05-28 23:18:51,761 INFO torchdistill.misc.log Epoch: [1] [19000/22741] eta: 0:08:27 lr: 1.940767776263137e-05 sample/s: 29.899515255203877 loss: 0.2649 (0.2050) time: 0.1399 data: 0.0019 max mem: 3206
+ 2021-05-28 23:21:07,148 INFO torchdistill.misc.log Epoch: [1] [20000/22741] eta: 0:06:11 lr: 1.8674787095261127e-05 sample/s: 29.91145595618439 loss: 0.0765 (0.2059) time: 0.1264 data: 0.0019 max mem: 3206
+ 2021-05-28 23:23:23,180 INFO torchdistill.misc.log Epoch: [1] [21000/22741] eta: 0:03:56 lr: 1.7941896427890888e-05 sample/s: 35.81056949961473 loss: 0.0140 (0.2059) time: 0.1309 data: 0.0020 max mem: 3206
+ 2021-05-28 23:25:39,127 INFO torchdistill.misc.log Epoch: [1] [22000/22741] eta: 0:01:40 lr: 1.7209005760520645e-05 sample/s: 29.85492866014898 loss: 0.1697 (0.2061) time: 0.1355 data: 0.0020 max mem: 3206
+ 2021-05-28 23:27:19,524 INFO torchdistill.misc.log Epoch: [1] Total time: 0:51:25
+ 2021-05-28 23:28:59,456 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
+ 2021-05-28 23:28:59,458 INFO __main__ Validation: accuracy = 0.9066782092505565, f1 = 0.8765904556307854
+ 2021-05-28 23:28:59,459 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased
+ 2021-05-28 23:29:00,762 INFO torchdistill.misc.log Epoch: [2] [ 0/22741] eta: 0:52:36 lr: 1.66659337759993e-05 sample/s: 35.41260205503162 loss: 0.0020 (0.0020) time: 0.1388 data: 0.0258 max mem: 3206
+ 2021-05-28 23:31:16,699 INFO torchdistill.misc.log Epoch: [2] [ 1000/22741] eta: 0:49:15 lr: 1.5933043108629057e-05 sample/s: 35.72493622530461 loss: 0.0000 (0.2282) time: 0.1340 data: 0.0020 max mem: 3206
+ 2021-05-28 23:33:33,394 INFO torchdistill.misc.log Epoch: [2] [ 2000/22741] eta: 0:47:07 lr: 1.5200152441258813e-05 sample/s: 30.27454801694068 loss: 0.0000 (0.2493) time: 0.1335 data: 0.0019 max mem: 3206
+ 2021-05-28 23:35:48,660 INFO torchdistill.misc.log Epoch: [2] [ 3000/22741] eta: 0:44:44 lr: 1.4467261773888572e-05 sample/s: 36.234792036525896 loss: 0.0000 (0.2656) time: 0.1393 data: 0.0019 max mem: 3206
+ 2021-05-28 23:38:03,495 INFO torchdistill.misc.log Epoch: [2] [ 4000/22741] eta: 0:42:22 lr: 1.373437110651833e-05 sample/s: 27.239410504986875 loss: 0.0000 (0.2649) time: 0.1332 data: 0.0019 max mem: 3206
+ 2021-05-28 23:40:18,370 INFO torchdistill.misc.log Epoch: [2] [ 5000/22741] eta: 0:40:04 lr: 1.300148043914809e-05 sample/s: 27.191730267732257 loss: 0.0001 (0.2653) time: 0.1463 data: 0.0020 max mem: 3206
+ 2021-05-28 23:42:33,360 INFO torchdistill.misc.log Epoch: [2] [ 6000/22741] eta: 0:37:47 lr: 1.2268589771777847e-05 sample/s: 32.66865411239648 loss: 0.0000 (0.2644) time: 0.1408 data: 0.0020 max mem: 3206
+ 2021-05-28 23:44:47,853 INFO torchdistill.misc.log Epoch: [2] [ 7000/22741] eta: 0:35:29 lr: 1.1535699104407605e-05 sample/s: 33.143780558875534 loss: 0.0001 (0.2670) time: 0.1333 data: 0.0020 max mem: 3206
+ 2021-05-28 23:47:02,473 INFO torchdistill.misc.log Epoch: [2] [ 8000/22741] eta: 0:33:13 lr: 1.0802808437037364e-05 sample/s: 32.695839261005986 loss: 0.0000 (0.2660) time: 0.1307 data: 0.0019 max mem: 3206
+ 2021-05-28 23:49:15,855 INFO torchdistill.misc.log Epoch: [2] [ 9000/22741] eta: 0:30:55 lr: 1.0069917769667121e-05 sample/s: 29.891577835226958 loss: 0.0000 (0.2636) time: 0.1350 data: 0.0020 max mem: 3206
+ 2021-05-28 23:51:30,840 INFO torchdistill.misc.log Epoch: [2] [10000/22741] eta: 0:28:40 lr: 9.33702710229688e-06 sample/s: 35.751504987501946 loss: 0.0000 (0.2629) time: 0.1275 data: 0.0021 max mem: 3206
+ 2021-05-28 23:53:45,522 INFO torchdistill.misc.log Epoch: [2] [11000/22741] eta: 0:26:24 lr: 8.604136434926637e-06 sample/s: 29.622255337481374 loss: 0.0000 (0.2632) time: 0.1303 data: 0.0020 max mem: 3206
+ 2021-05-28 23:55:59,507 INFO torchdistill.misc.log Epoch: [2] [12000/22741] eta: 0:24:08 lr: 7.871245767556396e-06 sample/s: 35.754933635673915 loss: 0.0000 (0.2629) time: 0.1290 data: 0.0019 max mem: 3206
+ 2021-05-28 23:58:15,335 INFO torchdistill.misc.log Epoch: [2] [13000/22741] eta: 0:21:54 lr: 7.1383551001861544e-06 sample/s: 35.821579784565124 loss: 0.0058 (0.2626) time: 0.1291 data: 0.0020 max mem: 3206
+ 2021-05-29 00:00:30,646 INFO torchdistill.misc.log Epoch: [2] [14000/22741] eta: 0:19:39 lr: 6.405464432815913e-06 sample/s: 33.14227467217681 loss: 0.0000 (0.2610) time: 0.1348 data: 0.0019 max mem: 3206
+ 2021-05-29 00:02:43,835 INFO torchdistill.misc.log Epoch: [2] [15000/22741] eta: 0:17:24 lr: 5.672573765445672e-06 sample/s: 27.24148927533408 loss: 0.0000 (0.2609) time: 0.1321 data: 0.0019 max mem: 3206
+ 2021-05-29 00:04:58,889 INFO torchdistill.misc.log Epoch: [2] [16000/22741] eta: 0:15:09 lr: 4.93968309807543e-06 sample/s: 35.784523504820406 loss: 0.0000 (0.2591) time: 0.1300 data: 0.0020 max mem: 3206
+ 2021-05-29 00:07:13,971 INFO torchdistill.misc.log Epoch: [2] [17000/22741] eta: 0:12:54 lr: 4.206792430705188e-06 sample/s: 29.886306308874037 loss: 0.0000 (0.2589) time: 0.1387 data: 0.0019 max mem: 3206
+ 2021-05-29 00:09:30,372 INFO torchdistill.misc.log Epoch: [2] [18000/22741] eta: 0:10:39 lr: 3.473901763334946e-06 sample/s: 32.35147476243367 loss: 0.0000 (0.2577) time: 0.1410 data: 0.0019 max mem: 3206
+ 2021-05-29 00:11:47,544 INFO torchdistill.misc.log Epoch: [2] [19000/22741] eta: 0:08:25 lr: 2.7410110959647043e-06 sample/s: 25.485669147804952 loss: 0.0000 (0.2556) time: 0.1318 data: 0.0019 max mem: 3206
+ 2021-05-29 00:14:02,842 INFO torchdistill.misc.log Epoch: [2] [20000/22741] eta: 0:06:10 lr: 2.0081204285944624e-06 sample/s: 35.68428418592087 loss: 0.0000 (0.2549) time: 0.1366 data: 0.0020 max mem: 3206
+ 2021-05-29 00:16:16,820 INFO torchdistill.misc.log Epoch: [2] [21000/22741] eta: 0:03:55 lr: 1.2752297612242206e-06 sample/s: 30.313933171800834 loss: 0.0000 (0.2551) time: 0.1303 data: 0.0019 max mem: 3206
+ 2021-05-29 00:18:32,152 INFO torchdistill.misc.log Epoch: [2] [22000/22741] eta: 0:01:40 lr: 5.423390938539788e-07 sample/s: 27.488987015114898 loss: 0.0000 (0.2537) time: 0.1327 data: 0.0019 max mem: 3206
+ 2021-05-29 00:20:12,904 INFO torchdistill.misc.log Epoch: [2] Total time: 0:51:12
+ 2021-05-29 00:21:52,921 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
+ 2021-05-29 00:21:52,922 INFO __main__ Validation: accuracy = 0.9093742270591145, f1 = 0.8781833898530488
+ 2021-05-29 00:21:52,923 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased
+ 2021-05-29 00:21:57,493 INFO __main__ [Student: bert-base-uncased]
+ 2021-05-29 00:23:37,517 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow
+ 2021-05-29 00:23:37,519 INFO __main__ Test: accuracy = 0.9093742270591145, f1 = 0.8781833898530488
+ 2021-05-29 00:23:37,519 INFO __main__ Start prediction for private dataset(s)
+ 2021-05-29 00:23:37,520 INFO __main__ qqp/test: 390965 samples
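
The validation numbers in the log above (accuracy ≈ 0.909, F1 ≈ 0.878 after epoch 2) come from the GLUE QQP metric in `datasets`, which the log's cache paths point to. A hedged sketch of reproducing them on the validation split, again under the assumed repo id; note that newer `datasets` releases move `load_metric` into the separate `evaluate` package:

```python
import torch
from datasets import load_dataset, load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "yoshitomo-matsubara/bert-base-uncased-qqp"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

validation = load_dataset("glue", "qqp", split="validation")
metric = load_metric("glue", "qqp")  # reports accuracy and F1, as in the log

for start in range(0, len(validation), 32):
    batch = validation[start : start + 32]  # slicing yields a dict of column lists
    inputs = tokenizer(
        batch["question1"], batch["question2"],
        padding=True, truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        preds = model(**inputs).logits.argmax(dim=-1)
    metric.add_batch(predictions=preds, references=batch["label"])

print(metric.compute())  # expect roughly {'accuracy': 0.909, 'f1': 0.878}
```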
vocab.txt ADDED
The diff for this file is too large to render. See raw diff