yoshitomo-matsubara committed
Commit: 1d297c6
Parent: a590a9d

initial commit

README.md ADDED
@@ -0,0 +1,17 @@
+ ---
+ language: en
+ tags:
+ - bert
+ - qnli
+ - glue
+ - torchdistill
+ license: apache-2.0
+ datasets:
+ - qnli
+ metrics:
+ - accuracy
+ ---
+
+ `bert-base-uncased` fine-tuned on the QNLI dataset, using [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) and [Google Colab](https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/master/demo/glue_finetuning_and_submission.ipynb).
+ The hyperparameters are the same as those in Hugging Face's example and/or the original BERT paper, and the training configuration (including hyperparameters) is available [here](https://github.com/yoshitomo-matsubara/torchdistill/blob/main/configs/sample/glue/qnli/ce/bert_base_uncased.yaml).
+ I submitted prediction files to [the GLUE leaderboard](https://gluebenchmark.com/leaderboard), and the overall GLUE score was **77.9**.
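A minimal usage sketch for this model card (not part of the commit) is shown below; the hub repository ID and the question/sentence pair are assumptions for illustration only:

```python
# Sketch: loading the fine-tuned QNLI classifier with transformers.
# The repository ID is assumed, not stated in this commit; adjust as needed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "yoshitomo-matsubara/bert-base-uncased-qnli"  # hypothetical hub ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id).eval()

# QNLI pairs a question with a context sentence; the classifier predicts
# whether the sentence answers the question (entailment) or not.
question = "What is torchdistill used for?"  # illustrative example only
sentence = "torchdistill is a configuration-driven framework for knowledge distillation."

inputs = tokenizer(question, sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])
```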
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "_name_or_path": "bert-base-uncased",
+ "architectures": [
+ "BertForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "finetuning_task": "qnli",
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "transformers_version": "4.6.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
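As a rough sketch (assuming a local copy of the committed config.json), the architecture above can be instantiated without downloading the weights; the fine-tuned parameters themselves come from pytorch_model.bin below:

```python
# Sketch: building the architecture from config.json alone (weights are random here;
# pytorch_model.bin in this commit carries the fine-tuned parameters).
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig.from_json_file("config.json")
model = BertForSequenceClassification(config)

# Standard bert-base shape: 12 layers, 12 attention heads, hidden size 768.
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)
```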
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8c95beb104444d0202d8892e7de060cc789ca94e45a852f0754275f6f32d39b
+ size 438024457
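The weights are stored through Git LFS, so the committed file is only a pointer recording the blob's SHA-256 and size. A small sketch (assuming the real pytorch_model.bin has been downloaded locally) for checking a copy against that pointer:

```python
# Sketch: verify a downloaded pytorch_model.bin against the Git LFS pointer above.
import hashlib
import os

EXPECTED_SHA256 = "b8c95beb104444d0202d8892e7de060cc789ca94e45a852f0754275f6f32d39b"
EXPECTED_SIZE = 438024457  # bytes, from the pointer file

path = "pytorch_model.bin"
assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
assert digest.hexdigest() == EXPECTED_SHA256, "checksum mismatch"
print("pytorch_model.bin matches the LFS pointer")
```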
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased"}
training.log ADDED
@@ -0,0 +1,71 @@
+ 2021-05-29 02:48:20,975 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml', log='log/glue/qnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
+ 2021-05-29 02:48:21,006 INFO __main__ Distributed environment: NO
+ Num processes: 1
+ Process index: 0
+ Local process index: 0
+ Device: cuda
+ Use FP16 precision: True
+
+ 2021-05-29 02:48:25,491 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+ 2021-05-29 02:48:39,570 INFO __main__ Start training
+ 2021-05-29 02:48:39,570 INFO torchdistill.models.util [student model]
+ 2021-05-29 02:48:39,570 INFO torchdistill.models.util Using the original student model
+ 2021-05-29 02:48:39,570 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+ 2021-05-29 02:48:42,319 INFO torchdistill.misc.log Epoch: [0] [ 0/6547] eta: 0:22:25 lr: 4.999745430477064e-05 sample/s: 20.197233795892227 loss: 0.7056 (0.7056) time: 0.2056 data: 0.0075 max mem: 1855
+ 2021-05-29 02:50:10,157 INFO torchdistill.misc.log Epoch: [0] [ 500/6547] eta: 0:17:42 lr: 4.872460669008707e-05 sample/s: 21.980190963306118 loss: 0.3474 (0.4739) time: 0.1794 data: 0.0024 max mem: 3172
+ 2021-05-29 02:51:39,153 INFO torchdistill.misc.log Epoch: [0] [1000/6547] eta: 0:16:21 lr: 4.7451759075403496e-05 sample/s: 23.378806479707368 loss: 0.3126 (0.4271) time: 0.1734 data: 0.0025 max mem: 3172
+ 2021-05-29 02:53:07,257 INFO torchdistill.misc.log Epoch: [0] [1500/6547] eta: 0:14:51 lr: 4.6178911460719925e-05 sample/s: 23.41018861026731 loss: 0.2156 (0.3999) time: 0.1722 data: 0.0024 max mem: 3172
+ 2021-05-29 02:54:35,981 INFO torchdistill.misc.log Epoch: [0] [2000/6547] eta: 0:13:24 lr: 4.4906063846036354e-05 sample/s: 18.511809581395966 loss: 0.2507 (0.3813) time: 0.1826 data: 0.0024 max mem: 3172
+ 2021-05-29 02:56:05,545 INFO torchdistill.misc.log Epoch: [0] [2500/6547] eta: 0:11:57 lr: 4.3633216231352784e-05 sample/s: 21.954333167143645 loss: 0.2781 (0.3666) time: 0.1874 data: 0.0025 max mem: 3172
+ 2021-05-29 02:57:35,013 INFO torchdistill.misc.log Epoch: [0] [3000/6547] eta: 0:10:29 lr: 4.236036861666921e-05 sample/s: 20.494309991840005 loss: 0.3193 (0.3557) time: 0.1706 data: 0.0024 max mem: 3172
+ 2021-05-29 02:59:03,358 INFO torchdistill.misc.log Epoch: [0] [3500/6547] eta: 0:09:00 lr: 4.108752100198565e-05 sample/s: 25.07325354790115 loss: 0.3278 (0.3466) time: 0.1806 data: 0.0025 max mem: 3172
+ 2021-05-29 03:00:31,656 INFO torchdistill.misc.log Epoch: [0] [4000/6547] eta: 0:07:31 lr: 3.981467338730208e-05 sample/s: 24.452091889827816 loss: 0.2232 (0.3390) time: 0.1825 data: 0.0026 max mem: 3172
+ 2021-05-29 03:02:00,254 INFO torchdistill.misc.log Epoch: [0] [4500/6547] eta: 0:06:02 lr: 3.854182577261851e-05 sample/s: 29.69965444966861 loss: 0.2472 (0.3336) time: 0.1718 data: 0.0025 max mem: 3172
+ 2021-05-29 03:03:29,173 INFO torchdistill.misc.log Epoch: [0] [5000/6547] eta: 0:04:34 lr: 3.7268978157934936e-05 sample/s: 23.38610623626117 loss: 0.2710 (0.3283) time: 0.1763 data: 0.0026 max mem: 3172
+ 2021-05-29 03:04:57,593 INFO torchdistill.misc.log Epoch: [0] [5500/6547] eta: 0:03:05 lr: 3.5996130543251365e-05 sample/s: 21.977484404924404 loss: 0.2725 (0.3249) time: 0.1672 data: 0.0025 max mem: 3172
+ 2021-05-29 03:06:26,120 INFO torchdistill.misc.log Epoch: [0] [6000/6547] eta: 0:01:36 lr: 3.4723282928567794e-05 sample/s: 32.334515402880136 loss: 0.2182 (0.3205) time: 0.1820 data: 0.0025 max mem: 3172
+ 2021-05-29 03:07:54,877 INFO torchdistill.misc.log Epoch: [0] [6500/6547] eta: 0:00:08 lr: 3.345043531388422e-05 sample/s: 18.50968554791362 loss: 0.1868 (0.3170) time: 0.1836 data: 0.0024 max mem: 3172
+ 2021-05-29 03:08:02,846 INFO torchdistill.misc.log Epoch: [0] Total time: 0:19:20
+ 2021-05-29 03:08:22,659 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-29 03:08:22,660 INFO __main__ Validation: accuracy = 0.9104887424492037
+ 2021-05-29 03:08:22,660 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased
+ 2021-05-29 03:08:24,034 INFO torchdistill.misc.log Epoch: [1] [ 0/6547] eta: 0:18:11 lr: 3.3330787638103964e-05 sample/s: 24.98985791548809 loss: 0.1000 (0.1000) time: 0.1667 data: 0.0066 max mem: 3172
+ 2021-05-29 03:09:52,894 INFO torchdistill.misc.log Epoch: [1] [ 500/6547] eta: 0:17:54 lr: 3.20579400234204e-05 sample/s: 23.463095432887627 loss: 0.1836 (0.2297) time: 0.1741 data: 0.0023 max mem: 3172
+ 2021-05-29 03:11:19,875 INFO torchdistill.misc.log Epoch: [1] [1000/6547] eta: 0:16:15 lr: 3.078509240873683e-05 sample/s: 29.82499528907314 loss: 0.1741 (0.2197) time: 0.1664 data: 0.0022 max mem: 3172
+ 2021-05-29 03:12:48,699 INFO torchdistill.misc.log Epoch: [1] [1500/6547] eta: 0:14:50 lr: 2.9512244794053258e-05 sample/s: 22.036980458965402 loss: 0.1657 (0.2246) time: 0.1777 data: 0.0023 max mem: 3172
+ 2021-05-29 03:14:17,914 INFO torchdistill.misc.log Epoch: [1] [2000/6547] eta: 0:13:24 lr: 2.8239397179369687e-05 sample/s: 29.625550715245563 loss: 0.1794 (0.2179) time: 0.1702 data: 0.0025 max mem: 3172
+ 2021-05-29 03:15:47,019 INFO torchdistill.misc.log Epoch: [1] [2500/6547] eta: 0:11:57 lr: 2.6966549564686116e-05 sample/s: 25.09331686093034 loss: 0.2343 (0.2235) time: 0.1730 data: 0.0024 max mem: 3172
+ 2021-05-29 03:17:16,122 INFO torchdistill.misc.log Epoch: [1] [3000/6547] eta: 0:10:29 lr: 2.5693701950002545e-05 sample/s: 23.375223445979366 loss: 0.1512 (0.2240) time: 0.1806 data: 0.0026 max mem: 3172
+ 2021-05-29 03:18:44,097 INFO torchdistill.misc.log Epoch: [1] [3500/6547] eta: 0:08:59 lr: 2.4420854335318978e-05 sample/s: 25.041891678085157 loss: 0.2174 (0.2236) time: 0.1651 data: 0.0024 max mem: 3172
+ 2021-05-29 03:20:13,271 INFO torchdistill.misc.log Epoch: [1] [4000/6547] eta: 0:07:31 lr: 2.3148006720635407e-05 sample/s: 24.913154784178857 loss: 0.2000 (0.2243) time: 0.1819 data: 0.0024 max mem: 3172
+ 2021-05-29 03:21:41,587 INFO torchdistill.misc.log Epoch: [1] [4500/6547] eta: 0:06:02 lr: 2.1875159105951836e-05 sample/s: 27.05793099889041 loss: 0.1466 (0.2222) time: 0.1830 data: 0.0025 max mem: 3172
+ 2021-05-29 03:23:10,809 INFO torchdistill.misc.log Epoch: [1] [5000/6547] eta: 0:04:34 lr: 2.060231149126827e-05 sample/s: 25.019634276919955 loss: 0.1216 (0.2227) time: 0.1866 data: 0.0025 max mem: 3172
+ 2021-05-29 03:24:40,562 INFO torchdistill.misc.log Epoch: [1] [5500/6547] eta: 0:03:05 lr: 1.9329463876584698e-05 sample/s: 24.97914237942735 loss: 0.1274 (0.2230) time: 0.1777 data: 0.0026 max mem: 3172
+ 2021-05-29 03:26:09,896 INFO torchdistill.misc.log Epoch: [1] [6000/6547] eta: 0:01:37 lr: 1.8056616261901127e-05 sample/s: 19.578257759376193 loss: 0.2182 (0.2260) time: 0.1716 data: 0.0024 max mem: 3172
+ 2021-05-29 03:27:37,859 INFO torchdistill.misc.log Epoch: [1] [6500/6547] eta: 0:00:08 lr: 1.6783768647217556e-05 sample/s: 17.566110000104704 loss: 0.1400 (0.2247) time: 0.1825 data: 0.0025 max mem: 3172
+ 2021-05-29 03:27:45,754 INFO torchdistill.misc.log Epoch: [1] Total time: 0:19:21
+ 2021-05-29 03:28:05,670 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-29 03:28:05,670 INFO __main__ Validation: accuracy = 0.9157971810360608
+ 2021-05-29 03:28:05,670 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased
+ 2021-05-29 03:28:07,062 INFO torchdistill.misc.log Epoch: [2] [ 0/6547] eta: 0:19:23 lr: 1.66641209714373e-05 sample/s: 23.338836537043683 loss: 0.0048 (0.0048) time: 0.1778 data: 0.0064 max mem: 3172
+ 2021-05-29 03:29:36,066 INFO torchdistill.misc.log Epoch: [2] [ 500/6547] eta: 0:17:56 lr: 1.539127335675373e-05 sample/s: 24.923480536998497 loss: 0.0000 (0.1601) time: 0.1721 data: 0.0025 max mem: 3172
+ 2021-05-29 03:31:04,840 INFO torchdistill.misc.log Epoch: [2] [1000/6547] eta: 0:16:26 lr: 1.411842574207016e-05 sample/s: 20.586551487189556 loss: 0.0000 (0.1932) time: 0.1806 data: 0.0024 max mem: 3172
+ 2021-05-29 03:32:33,340 INFO torchdistill.misc.log Epoch: [2] [1500/6547] eta: 0:14:55 lr: 1.284557812738659e-05 sample/s: 27.066574171170444 loss: 0.0001 (0.2193) time: 0.1707 data: 0.0025 max mem: 3172
+ 2021-05-29 03:34:01,072 INFO torchdistill.misc.log Epoch: [2] [2000/6547] eta: 0:13:24 lr: 1.157273051270302e-05 sample/s: 23.399446576809783 loss: 0.0000 (0.2212) time: 0.1782 data: 0.0025 max mem: 3172
+ 2021-05-29 03:35:28,994 INFO torchdistill.misc.log Epoch: [2] [2500/6547] eta: 0:11:55 lr: 1.029988289801945e-05 sample/s: 26.919134529759564 loss: 0.0000 (0.2266) time: 0.1797 data: 0.0025 max mem: 3172
+ 2021-05-29 03:36:57,793 INFO torchdistill.misc.log Epoch: [2] [3000/6547] eta: 0:10:27 lr: 9.02703528333588e-06 sample/s: 30.056999534200436 loss: 0.0000 (0.2243) time: 0.1740 data: 0.0024 max mem: 3172
+ 2021-05-29 03:38:26,426 INFO torchdistill.misc.log Epoch: [2] [3500/6547] eta: 0:08:59 lr: 7.754187668652309e-06 sample/s: 18.62709103808307 loss: 0.0000 (0.2244) time: 0.1774 data: 0.0024 max mem: 3172
+ 2021-05-29 03:39:54,542 INFO torchdistill.misc.log Epoch: [2] [4000/6547] eta: 0:07:30 lr: 6.4813400539687385e-06 sample/s: 19.757705370559666 loss: 0.0000 (0.2285) time: 0.1680 data: 0.0024 max mem: 3172
+ 2021-05-29 03:41:23,407 INFO torchdistill.misc.log Epoch: [2] [4500/6547] eta: 0:06:02 lr: 5.208492439285169e-06 sample/s: 25.014672839368117 loss: 0.0000 (0.2261) time: 0.1812 data: 0.0025 max mem: 3172
+ 2021-05-29 03:42:51,296 INFO torchdistill.misc.log Epoch: [2] [5000/6547] eta: 0:04:33 lr: 3.935644824601599e-06 sample/s: 21.941957945013066 loss: 0.0000 (0.2248) time: 0.1673 data: 0.0024 max mem: 3172
+ 2021-05-29 03:44:19,640 INFO torchdistill.misc.log Epoch: [2] [5500/6547] eta: 0:03:05 lr: 2.662797209918029e-06 sample/s: 27.461809801138923 loss: 0.0000 (0.2209) time: 0.1728 data: 0.0024 max mem: 3172
+ 2021-05-29 03:45:47,660 INFO torchdistill.misc.log Epoch: [2] [6000/6547] eta: 0:01:36 lr: 1.3899495952344585e-06 sample/s: 17.73160173160173 loss: 0.0000 (0.2196) time: 0.1829 data: 0.0026 max mem: 3172
+ 2021-05-29 03:47:15,622 INFO torchdistill.misc.log Epoch: [2] [6500/6547] eta: 0:00:08 lr: 1.1710198055088846e-07 sample/s: 17.73863212240207 loss: 0.0000 (0.2180) time: 0.1774 data: 0.0025 max mem: 3172
+ 2021-05-29 03:47:23,808 INFO torchdistill.misc.log Epoch: [2] Total time: 0:19:16
+ 2021-05-29 03:47:43,628 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-29 03:47:43,629 INFO __main__ Validation: accuracy = 0.9115870400878638
+ 2021-05-29 03:47:47,015 INFO __main__ [Student: bert-base-uncased]
+ 2021-05-29 03:48:06,856 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-29 03:48:06,857 INFO __main__ Test: accuracy = 0.9157971810360608
+ 2021-05-29 03:48:06,857 INFO __main__ Start prediction for private dataset(s)
+ 2021-05-29 03:48:06,858 INFO __main__ qnli/test: 5463 samples
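The log above reports one validation accuracy per epoch (about 0.910, 0.916, and 0.912), and the best checkpoint is kept after epoch 1. A minimal sketch, assuming a local training.log in this format, for pulling those figures out:

```python
# Sketch: extract per-epoch validation accuracy from training.log (requires Python 3.8+).
import re

pattern = re.compile(r"Validation: accuracy = ([0-9.]+)")
with open("training.log", encoding="utf-8") as f:
    accuracies = [float(m.group(1)) for line in f if (m := pattern.search(line))]

# For this run: roughly [0.9105, 0.9158, 0.9116]; the best epoch was checkpointed.
print(accuracies, "best:", max(accuracies))
```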
vocab.txt ADDED
The diff for this file is too large to render. See raw diff