yoshitomo-matsubara committed c94d09f (1 parent: 14f76ca)
added files
- config.json +26 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- training.log +51 -0
- vocab.txt +0 -0
config.json
ADDED
@@ -0,0 +1,26 @@
+{
+  "_name_or_path": "bert-large-uncased",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "finetuning_task": "mrpc",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "transformers_version": "4.6.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
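As a rough consistency check between `config.json` and the size of the `pytorch_model.bin` pointer in this commit, the parameter count of a `BertForSequenceClassification` model can be estimated from the config fields alone. This is a sketch under stated assumptions, not part of the commit: it assumes the standard BERT encoder layout used by Hugging Face Transformers, float32 weights, and two output labels (the config lists no `id2label`, so `num_labels` is assumed to keep its default of 2). The helper names are hypothetical.

```python
import json

# Relevant fields copied from config.json in this commit.
CONFIG_JSON = """
{
  "hidden_size": 1024,
  "intermediate_size": 4096,
  "max_position_embeddings": 512,
  "num_hidden_layers": 24,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
"""

def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias of a dense layer."""
    return n_in * n_out + n_out

def layer_norm_params(n: int) -> int:
    """LayerNorm scale (gamma) and shift (beta)."""
    return 2 * n

def bert_seq_cls_params(cfg: dict, num_labels: int = 2) -> int:
    """Estimate parameters of BERT + pooler + linear classifier (num_labels assumed)."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]
    # Word, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]
                  + cfg["type_vocab_size"]) * h + layer_norm_params(h)
    # One encoder layer: Q, K, V and attention-output projections, the
    # feed-forward pair, and the two LayerNorms.
    per_layer = (4 * linear_params(h, h) + layer_norm_params(h)
                 + linear_params(h, ff) + linear_params(ff, h)
                 + layer_norm_params(h))
    pooler = linear_params(h, h)
    classifier = linear_params(h, num_labels)
    return embeddings + cfg["num_hidden_layers"] * per_layer + pooler + classifier

cfg = json.loads(CONFIG_JSON)
n_params = bert_seq_cls_params(cfg)
fp32_bytes = 4 * n_params  # 4 bytes per float32 parameter
print(f"parameters: {n_params:,}")            # parameters: 335,143,938
print(f"float32 size: {fp32_bytes:,} bytes")  # float32 size: 1,340,575,752 bytes
```

The estimate (about 335M parameters, roughly 1.34 GB in float32) sits slightly below the 1,340,746,825 bytes recorded for `pytorch_model.bin`; the small remainder is presumably serialization overhead plus non-parameter buffers such as position IDs.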
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b527b40cea6a770354120ce1112fe542240e4d050d540755f70fb0cf2da02592
+size 1340746825
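`pytorch_model.bin` is committed as a Git LFS pointer: the repository stores only the `version`, `oid` and `size` lines, while the weights live in LFS storage. A minimal sketch for checking a downloaded copy against such a pointer, assuming the space-separated key/value layout shown for this file and a `sha256:` oid; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of Git LFS or any library.

```python
import hashlib
import os
import tempfile

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; hashing a multi-GB file is the slow part.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large checkpoints are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest

# Self-contained demo on a small temporary file instead of the real checkpoint.
payload = b"not really model weights"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
ok = matches_pointer(tmp.name, demo_pointer)
print(ok)
os.remove(tmp.name)
```

For the real file, the same call would take the path of the downloaded `pytorch_model.bin` and the three pointer lines from this commit.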
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
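MRPC is a sentence-pair task, so the `[CLS]` and `[SEP]` tokens listed in `special_tokens_map.json` combine with `"type_vocab_size": 2` from `config.json` in the standard BERT pair layout `[CLS] A [SEP] B [SEP]`, where segment (token-type) ID 0 marks the first sentence and 1 the second. The sketch below builds that layout from already-tokenized inputs. Assumptions: whitespace splitting stands in for the real WordPiece tokenizer defined by `tokenizer.json`/`vocab.txt`, `encode_pair` and `truncate_longest_first` are hypothetical helpers rather than the Transformers API, and truncation imitates a longest-first strategy up to `model_max_length` (512).

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"

def truncate_longest_first(a: List[str], b: List[str],
                           max_tokens: int) -> Tuple[List[str], List[str]]:
    """Drop tokens from the end of the longer sequence until both fit (ties trim B)."""
    a, b = list(a), list(b)
    while len(a) + len(b) > max_tokens:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    return a, b

def encode_pair(a: List[str], b: List[str], max_length: int = 512):
    # Reserve three positions for [CLS] and the two [SEP] markers.
    a, b = truncate_longest_first(a, b, max_length - 3)
    tokens = [CLS] + a + [SEP] + b + [SEP]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers B and the final [SEP].
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(tokens)  # no padding in this sketch
    return tokens, token_type_ids, attention_mask

# Lower-cased whitespace split as a stand-in for WordPiece.
s1 = "The company said profits rose".lower().split()
s2 = "Profits increased , the firm said".lower().split()
tokens, types, mask = encode_pair(s1, s2)
print(tokens)
print(types)
```

Padding to a fixed length with `[PAD]` (ID 0, matching `"pad_token_id": 0`) and an attention mask of 0 for padded positions would be the next step when batching.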
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-large-uncased"}
training.log
ADDED
@@ -0,0 +1,51 @@
+2021-05-21 20:55:41,099 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_large_uncased.yaml', log='log/glue/mrpc/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
+2021-05-21 20:55:41,134 INFO __main__ Distributed environment: NO
+Num processes: 1
+Process index: 0
+Local process index: 0
+Device: cuda
+Use FP16 precision: True
+
+2021-05-21 20:56:09,963 INFO __main__ Start training
+2021-05-21 20:56:09,964 INFO torchdistill.models.util [student model]
+2021-05-21 20:56:09,964 INFO torchdistill.models.util Using the original student model
+2021-05-21 20:56:09,964 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+2021-05-21 20:56:13,340 INFO torchdistill.misc.log Epoch: [0] [ 0/115] eta: 0:01:32 lr: 1.996521739130435e-05 sample/s: 5.0108644488568075 loss: 0.7117 (0.7117) time: 0.8047 data: 0.0065 max mem: 5401
+2021-05-21 20:56:57,890 INFO torchdistill.misc.log Epoch: [0] [ 50/115] eta: 0:00:57 lr: 1.822608695652174e-05 sample/s: 4.658316585970199 loss: 0.6187 (0.6438) time: 0.8854 data: 0.0047 max mem: 10945
+2021-05-21 20:57:42,797 INFO torchdistill.misc.log Epoch: [0] [100/115] eta: 0:00:13 lr: 1.6486956521739132e-05 sample/s: 4.6675958893857725 loss: 0.6068 (0.6307) time: 0.8995 data: 0.0046 max mem: 10946
+2021-05-21 20:57:55,065 INFO torchdistill.misc.log Epoch: [0] Total time: 0:01:42
+2021-05-21 20:57:58,697 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-21 20:57:58,698 INFO __main__ Validation: accuracy = 0.6911764705882353, f1 = 0.8141592920353982
+2021-05-21 20:57:58,698 INFO __main__ Updating ckpt
+2021-05-21 20:58:04,096 INFO torchdistill.misc.log Epoch: [1] [ 0/115] eta: 0:01:41 lr: 1.596521739130435e-05 sample/s: 4.550537109441219 loss: 0.6108 (0.6108) time: 0.8846 data: 0.0056 max mem: 10946
+2021-05-21 20:58:48,505 INFO torchdistill.misc.log Epoch: [1] [ 50/115] eta: 0:00:57 lr: 1.4226086956521742e-05 sample/s: 5.072097714849413 loss: 0.5569 (0.5700) time: 0.9032 data: 0.0046 max mem: 10946
+2021-05-21 20:59:33,555 INFO torchdistill.misc.log Epoch: [1] [100/115] eta: 0:00:13 lr: 1.2486956521739131e-05 sample/s: 4.004992031655668 loss: 0.5414 (0.5664) time: 0.8920 data: 0.0046 max mem: 10946
+2021-05-21 20:59:45,989 INFO torchdistill.misc.log Epoch: [1] Total time: 0:01:42
+2021-05-21 20:59:49,619 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-21 20:59:49,620 INFO __main__ Validation: accuracy = 0.7524509803921569, f1 = 0.8378812199036919
+2021-05-21 20:59:49,620 INFO __main__ Updating ckpt
+2021-05-21 20:59:55,254 INFO torchdistill.misc.log Epoch: [2] [ 0/115] eta: 0:01:41 lr: 1.196521739130435e-05 sample/s: 4.561543612792821 loss: 0.4717 (0.4717) time: 0.8828 data: 0.0059 max mem: 10946
+2021-05-21 21:00:39,815 INFO torchdistill.misc.log Epoch: [2] [ 50/115] eta: 0:00:57 lr: 1.022608695652174e-05 sample/s: 4.008234729448237 loss: 0.5461 (0.5267) time: 0.8850 data: 0.0046 max mem: 10946
+2021-05-21 21:01:24,369 INFO torchdistill.misc.log Epoch: [2] [100/115] eta: 0:00:13 lr: 8.48695652173913e-06 sample/s: 5.080750089185291 loss: 0.4628 (0.5161) time: 0.8997 data: 0.0047 max mem: 10946
+2021-05-21 21:01:36,665 INFO torchdistill.misc.log Epoch: [2] Total time: 0:01:42
+2021-05-21 21:01:40,295 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-21 21:01:40,295 INFO __main__ Validation: accuracy = 0.7696078431372549, f1 = 0.8469055374592835
+2021-05-21 21:01:40,295 INFO __main__ Updating ckpt
+2021-05-21 21:01:45,871 INFO torchdistill.misc.log Epoch: [3] [ 0/115] eta: 0:01:41 lr: 7.965217391304349e-06 sample/s: 4.552310192381945 loss: 0.4230 (0.4230) time: 0.8846 data: 0.0059 max mem: 10946
+2021-05-21 21:02:30,881 INFO torchdistill.misc.log Epoch: [3] [ 50/115] eta: 0:00:58 lr: 6.226086956521739e-06 sample/s: 4.298450072186489 loss: 0.4338 (0.4609) time: 0.8961 data: 0.0048 max mem: 10946
+2021-05-21 21:03:15,449 INFO torchdistill.misc.log Epoch: [3] [100/115] eta: 0:00:13 lr: 4.486956521739131e-06 sample/s: 4.298780486242516 loss: 0.4697 (0.4590) time: 0.8995 data: 0.0047 max mem: 10946
+2021-05-21 21:03:27,486 INFO torchdistill.misc.log Epoch: [3] Total time: 0:01:42
+2021-05-21 21:03:31,118 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-21 21:03:31,118 INFO __main__ Validation: accuracy = 0.7622549019607843, f1 = 0.8186915887850468
+2021-05-21 21:03:31,983 INFO torchdistill.misc.log Epoch: [4] [ 0/115] eta: 0:01:39 lr: 3.965217391304348e-06 sample/s: 4.657085579681301 loss: 0.3907 (0.3907) time: 0.8638 data: 0.0048 max mem: 10946
+2021-05-21 21:04:17,067 INFO torchdistill.misc.log Epoch: [4] [ 50/115] eta: 0:00:58 lr: 2.2260869565217395e-06 sample/s: 4.657376462576989 loss: 0.3632 (0.3954) time: 0.9041 data: 0.0048 max mem: 10946
+2021-05-21 21:05:02,138 INFO torchdistill.misc.log Epoch: [4] [100/115] eta: 0:00:13 lr: 4.869565217391305e-07 sample/s: 4.656771466962662 loss: 0.4057 (0.3937) time: 0.8889 data: 0.0047 max mem: 10946
+2021-05-21 21:05:14,248 INFO torchdistill.misc.log Epoch: [4] Total time: 0:01:43
+2021-05-21 21:05:17,878 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-21 21:05:17,879 INFO __main__ Validation: accuracy = 0.7941176470588235, f1 = 0.8571428571428571
+2021-05-21 21:05:17,879 INFO __main__ Updating ckpt
+2021-05-21 21:05:28,554 INFO __main__ [Student: bert-large-uncased]
+2021-05-21 21:05:32,209 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-21 21:05:32,209 INFO __main__ Test: accuracy = 0.7941176470588235, f1 = 0.8571428571428571
+2021-05-21 21:05:32,210 INFO __main__ Start prediction for private dataset(s)
+2021-05-21 21:05:32,211 INFO __main__ mrpc/test: 1725 samples
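The learning rates in this log are consistent with a linear decay from 2e-5 to 0 over 5 epochs × 115 iterations = 575 steps with no warmup, where each logged value is the rate after the scheduler has already stepped once for that iteration. The base rate, the missing warmup and this off-by-one convention are inferred from the logged numbers, not read from the training config (`bert_large_uncased.yaml` is not part of this commit). This is the shape produced by, for example, `transformers.get_linear_schedule_with_warmup` with `num_warmup_steps=0`, though the log does not show which scheduler torchdistill used. A minimal sketch that reproduces several logged values:

```python
BASE_LR = 2e-5          # inferred from the logged values, not stated in the log
ITERS_PER_EPOCH = 115   # from "[ 0/115]" in the log
NUM_EPOCHS = 5          # epochs 0-4 appear in the log
TOTAL_STEPS = ITERS_PER_EPOCH * NUM_EPOCHS  # 575

def logged_lr(epoch: int, iteration: int) -> float:
    """Linearly decayed rate after the scheduler has stepped past this iteration."""
    global_step = epoch * ITERS_PER_EPOCH + iteration
    return BASE_LR * (1 - (global_step + 1) / TOTAL_STEPS)

# (epoch, iteration, lr) triples copied from training.log.
LOGGED = [
    (0, 0, 1.996521739130435e-05),
    (0, 50, 1.822608695652174e-05),
    (1, 0, 1.596521739130435e-05),
    (2, 100, 8.48695652173913e-06),
    (3, 50, 6.226086956521739e-06),
    (4, 100, 4.869565217391305e-07),
]

for epoch, it, lr in LOGGED:
    predicted = logged_lr(epoch, it)
    print(f"epoch {epoch} iter {it:3d}: logged {lr:.6e}, predicted {predicted:.6e}")
```

If the match held only with `global_step` instead of `global_step + 1`, that would instead indicate the rate was logged before the scheduler step.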
vocab.txt
ADDED
The diff for this file is too large to render.