Commit 1e0215d by Dmitry Chaplinsky
1 parent: 06a5b46

Adding everything

Files changed (6)
  1. README.md +58 -0
  2. best-lm.pt +3 -0
  3. flair_dictionary.pkl +3 -0
  4. loss.txt +599 -0
  5. pipeline.py +22 -0
  6. requirements.txt +1 -0
README.md CHANGED
@@ -1,3 +1,61 @@
1
  ---
2
+ language:
3
+ - uk
4
+ tags:
5
+ - text2text-generation
6
+ - flair
7
+ library_name: generic
8
  license: mit
9
+ metrics:
10
+ - perplexity
11
+ datasets:
12
+ - ubertext2.0
13
+ widget:
14
+ - text: "Росія зазнає поразки"
15
+ - text: "Достеменно відомо, що Україна перемагає"
16
  ---
17
+
18
+ # Ukrainian flair embeddings (forward, large)
19
+
20
+ Trained for 10 epochs on texts from UberText 2.0 and a corpus of scraped Ukrainian texts from Stefan Schweter (54 GB in total).
21
+ The character dictionary used for training is in the `flair_dictionary.pkl` file.
22
+
23
+ The model parameters are:
24
+ ```python
25
+ is_forward_lm=True,
26
+ hidden_size=2048,
27
+ sequence_length=250,
28
+ mini_batch_size=1024,
29
+ max_epochs=30
30
+ ```
31
+
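As a rough illustration of how the parameters above could be passed to Flair's language model trainer, here is a minimal sketch. The split between constructor and `train()` arguments follows Flair's language model training tutorial; `nlayers=1`, `character_level=True`, the corpus and output paths, and the helper names `chars_per_step` and `train_forward_lm` are assumptions, not details taken from this commit.

```python
# Values copied from the README parameter list.
MODEL_ARGS = {"is_forward_lm": True, "hidden_size": 2048}
TRAIN_ARGS = {"sequence_length": 250, "mini_batch_size": 1024, "max_epochs": 30}


def chars_per_step(train_args: dict) -> int:
    """Characters covered by one training step: batch columns x sequence length."""
    return train_args["sequence_length"] * train_args["mini_batch_size"]


def train_forward_lm(corpus_dir: str, output_dir: str,
                     dictionary_path: str = "flair_dictionary.pkl") -> None:
    """Hypothetical training entry point; requires flair and a prepared corpus folder."""
    # Imported lazily so the configuration above can be inspected without flair installed.
    from flair.data import Dictionary
    from flair.models import LanguageModel
    from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

    dictionary = Dictionary.load_from_file(dictionary_path)
    corpus = TextCorpus(corpus_dir, dictionary, MODEL_ARGS["is_forward_lm"],
                        character_level=True)  # assumption: character-level corpus
    model = LanguageModel(dictionary, MODEL_ARGS["is_forward_lm"],
                          hidden_size=MODEL_ARGS["hidden_size"],
                          nlayers=1)  # assumption: layer count is not stated in the README
    LanguageModelTrainer(model, corpus).train(output_dir, **TRAIN_ARGS)


print(chars_per_step(TRAIN_ARGS))
```

With these values each step covers 250 × 1024 characters, which may help explain why a large-memory GPU is typically needed for this configuration.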
32
+ For more information on Flair embeddings, see [the article](https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) or the paper below:
33
+
34
+
35
+ ```bibtex
36
+ @inproceedings{akbik2018coling,
37
+ title={Contextual String Embeddings for Sequence Labeling},
38
+ author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
39
+ booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
40
+ pages = {1638--1649},
41
+ year = {2018}
42
+ }
43
+ ```
44
+
45
+ For more information on UberText 2.0 please see:
46
+ ```bibtex
47
+ @inproceedings{chaplynskyi-2023-introducing,
48
+ title = "Introducing {U}ber{T}ext 2.0: A Corpus of {M}odern {U}krainian at Scale",
49
+ author = "Chaplynskyi, Dmytro",
50
+ booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
51
+ month = may,
52
+ year = "2023",
53
+ address = "Dubrovnik, Croatia",
54
+ publisher = "Association for Computational Linguistics",
55
+ url = "https://aclanthology.org/2023.unlp-1.1",
56
+ pages = "1--10",
57
+ abstract = "This paper addresses the need for massive corpora for a low-resource language and presents the publicly available UberText 2.0 corpus for the Ukrainian language and discusses the methodology of its construction. While the collection and maintenance of such a corpus is more of a data extraction and data engineering task, the corpus itself provides a solid foundation for natural language processing tasks. It can enable the creation of contemporary language models and word embeddings, resulting in a better performance of numerous downstream tasks for the Ukrainian language. In addition, the paper and software developed can be used as a guidance and model solution for other low-resource languages. The resulting corpus is available for download on the project page. It has 3.274 billion tokens, consists of 8.59 million texts and takes up 32 gigabytes of space.",
58
+ }
59
+ ```
60
+
61
+ Copyright: [Dmytro Chaplynskyi](https://twitter.com/dchaplinsky), [lang-uk](https://lang.org.ua) project, 2023
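The README front matter lists perplexity as the metric, and `loss.txt` in this commit logs `valid loss` and `valid ppl` for every split. The logged values appear consistent with perplexity being the exponential of the loss. The sketch below parses a few lines copied from `loss.txt` and checks that relation; the names `SplitRecord` and `parse_line` are illustrative and not part of Flair.

```python
import math
import re
from typing import NamedTuple, Optional


class SplitRecord(NamedTuple):
    split: int
    epoch: int
    valid_loss: float
    valid_ppl: float
    learning_rate: float


# Matches one "end of split" line; the time field is skipped.
LINE_RE = re.compile(
    r"end of split\s+(\d+)\s*/\s*\d+\s*\|\s*epoch\s+(\d+)\s*\|.*?"
    r"valid loss\s+([\d.]+)\s*\|\s*valid ppl\s+([\d.]+)\s*\|\s*learning rate\s+([\d.]+)"
)


def parse_line(line: str) -> Optional[SplitRecord]:
    """Return a record for a log line, or None if the line has another shape."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    split, epoch, loss, ppl, lr = m.groups()
    return SplitRecord(int(split), int(epoch), float(loss), float(ppl), float(lr))


# Three lines copied verbatim from loss.txt.
SAMPLE = [
    "| end of split 1 / 62 | epoch 1 | time: 1603.89s | valid loss 1.4399 | valid ppl 4.2204 | learning rate 20.0000",
    "| end of split 44 / 62 | epoch 2 | time: 1610.00s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000",
    "| end of split 56 / 62 | epoch 2 | time: 1587.32s | valid loss 0.9680 | valid ppl 2.6327 | learning rate 5.0000",
]

records = [r for r in map(parse_line, SAMPLE) if r is not None]
for r in records:
    # Logged perplexity next to exp(loss); they should agree up to rounding.
    print(r.epoch, r.split, r.valid_ppl, round(math.exp(r.valid_loss), 4))

best = min(records, key=lambda r: r.valid_loss)
```

Running the same parser over the full file would make it easy to locate the best split or the point where the learning rate drops from 20 to 5.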
best-lm.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6d05b5d0f1b68ff0bd7a2ad1a852d25d1034de52fd823e4b9304ce5fc1c615ed
3
+ size 78734687
flair_dictionary.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2125c32d2db5fb79676a8a6f087b19e9c3b788cb19b87073423e31e176d1fe24
3
+ size 11900
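Both `best-lm.pt` and `flair_dictionary.pkl` are committed as Git LFS pointers, each recording a spec version, a `sha256` object id, and a byte size. A downloaded file can be checked against its pointer along the lines of the sketch below. The names `LfsPointer`, `parse_lfs_pointer`, and `stream_matches_pointer` are hypothetical, and the local file path in the commented usage is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import BinaryIO


@dataclass(frozen=True)
class LfsPointer:
    version: str
    algo: str
    oid: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, oid = fields["oid"].split(":", 1)
    return LfsPointer(fields["version"], algo, oid, int(fields["size"]))


def stream_matches_pointer(stream: BinaryIO, pointer: LfsPointer,
                           chunk_size: int = 1 << 20) -> bool:
    """Hash the stream in chunks and compare both size and digest to the pointer."""
    digest = hashlib.new(pointer.algo)
    total = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        total += len(chunk)
    return total == pointer.size and digest.hexdigest() == pointer.oid


# Pointer content for best-lm.pt, copied from this commit.
BEST_LM_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6d05b5d0f1b68ff0bd7a2ad1a852d25d1034de52fd823e4b9304ce5fc1c615ed
size 78734687
"""
pointer = parse_lfs_pointer(BEST_LM_POINTER)

# Usage, assuming the model was downloaded next to this script:
# with open("best-lm.pt", "rb") as fh:
#     print(stream_matches_pointer(fh, pointer))
```

Reading in chunks keeps memory use flat, which matters for the roughly 79 MB model file.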
loss.txt ADDED
@@ -0,0 +1,599 @@
1
+ | end of split 1 / 62 | epoch 1 | time: 1603.89s | valid loss 1.4399 | valid ppl 4.2204 | learning rate 20.0000
2
+ | end of split 2 / 62 | epoch 1 | time: 1607.81s | valid loss 1.2745 | valid ppl 3.5770 | learning rate 20.0000
3
+ | end of split 3 / 62 | epoch 1 | time: 1606.22s | valid loss 1.2037 | valid ppl 3.3323 | learning rate 20.0000
4
+ | end of split 4 / 62 | epoch 1 | time: 1606.92s | valid loss 1.1638 | valid ppl 3.2020 | learning rate 20.0000
5
+ | end of split 5 / 62 | epoch 1 | time: 1607.10s | valid loss 1.1394 | valid ppl 3.1250 | learning rate 20.0000
6
+ | end of split 6 / 62 | epoch 1 | time: 1607.63s | valid loss 1.1180 | valid ppl 3.0588 | learning rate 20.0000
7
+ | end of split 7 / 62 | epoch 1 | time: 1608.12s | valid loss 1.1052 | valid ppl 3.0200 | learning rate 20.0000
8
+ | end of split 8 / 62 | epoch 1 | time: 1608.18s | valid loss 1.0969 | valid ppl 2.9948 | learning rate 20.0000
9
+ | end of split 9 / 62 | epoch 1 | time: 1592.98s | valid loss 1.0812 | valid ppl 2.9482 | learning rate 20.0000
10
+ | end of split 10 / 62 | epoch 1 | time: 1597.67s | valid loss 1.0791 | valid ppl 2.9420 | learning rate 20.0000
11
+ | end of split 11 / 62 | epoch 1 | time: 1598.41s | valid loss 1.0690 | valid ppl 2.9124 | learning rate 20.0000
12
+ | end of split 12 / 62 | epoch 1 | time: 1594.52s | valid loss 1.0625 | valid ppl 2.8937 | learning rate 20.0000
13
+ | end of split 13 / 62 | epoch 1 | time: 1595.52s | valid loss 1.0584 | valid ppl 2.8816 | learning rate 20.0000
14
+ | end of split 14 / 62 | epoch 1 | time: 1593.63s | valid loss 1.0520 | valid ppl 2.8634 | learning rate 20.0000
15
+ | end of split 15 / 62 | epoch 1 | time: 1593.45s | valid loss 1.1233 | valid ppl 3.0750 | learning rate 20.0000
16
+ | end of split 16 / 62 | epoch 1 | time: 1594.20s | valid loss 1.0477 | valid ppl 2.8511 | learning rate 20.0000
17
+ | end of split 17 / 62 | epoch 1 | time: 1594.12s | valid loss 1.0393 | valid ppl 2.8274 | learning rate 20.0000
18
+ | end of split 18 / 62 | epoch 1 | time: 1592.60s | valid loss 1.0382 | valid ppl 2.8242 | learning rate 20.0000
19
+ | end of split 19 / 62 | epoch 1 | time: 1591.84s | valid loss 1.0321 | valid ppl 2.8071 | learning rate 20.0000
20
+ | end of split 20 / 62 | epoch 1 | time: 1591.25s | valid loss 1.0335 | valid ppl 2.8109 | learning rate 20.0000
21
+ | end of split 21 / 62 | epoch 1 | time: 1593.49s | valid loss 1.0276 | valid ppl 2.7944 | learning rate 20.0000
22
+ | end of split 22 / 62 | epoch 1 | time: 1590.55s | valid loss 1.0265 | valid ppl 2.7913 | learning rate 20.0000
23
+ | end of split 23 / 62 | epoch 1 | time: 1591.47s | valid loss 1.0218 | valid ppl 2.7781 | learning rate 20.0000
24
+ | end of split 24 / 62 | epoch 1 | time: 1589.39s | valid loss 1.0218 | valid ppl 2.7781 | learning rate 20.0000
25
+ | end of split 25 / 62 | epoch 1 | time: 1591.76s | valid loss 1.0182 | valid ppl 2.7682 | learning rate 20.0000
26
+ | end of split 26 / 62 | epoch 1 | time: 1586.71s | valid loss 1.0198 | valid ppl 2.7726 | learning rate 20.0000
27
+ | end of split 27 / 62 | epoch 1 | time: 1584.62s | valid loss 1.0144 | valid ppl 2.7578 | learning rate 20.0000
28
+ | end of split 28 / 62 | epoch 1 | time: 1586.04s | valid loss 1.0124 | valid ppl 2.7521 | learning rate 20.0000
29
+ | end of split 29 / 62 | epoch 1 | time: 1583.84s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 20.0000
30
+ | end of split 30 / 62 | epoch 1 | time: 1582.16s | valid loss 1.0126 | valid ppl 2.7527 | learning rate 20.0000
31
+ | end of split 31 / 62 | epoch 1 | time: 1582.81s | valid loss 1.0114 | valid ppl 2.7495 | learning rate 20.0000
32
+ | end of split 32 / 62 | epoch 1 | time: 1584.10s | valid loss 1.0078 | valid ppl 2.7396 | learning rate 20.0000
33
+ | end of split 33 / 62 | epoch 1 | time: 1583.96s | valid loss 1.0067 | valid ppl 2.7367 | learning rate 20.0000
34
+ | end of split 34 / 62 | epoch 1 | time: 1584.53s | valid loss 1.0311 | valid ppl 2.8043 | learning rate 20.0000
35
+ | end of split 35 / 62 | epoch 1 | time: 1585.34s | valid loss 1.0022 | valid ppl 2.7243 | learning rate 20.0000
36
+ | end of split 36 / 62 | epoch 1 | time: 1585.67s | valid loss 1.0017 | valid ppl 2.7229 | learning rate 20.0000
37
+ | end of split 37 / 62 | epoch 1 | time: 1583.84s | valid loss 1.0020 | valid ppl 2.7236 | learning rate 20.0000
38
+ | end of split 38 / 62 | epoch 1 | time: 1584.28s | valid loss 0.9989 | valid ppl 2.7152 | learning rate 20.0000
39
+ | end of split 39 / 62 | epoch 1 | time: 1585.90s | valid loss 1.0254 | valid ppl 2.7882 | learning rate 20.0000
40
+ | end of split 40 / 62 | epoch 1 | time: 1588.16s | valid loss 0.9973 | valid ppl 2.7110 | learning rate 20.0000
41
+ | end of split 41 / 62 | epoch 1 | time: 1586.15s | valid loss 0.9961 | valid ppl 2.7076 | learning rate 20.0000
42
+ | end of split 42 / 62 | epoch 1 | time: 1588.69s | valid loss 0.9963 | valid ppl 2.7083 | learning rate 20.0000
43
+ | end of split 43 / 62 | epoch 1 | time: 1588.30s | valid loss 0.9934 | valid ppl 2.7005 | learning rate 20.0000
44
+ | end of split 44 / 62 | epoch 1 | time: 1587.86s | valid loss 0.9962 | valid ppl 2.7080 | learning rate 20.0000
45
+ | end of split 45 / 62 | epoch 1 | time: 1588.43s | valid loss 0.9921 | valid ppl 2.6970 | learning rate 20.0000
46
+ | end of split 46 / 62 | epoch 1 | time: 1591.45s | valid loss 0.9913 | valid ppl 2.6949 | learning rate 20.0000
47
+ | end of split 47 / 62 | epoch 1 | time: 1590.01s | valid loss 1.0074 | valid ppl 2.7386 | learning rate 20.0000
48
+ | end of split 48 / 62 | epoch 1 | time: 1589.84s | valid loss 0.9891 | valid ppl 2.6889 | learning rate 20.0000
49
+ | end of split 49 / 62 | epoch 1 | time: 1591.41s | valid loss 0.9893 | valid ppl 2.6893 | learning rate 20.0000
50
+ | end of split 50 / 62 | epoch 1 | time: 1592.88s | valid loss 0.9881 | valid ppl 2.6861 | learning rate 20.0000
51
+ | end of split 51 / 62 | epoch 1 | time: 1593.67s | valid loss 0.9872 | valid ppl 2.6836 | learning rate 20.0000
52
+ | end of split 52 / 62 | epoch 1 | time: 1593.93s | valid loss 0.9938 | valid ppl 2.7015 | learning rate 20.0000
53
+ | end of split 53 / 62 | epoch 1 | time: 1593.15s | valid loss 0.9875 | valid ppl 2.6845 | learning rate 20.0000
54
+ | end of split 54 / 62 | epoch 1 | time: 1593.89s | valid loss 0.9844 | valid ppl 2.6763 | learning rate 20.0000
55
+ | end of split 55 / 62 | epoch 1 | time: 1594.52s | valid loss 0.9852 | valid ppl 2.6782 | learning rate 20.0000
56
+ | end of split 56 / 62 | epoch 1 | time: 1593.26s | valid loss 0.9848 | valid ppl 2.6772 | learning rate 20.0000
57
+ | end of split 57 / 62 | epoch 1 | time: 1594.39s | valid loss 0.9827 | valid ppl 2.6717 | learning rate 20.0000
58
+ | end of split 58 / 62 | epoch 1 | time: 1593.89s | valid loss 0.9834 | valid ppl 2.6736 | learning rate 20.0000
59
+ | end of split 59 / 62 | epoch 1 | time: 1594.99s | valid loss 0.9814 | valid ppl 2.6682 | learning rate 20.0000
60
+ | end of split 60 / 62 | epoch 1 | time: 1595.07s | valid loss 0.9885 | valid ppl 2.6871 | learning rate 20.0000
61
+ | end of split 61 / 62 | epoch 1 | time: 1593.04s | valid loss 0.9834 | valid ppl 2.6736 | learning rate 20.0000
62
+ | end of split 62 / 62 | epoch 1 | time: 850.81s | valid loss 0.9894 | valid ppl 2.6895 | learning rate 20.0000
63
+ | end of split 1 / 62 | epoch 2 | time: 1589.43s | valid loss 0.9930 | valid ppl 2.6992 | learning rate 20.0000
64
+ | end of split 2 / 62 | epoch 2 | time: 1592.05s | valid loss 0.9823 | valid ppl 2.6706 | learning rate 20.0000
65
+ | end of split 3 / 62 | epoch 2 | time: 1591.91s | valid loss 0.9795 | valid ppl 2.6631 | learning rate 20.0000
66
+ | end of split 4 / 62 | epoch 2 | time: 1589.81s | valid loss 0.9798 | valid ppl 2.6638 | learning rate 20.0000
67
+ | end of split 5 / 62 | epoch 2 | time: 1592.72s | valid loss 0.9863 | valid ppl 2.6812 | learning rate 20.0000
68
+ | end of split 6 / 62 | epoch 2 | time: 1591.02s | valid loss 0.9793 | valid ppl 2.6627 | learning rate 20.0000
69
+ | end of split 7 / 62 | epoch 2 | time: 1591.96s | valid loss 0.9778 | valid ppl 2.6587 | learning rate 20.0000
70
+ | end of split 8 / 62 | epoch 2 | time: 1589.75s | valid loss 0.9770 | valid ppl 2.6565 | learning rate 20.0000
71
+ | end of split 9 / 62 | epoch 2 | time: 1589.90s | valid loss 0.9770 | valid ppl 2.6565 | learning rate 20.0000
72
+ | end of split 10 / 62 | epoch 2 | time: 1586.76s | valid loss 0.9759 | valid ppl 2.6535 | learning rate 20.0000
73
+ | end of split 11 / 62 | epoch 2 | time: 1583.54s | valid loss 0.9783 | valid ppl 2.6600 | learning rate 20.0000
74
+ | end of split 12 / 62 | epoch 2 | time: 1585.70s | valid loss 1.0014 | valid ppl 2.7221 | learning rate 20.0000
75
+ | end of split 13 / 62 | epoch 2 | time: 1585.88s | valid loss 0.9768 | valid ppl 2.6559 | learning rate 20.0000
76
+ | end of split 14 / 62 | epoch 2 | time: 1587.69s | valid loss 0.9754 | valid ppl 2.6523 | learning rate 20.0000
77
+ | end of split 15 / 62 | epoch 2 | time: 1586.05s | valid loss 0.9736 | valid ppl 2.6475 | learning rate 20.0000
78
+ | end of split 16 / 62 | epoch 2 | time: 1589.38s | valid loss 0.9740 | valid ppl 2.6486 | learning rate 20.0000
79
+ | end of split 17 / 62 | epoch 2 | time: 1591.27s | valid loss 0.9756 | valid ppl 2.6527 | learning rate 20.0000
80
+ | end of split 18 / 62 | epoch 2 | time: 1590.28s | valid loss 0.9728 | valid ppl 2.6454 | learning rate 20.0000
81
+ | end of split 19 / 62 | epoch 2 | time: 1588.81s | valid loss 0.9727 | valid ppl 2.6452 | learning rate 20.0000
82
+ | end of split 20 / 62 | epoch 2 | time: 1590.45s | valid loss 0.9723 | valid ppl 2.6440 | learning rate 20.0000
83
+ | end of split 21 / 62 | epoch 2 | time: 1587.61s | valid loss 0.9716 | valid ppl 2.6422 | learning rate 20.0000
84
+ | end of split 22 / 62 | epoch 2 | time: 1587.52s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
85
+ | end of split 23 / 62 | epoch 2 | time: 1587.01s | valid loss 0.9709 | valid ppl 2.6402 | learning rate 20.0000
86
+ | end of split 24 / 62 | epoch 2 | time: 1587.21s | valid loss 0.9701 | valid ppl 2.6383 | learning rate 20.0000
87
+ | end of split 25 / 62 | epoch 2 | time: 1585.58s | valid loss 0.9713 | valid ppl 2.6413 | learning rate 20.0000
88
+ | end of split 26 / 62 | epoch 2 | time: 1582.23s | valid loss 0.9920 | valid ppl 2.6967 | learning rate 20.0000
89
+ | end of split 27 / 62 | epoch 2 | time: 1584.31s | valid loss 0.9696 | valid ppl 2.6368 | learning rate 20.0000
90
+ | end of split 28 / 62 | epoch 2 | time: 1583.27s | valid loss 0.9690 | valid ppl 2.6353 | learning rate 20.0000
91
+ | end of split 29 / 62 | epoch 2 | time: 1583.73s | valid loss 0.9685 | valid ppl 2.6339 | learning rate 20.0000
92
+ | end of split 30 / 62 | epoch 2 | time: 1582.01s | valid loss 0.9712 | valid ppl 2.6412 | learning rate 20.0000
93
+ | end of split 31 / 62 | epoch 2 | time: 1577.61s | valid loss 0.9698 | valid ppl 2.6374 | learning rate 20.0000
94
+ | end of split 32 / 62 | epoch 2 | time: 1576.99s | valid loss 0.9677 | valid ppl 2.6318 | learning rate 20.0000
95
+ | end of split 33 / 62 | epoch 2 | time: 1576.05s | valid loss 0.9675 | valid ppl 2.6314 | learning rate 20.0000
96
+ | end of split 34 / 62 | epoch 2 | time: 1580.30s | valid loss 0.9668 | valid ppl 2.6296 | learning rate 20.0000
97
+ | end of split 35 / 62 | epoch 2 | time: 1580.63s | valid loss 0.9663 | valid ppl 2.6282 | learning rate 20.0000
98
+ | end of split 36 / 62 | epoch 2 | time: 1581.22s | valid loss 0.9660 | valid ppl 2.6275 | learning rate 20.0000
99
+ | end of split 37 / 62 | epoch 2 | time: 1581.83s | valid loss 0.9668 | valid ppl 2.6295 | learning rate 20.0000
100
+ | end of split 38 / 62 | epoch 2 | time: 1583.12s | valid loss 0.9663 | valid ppl 2.6283 | learning rate 20.0000
101
+ | end of split 39 / 62 | epoch 2 | time: 1584.87s | valid loss 0.9653 | valid ppl 2.6256 | learning rate 20.0000
102
+ | end of split 40 / 62 | epoch 2 | time: 847.08s | valid loss 0.9723 | valid ppl 2.6440 | learning rate 20.0000
103
+ | end of split 41 / 62 | epoch 2 | time: 1592.30s | valid loss 0.9707 | valid ppl 2.6398 | learning rate 20.0000
104
+ | end of split 42 / 62 | epoch 2 | time: 1602.69s | valid loss 0.9655 | valid ppl 2.6262 | learning rate 20.0000
105
+ | end of split 43 / 62 | epoch 2 | time: 1608.11s | valid loss 0.9649 | valid ppl 2.6245 | learning rate 20.0000
106
+ | end of split 44 / 62 | epoch 2 | time: 1610.00s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000
107
+ | end of split 45 / 62 | epoch 2 | time: 1590.39s | valid loss 1.0062 | valid ppl 2.7352 | learning rate 20.0000
108
+ | end of split 46 / 62 | epoch 2 | time: 1569.29s | valid loss 1.5219 | valid ppl 4.5807 | learning rate 20.0000
109
+ | end of split 47 / 62 | epoch 2 | time: 1573.04s | valid loss 1.2816 | valid ppl 3.6023 | learning rate 20.0000
110
+ | end of split 48 / 62 | epoch 2 | time: 1575.91s | valid loss 1.1161 | valid ppl 3.0529 | learning rate 20.0000
111
+ | end of split 49 / 62 | epoch 2 | time: 1573.44s | valid loss 1.0870 | valid ppl 2.9653 | learning rate 20.0000
112
+ | end of split 50 / 62 | epoch 2 | time: 1575.89s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 20.0000
113
+ | end of split 51 / 62 | epoch 2 | time: 1578.06s | valid loss 1.0085 | valid ppl 2.7415 | learning rate 20.0000
114
+ | end of split 52 / 62 | epoch 2 | time: 1583.24s | valid loss 0.9898 | valid ppl 2.6907 | learning rate 20.0000
115
+ | end of split 53 / 62 | epoch 2 | time: 1583.39s | valid loss 0.9789 | valid ppl 2.6617 | learning rate 20.0000
116
+ | end of split 54 / 62 | epoch 2 | time: 1582.99s | valid loss 0.9752 | valid ppl 2.6516 | learning rate 20.0000
117
+ | end of split 55 / 62 | epoch 2 | time: 1584.67s | valid loss 0.9727 | valid ppl 2.6450 | learning rate 20.0000
118
+ | end of split 56 / 62 | epoch 2 | time: 1587.32s | valid loss 0.9680 | valid ppl 2.6327 | learning rate 5.0000
119
+ | end of split 57 / 62 | epoch 2 | time: 1589.56s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000
120
+ | end of split 58 / 62 | epoch 2 | time: 1590.23s | valid loss 0.9665 | valid ppl 2.6286 | learning rate 5.0000
121
+ | end of split 59 / 62 | epoch 2 | time: 1592.84s | valid loss 0.9658 | valid ppl 2.6270 | learning rate 5.0000
122
+ | end of split 60 / 62 | epoch 2 | time: 1593.67s | valid loss 0.9652 | valid ppl 2.6253 | learning rate 5.0000
123
+ | end of split 61 / 62 | epoch 2 | time: 1593.45s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000
124
+ | end of split 62 / 62 | epoch 2 | time: 1592.63s | valid loss 0.9642 | valid ppl 2.6228 | learning rate 5.0000
125
+ | end of split 1 / 62 | epoch 3 | time: 1588.48s | valid loss 0.9639 | valid ppl 2.6219 | learning rate 5.0000
126
+ | end of split 2 / 62 | epoch 3 | time: 1595.00s | valid loss 0.9635 | valid ppl 2.6208 | learning rate 5.0000
127
+ | end of split 3 / 62 | epoch 3 | time: 1592.33s | valid loss 0.9631 | valid ppl 2.6197 | learning rate 5.0000
128
+ | end of split 4 / 62 | epoch 3 | time: 1592.28s | valid loss 0.9630 | valid ppl 2.6194 | learning rate 5.0000
129
+ | end of split 5 / 62 | epoch 3 | time: 1592.85s | valid loss 0.9626 | valid ppl 2.6184 | learning rate 5.0000
130
+ | end of split 6 / 62 | epoch 3 | time: 1592.84s | valid loss 0.9622 | valid ppl 2.6173 | learning rate 5.0000
131
+ | end of split 7 / 62 | epoch 3 | time: 1592.00s | valid loss 0.9619 | valid ppl 2.6167 | learning rate 5.0000
132
+ | end of split 8 / 62 | epoch 3 | time: 1593.04s | valid loss 0.9616 | valid ppl 2.6159 | learning rate 5.0000
133
+ | end of split 9 / 62 | epoch 3 | time: 1592.29s | valid loss 0.9615 | valid ppl 2.6155 | learning rate 5.0000
134
+ | end of split 10 / 62 | epoch 3 | time: 1590.81s | valid loss 0.9612 | valid ppl 2.6149 | learning rate 5.0000
135
+ | end of split 11 / 62 | epoch 3 | time: 1591.61s | valid loss 0.9611 | valid ppl 2.6146 | learning rate 5.0000
136
+ | end of split 12 / 62 | epoch 3 | time: 1590.51s | valid loss 0.9609 | valid ppl 2.6141 | learning rate 5.0000
137
+ | end of split 13 / 62 | epoch 3 | time: 1590.78s | valid loss 0.9604 | valid ppl 2.6127 | learning rate 5.0000
138
+ | end of split 14 / 62 | epoch 3 | time: 1589.97s | valid loss 0.9604 | valid ppl 2.6126 | learning rate 5.0000
139
+ | end of split 15 / 62 | epoch 3 | time: 1589.70s | valid loss 0.9600 | valid ppl 2.6117 | learning rate 5.0000
140
+ | end of split 16 / 62 | epoch 3 | time: 1589.05s | valid loss 0.9600 | valid ppl 2.6118 | learning rate 5.0000
141
+ | end of split 17 / 62 | epoch 3 | time: 1589.99s | valid loss 0.9596 | valid ppl 2.6107 | learning rate 5.0000
142
+ | end of split 18 / 62 | epoch 3 | time: 1590.63s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 5.0000
143
+ | end of split 19 / 62 | epoch 3 | time: 1588.73s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 5.0000
144
+ | end of split 20 / 62 | epoch 3 | time: 1589.71s | valid loss 0.9589 | valid ppl 2.6088 | learning rate 5.0000
145
+ | end of split 21 / 62 | epoch 3 | time: 1589.46s | valid loss 0.9588 | valid ppl 2.6086 | learning rate 5.0000
146
+ | end of split 22 / 62 | epoch 3 | time: 1589.12s | valid loss 0.9586 | valid ppl 2.6080 | learning rate 5.0000
147
+ | end of split 23 / 62 | epoch 3 | time: 1591.71s | valid loss 0.9589 | valid ppl 2.6088 | learning rate 5.0000
148
+ | end of split 24 / 62 | epoch 3 | time: 1589.39s | valid loss 0.9582 | valid ppl 2.6070 | learning rate 5.0000
149
+ | end of split 25 / 62 | epoch 3 | time: 1590.33s | valid loss 0.9582 | valid ppl 2.6070 | learning rate 5.0000
150
+ | end of split 26 / 62 | epoch 3 | time: 1589.33s | valid loss 0.9580 | valid ppl 2.6065 | learning rate 5.0000
151
+ | end of split 27 / 62 | epoch 3 | time: 1589.70s | valid loss 0.9580 | valid ppl 2.6066 | learning rate 5.0000
152
+ | end of split 28 / 62 | epoch 3 | time: 1589.72s | valid loss 0.9578 | valid ppl 2.6060 | learning rate 5.0000
153
+ | end of split 29 / 62 | epoch 3 | time: 849.01s | valid loss 0.9583 | valid ppl 2.6072 | learning rate 5.0000
154
+ | end of split 30 / 62 | epoch 3 | time: 1592.01s | valid loss 0.9576 | valid ppl 2.6055 | learning rate 5.0000
155
+ | end of split 31 / 62 | epoch 3 | time: 1593.91s | valid loss 0.9574 | valid ppl 2.6048 | learning rate 5.0000
156
+ | end of split 32 / 62 | epoch 3 | time: 1593.53s | valid loss 0.9573 | valid ppl 2.6047 | learning rate 5.0000
157
+ | end of split 33 / 62 | epoch 3 | time: 1593.28s | valid loss 0.9573 | valid ppl 2.6047 | learning rate 5.0000
158
+ | end of split 34 / 62 | epoch 3 | time: 1592.56s | valid loss 0.9571 | valid ppl 2.6040 | learning rate 5.0000
159
+ | end of split 35 / 62 | epoch 3 | time: 1594.00s | valid loss 0.9569 | valid ppl 2.6037 | learning rate 5.0000
160
+ | end of split 36 / 62 | epoch 3 | time: 1592.16s | valid loss 0.9580 | valid ppl 2.6064 | learning rate 5.0000
161
+ | end of split 37 / 62 | epoch 3 | time: 1593.97s | valid loss 0.9569 | valid ppl 2.6037 | learning rate 5.0000
162
+ | end of split 38 / 62 | epoch 3 | time: 1595.62s | valid loss 0.9566 | valid ppl 2.6029 | learning rate 5.0000
163
+ | end of split 39 / 62 | epoch 3 | time: 1595.26s | valid loss 0.9565 | valid ppl 2.6025 | learning rate 5.0000
164
+ | end of split 40 / 62 | epoch 3 | time: 1595.91s | valid loss 0.9565 | valid ppl 2.6025 | learning rate 5.0000
165
+ | end of split 41 / 62 | epoch 3 | time: 1597.34s | valid loss 0.9562 | valid ppl 2.6019 | learning rate 5.0000
166
+ | end of split 42 / 62 | epoch 3 | time: 1600.88s | valid loss 0.9561 | valid ppl 2.6015 | learning rate 5.0000
167
+ | end of split 43 / 62 | epoch 3 | time: 1601.74s | valid loss 0.9559 | valid ppl 2.6010 | learning rate 5.0000
168
+ | end of split 44 / 62 | epoch 3 | time: 1603.40s | valid loss 0.9562 | valid ppl 2.6018 | learning rate 5.0000
169
+ | end of split 45 / 62 | epoch 3 | time: 1601.88s | valid loss 0.9557 | valid ppl 2.6004 | learning rate 5.0000
170
+ | end of split 46 / 62 | epoch 3 | time: 1602.03s | valid loss 0.9556 | valid ppl 2.6002 | learning rate 5.0000
171
+ | end of split 47 / 62 | epoch 3 | time: 1601.98s | valid loss 0.9555 | valid ppl 2.5999 | learning rate 5.0000
172
+ | end of split 48 / 62 | epoch 3 | time: 1603.86s | valid loss 0.9555 | valid ppl 2.6001 | learning rate 5.0000
173
+ | end of split 49 / 62 | epoch 3 | time: 1600.52s | valid loss 0.9556 | valid ppl 2.6002 | learning rate 5.0000
174
+ | end of split 50 / 62 | epoch 3 | time: 1597.63s | valid loss 0.9549 | valid ppl 2.5985 | learning rate 5.0000
175
+ | end of split 51 / 62 | epoch 3 | time: 1600.65s | valid loss 0.9550 | valid ppl 2.5987 | learning rate 5.0000
176
+ | end of split 52 / 62 | epoch 3 | time: 1599.09s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 5.0000
177
+ | end of split 53 / 62 | epoch 3 | time: 1599.84s | valid loss 0.9549 | valid ppl 2.5983 | learning rate 5.0000
178
+ | end of split 54 / 62 | epoch 3 | time: 1597.92s | valid loss 0.9547 | valid ppl 2.5980 | learning rate 5.0000
179
+ | end of split 55 / 62 | epoch 3 | time: 1598.06s | valid loss 0.9546 | valid ppl 2.5976 | learning rate 5.0000
180
+ | end of split 56 / 62 | epoch 3 | time: 1597.08s | valid loss 0.9544 | valid ppl 2.5970 | learning rate 5.0000
181
+ | end of split 57 / 62 | epoch 3 | time: 1596.42s | valid loss 0.9544 | valid ppl 2.5971 | learning rate 5.0000
182
+ | end of split 58 / 62 | epoch 3 | time: 1597.40s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 5.0000
183
+ | end of split 59 / 62 | epoch 3 | time: 1596.76s | valid loss 0.9539 | valid ppl 2.5959 | learning rate 5.0000
184
+ | end of split 60 / 62 | epoch 3 | time: 1594.38s | valid loss 0.9540 | valid ppl 2.5962 | learning rate 5.0000
185
+ | end of split 61 / 62 | epoch 3 | time: 1595.01s | valid loss 0.9550 | valid ppl 2.5988 | learning rate 5.0000
186
+ | end of split 62 / 62 | epoch 3 | time: 1596.06s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 5.0000
187
+ | end of split 1 / 62 | epoch 4 | time: 1590.51s | valid loss 0.9539 | valid ppl 2.5959 | learning rate 5.0000
188
+ | end of split 2 / 62 | epoch 4 | time: 1594.92s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 5.0000
189
+ | end of split 3 / 62 | epoch 4 | time: 1594.53s | valid loss 0.9536 | valid ppl 2.5950 | learning rate 5.0000
190
+ | end of split 4 / 62 | epoch 4 | time: 1595.50s | valid loss 0.9534 | valid ppl 2.5946 | learning rate 5.0000
191
+ | end of split 5 / 62 | epoch 4 | time: 1594.79s | valid loss 0.9535 | valid ppl 2.5947 | learning rate 5.0000
192
+ | end of split 6 / 62 | epoch 4 | time: 1595.23s | valid loss 0.9535 | valid ppl 2.5948 | learning rate 5.0000
193
+ | end of split 7 / 62 | epoch 4 | time: 1594.51s | valid loss 0.9535 | valid ppl 2.5948 | learning rate 5.0000
194
+ | end of split 8 / 62 | epoch 4 | time: 1595.67s | valid loss 0.9531 | valid ppl 2.5938 | learning rate 5.0000
195
+ | end of split 9 / 62 | epoch 4 | time: 1594.19s | valid loss 0.9533 | valid ppl 2.5942 | learning rate 5.0000
196
+ | end of split 10 / 62 | epoch 4 | time: 1596.43s | valid loss 0.9530 | valid ppl 2.5935 | learning rate 5.0000
197
+ | end of split 11 / 62 | epoch 4 | time: 1594.75s | valid loss 0.9533 | valid ppl 2.5944 | learning rate 5.0000
198
+ | end of split 12 / 62 | epoch 4 | time: 1593.83s | valid loss 0.9530 | valid ppl 2.5934 | learning rate 5.0000
199
+ | end of split 13 / 62 | epoch 4 | time: 1593.87s | valid loss 0.9530 | valid ppl 2.5934 | learning rate 5.0000
200
+ | end of split 14 / 62 | epoch 4 | time: 1595.57s | valid loss 0.9529 | valid ppl 2.5933 | learning rate 5.0000
201
+ | end of split 15 / 62 | epoch 4 | time: 1597.27s | valid loss 0.9527 | valid ppl 2.5927 | learning rate 5.0000
202
+ | end of split 16 / 62 | epoch 4 | time: 1594.24s | valid loss 0.9526 | valid ppl 2.5924 | learning rate 5.0000
203
+ | end of split 17 / 62 | epoch 4 | time: 1594.23s | valid loss 0.9527 | valid ppl 2.5927 | learning rate 5.0000
204
+ | end of split 18 / 62 | epoch 4 | time: 1595.12s | valid loss 0.9524 | valid ppl 2.5918 | learning rate 5.0000
205
+ | end of split 19 / 62 | epoch 4 | time: 1595.95s | valid loss 0.9524 | valid ppl 2.5920 | learning rate 5.0000
206
+ | end of split 20 / 62 | epoch 4 | time: 1594.70s | valid loss 0.9522 | valid ppl 2.5913 | learning rate 5.0000
207
+ | end of split 21 / 62 | epoch 4 | time: 1594.57s | valid loss 0.9520 | valid ppl 2.5908 | learning rate 5.0000
208
+ | end of split 22 / 62 | epoch 4 | time: 1594.91s | valid loss 0.9520 | valid ppl 2.5908 | learning rate 5.0000
209
+ | end of split 23 / 62 | epoch 4 | time: 1594.17s | valid loss 0.9519 | valid ppl 2.5906 | learning rate 5.0000
210
+ | end of split 24 / 62 | epoch 4 | time: 1593.85s | valid loss 0.9519 | valid ppl 2.5906 | learning rate 5.0000
211
+ | end of split 25 / 62 | epoch 4 | time: 1594.37s | valid loss 0.9519 | valid ppl 2.5907 | learning rate 5.0000
212
+ | end of split 26 / 62 | epoch 4 | time: 1595.05s | valid loss 0.9516 | valid ppl 2.5898 | learning rate 5.0000
213
+ | end of split 27 / 62 | epoch 4 | time: 1596.66s | valid loss 0.9516 | valid ppl 2.5898 | learning rate 5.0000
214
+ | end of split 28 / 62 | epoch 4 | time: 1597.62s | valid loss 0.9522 | valid ppl 2.5915 | learning rate 5.0000
215
+ | end of split 29 / 62 | epoch 4 | time: 1596.01s | valid loss 0.9514 | valid ppl 2.5893 | learning rate 5.0000
216
+ | end of split 30 / 62 | epoch 4 | time: 1596.94s | valid loss 0.9514 | valid ppl 2.5895 | learning rate 5.0000
217
+ | end of split 31 / 62 | epoch 4 | time: 1596.59s | valid loss 0.9515 | valid ppl 2.5895 | learning rate 5.0000
218
+ | end of split 32 / 62 | epoch 4 | time: 1594.91s | valid loss 0.9513 | valid ppl 2.5892 | learning rate 5.0000
219
+ | end of split 33 / 62 | epoch 4 | time: 1596.39s | valid loss 0.9512 | valid ppl 2.5888 | learning rate 5.0000
220
+ | end of split 34 / 62 | epoch 4 | time: 1596.82s | valid loss 0.9512 | valid ppl 2.5888 | learning rate 5.0000
221
+ | end of split 35 / 62 | epoch 4 | time: 1597.66s | valid loss 0.9511 | valid ppl 2.5886 | learning rate 5.0000
222
+ | end of split 36 / 62 | epoch 4 | time: 1598.20s | valid loss 0.9516 | valid ppl 2.5899 | learning rate 5.0000
223
+ | end of split 37 / 62 | epoch 4 | time: 1598.02s | valid loss 0.9510 | valid ppl 2.5883 | learning rate 5.0000
224
+ | end of split 38 / 62 | epoch 4 | time: 1597.10s | valid loss 0.9509 | valid ppl 2.5881 | learning rate 5.0000
225
+ | end of split 39 / 62 | epoch 4 | time: 1599.56s | valid loss 0.9509 | valid ppl 2.5879 | learning rate 5.0000
226
+ | end of split 40 / 62 | epoch 4 | time: 1597.81s | valid loss 0.9510 | valid ppl 2.5882 | learning rate 5.0000
227
+ | end of split 41 / 62 | epoch 4 | time: 1598.85s | valid loss 0.9507 | valid ppl 2.5876 | learning rate 5.0000
228
+ | end of split 42 / 62 | epoch 4 | time: 1597.13s | valid loss 0.9507 | valid ppl 2.5875 | learning rate 5.0000
229
+ | end of split 43 / 62 | epoch 4 | time: 1598.31s | valid loss 0.9508 | valid ppl 2.5877 | learning rate 5.0000
230
+ | end of split 44 / 62 | epoch 4 | time: 1597.29s | valid loss 0.9507 | valid ppl 2.5874 | learning rate 5.0000
+ | end of split 45 / 62 | epoch 4 | time: 1595.76s | valid loss 0.9508 | valid ppl 2.5877 | learning rate 5.0000
+ | end of split 46 / 62 | epoch 4 | time: 1597.26s | valid loss 0.9506 | valid ppl 2.5872 | learning rate 5.0000
+ | end of split 47 / 62 | epoch 4 | time: 1596.63s | valid loss 0.9504 | valid ppl 2.5868 | learning rate 5.0000
+ | end of split 48 / 62 | epoch 4 | time: 1597.06s | valid loss 0.9503 | valid ppl 2.5866 | learning rate 5.0000
+ | end of split 49 / 62 | epoch 4 | time: 1596.32s | valid loss 0.9501 | valid ppl 2.5860 | learning rate 5.0000
+ | end of split 50 / 62 | epoch 4 | time: 852.39s | valid loss 0.9507 | valid ppl 2.5876 | learning rate 5.0000
+ | end of split 51 / 62 | epoch 4 | time: 1596.92s | valid loss 0.9500 | valid ppl 2.5857 | learning rate 5.0000
+ | end of split 52 / 62 | epoch 4 | time: 1595.75s | valid loss 0.9505 | valid ppl 2.5869 | learning rate 5.0000
+ | end of split 53 / 62 | epoch 4 | time: 1593.59s | valid loss 0.9501 | valid ppl 2.5858 | learning rate 5.0000
+ | end of split 54 / 62 | epoch 4 | time: 1594.38s | valid loss 0.9509 | valid ppl 2.5881 | learning rate 5.0000
+ | end of split 55 / 62 | epoch 4 | time: 1593.89s | valid loss 0.9496 | valid ppl 2.5848 | learning rate 5.0000
+ | end of split 56 / 62 | epoch 4 | time: 1593.86s | valid loss 0.9499 | valid ppl 2.5854 | learning rate 5.0000
+ | end of split 57 / 62 | epoch 4 | time: 1592.65s | valid loss 0.9496 | valid ppl 2.5846 | learning rate 5.0000
+ | end of split 58 / 62 | epoch 4 | time: 1593.43s | valid loss 0.9497 | valid ppl 2.5850 | learning rate 5.0000
+ | end of split 59 / 62 | epoch 4 | time: 1590.22s | valid loss 0.9496 | valid ppl 2.5846 | learning rate 5.0000
+ | end of split 60 / 62 | epoch 4 | time: 1592.59s | valid loss 0.9494 | valid ppl 2.5840 | learning rate 5.0000
+ | end of split 61 / 62 | epoch 4 | time: 1590.49s | valid loss 0.9494 | valid ppl 2.5842 | learning rate 5.0000
+ | end of split 62 / 62 | epoch 4 | time: 1592.95s | valid loss 0.9494 | valid ppl 2.5841 | learning rate 5.0000
+ | end of split 1 / 62 | epoch 5 | time: 1588.63s | valid loss 0.9495 | valid ppl 2.5845 | learning rate 5.0000
+ | end of split 2 / 62 | epoch 5 | time: 1594.59s | valid loss 0.9492 | valid ppl 2.5837 | learning rate 5.0000
+ | end of split 3 / 62 | epoch 5 | time: 1595.14s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
+ | end of split 4 / 62 | epoch 5 | time: 1593.00s | valid loss 0.9491 | valid ppl 2.5833 | learning rate 5.0000
+ | end of split 5 / 62 | epoch 5 | time: 1592.16s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
+ | end of split 6 / 62 | epoch 5 | time: 1592.38s | valid loss 0.9491 | valid ppl 2.5833 | learning rate 5.0000
+ | end of split 7 / 62 | epoch 5 | time: 1593.78s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
+ | end of split 8 / 62 | epoch 5 | time: 1594.50s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
+ | end of split 9 / 62 | epoch 5 | time: 1594.20s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
+ | end of split 10 / 62 | epoch 5 | time: 1594.41s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
+ | end of split 11 / 62 | epoch 5 | time: 1592.91s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
+ | end of split 12 / 62 | epoch 5 | time: 1595.00s | valid loss 0.9494 | valid ppl 2.5842 | learning rate 5.0000
+ | end of split 13 / 62 | epoch 5 | time: 1592.84s | valid loss 0.9486 | valid ppl 2.5822 | learning rate 5.0000
+ | end of split 14 / 62 | epoch 5 | time: 1593.26s | valid loss 0.9485 | valid ppl 2.5819 | learning rate 5.0000
+ | end of split 15 / 62 | epoch 5 | time: 1592.76s | valid loss 0.9486 | valid ppl 2.5822 | learning rate 5.0000
+ | end of split 16 / 62 | epoch 5 | time: 1595.66s | valid loss 0.9483 | valid ppl 2.5814 | learning rate 5.0000
+ | end of split 17 / 62 | epoch 5 | time: 1596.12s | valid loss 0.9484 | valid ppl 2.5816 | learning rate 5.0000
+ | end of split 18 / 62 | epoch 5 | time: 1597.15s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
+ | end of split 19 / 62 | epoch 5 | time: 1595.50s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
+ | end of split 20 / 62 | epoch 5 | time: 1597.42s | valid loss 0.9482 | valid ppl 2.5812 | learning rate 5.0000
+ | end of split 21 / 62 | epoch 5 | time: 1596.20s | valid loss 0.9483 | valid ppl 2.5814 | learning rate 5.0000
+ | end of split 22 / 62 | epoch 5 | time: 1597.06s | valid loss 0.9479 | valid ppl 2.5804 | learning rate 5.0000
+ | end of split 23 / 62 | epoch 5 | time: 1596.92s | valid loss 0.9479 | valid ppl 2.5803 | learning rate 5.0000
+ | end of split 24 / 62 | epoch 5 | time: 1593.52s | valid loss 0.9481 | valid ppl 2.5807 | learning rate 5.0000
+ | end of split 25 / 62 | epoch 5 | time: 1595.12s | valid loss 0.9480 | valid ppl 2.5805 | learning rate 5.0000
+ | end of split 26 / 62 | epoch 5 | time: 1595.25s | valid loss 0.9479 | valid ppl 2.5802 | learning rate 5.0000
+ | end of split 27 / 62 | epoch 5 | time: 1644.92s | valid loss 0.9477 | valid ppl 2.5799 | learning rate 5.0000
+ | end of split 28 / 62 | epoch 5 | time: 1595.94s | valid loss 0.9478 | valid ppl 2.5801 | learning rate 5.0000
+ | end of split 29 / 62 | epoch 5 | time: 1596.39s | valid loss 0.9489 | valid ppl 2.5830 | learning rate 5.0000
+ | end of split 30 / 62 | epoch 5 | time: 1596.48s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
+ | end of split 31 / 62 | epoch 5 | time: 1594.94s | valid loss 0.9480 | valid ppl 2.5805 | learning rate 5.0000
+ | end of split 32 / 62 | epoch 5 | time: 1596.25s | valid loss 0.9477 | valid ppl 2.5799 | learning rate 5.0000
+ | end of split 33 / 62 | epoch 5 | time: 1595.95s | valid loss 0.9476 | valid ppl 2.5795 | learning rate 5.0000
+ | end of split 34 / 62 | epoch 5 | time: 1594.31s | valid loss 0.9474 | valid ppl 2.5791 | learning rate 5.0000
+ | end of split 35 / 62 | epoch 5 | time: 1595.73s | valid loss 0.9475 | valid ppl 2.5792 | learning rate 5.0000
+ | end of split 36 / 62 | epoch 5 | time: 1593.93s | valid loss 0.9476 | valid ppl 2.5794 | learning rate 5.0000
+ | end of split 37 / 62 | epoch 5 | time: 1594.50s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
+ | end of split 38 / 62 | epoch 5 | time: 1592.84s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
+ | end of split 39 / 62 | epoch 5 | time: 1591.33s | valid loss 0.9473 | valid ppl 2.5788 | learning rate 5.0000
+ | end of split 40 / 62 | epoch 5 | time: 1590.07s | valid loss 0.9471 | valid ppl 2.5783 | learning rate 5.0000
+ | end of split 41 / 62 | epoch 5 | time: 1591.27s | valid loss 0.9474 | valid ppl 2.5791 | learning rate 5.0000
+ | end of split 42 / 62 | epoch 5 | time: 1590.29s | valid loss 0.9471 | valid ppl 2.5782 | learning rate 5.0000
+ | end of split 43 / 62 | epoch 5 | time: 1590.07s | valid loss 0.9470 | valid ppl 2.5780 | learning rate 5.0000
+ | end of split 44 / 62 | epoch 5 | time: 1590.49s | valid loss 0.9471 | valid ppl 2.5781 | learning rate 5.0000
+ | end of split 45 / 62 | epoch 5 | time: 1589.80s | valid loss 0.9473 | valid ppl 2.5787 | learning rate 5.0000
+ | end of split 46 / 62 | epoch 5 | time: 1588.77s | valid loss 0.9470 | valid ppl 2.5779 | learning rate 5.0000
+ | end of split 47 / 62 | epoch 5 | time: 1589.22s | valid loss 0.9468 | valid ppl 2.5773 | learning rate 5.0000
+ | end of split 48 / 62 | epoch 5 | time: 1590.14s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
+ | end of split 49 / 62 | epoch 5 | time: 1587.40s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
+ | end of split 50 / 62 | epoch 5 | time: 847.83s | valid loss 0.9472 | valid ppl 2.5786 | learning rate 5.0000
+ | end of split 51 / 62 | epoch 5 | time: 1588.35s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
+ | end of split 52 / 62 | epoch 5 | time: 1587.80s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
+ | end of split 53 / 62 | epoch 5 | time: 1588.01s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
+ | end of split 54 / 62 | epoch 5 | time: 1585.93s | valid loss 0.9465 | valid ppl 2.5767 | learning rate 5.0000
+ | end of split 55 / 62 | epoch 5 | time: 1584.78s | valid loss 0.9463 | valid ppl 2.5763 | learning rate 5.0000
+ | end of split 56 / 62 | epoch 5 | time: 1585.77s | valid loss 0.9481 | valid ppl 2.5808 | learning rate 5.0000
+ | end of split 57 / 62 | epoch 5 | time: 1586.16s | valid loss 0.9465 | valid ppl 2.5766 | learning rate 5.0000
+ | end of split 58 / 62 | epoch 5 | time: 1586.35s | valid loss 0.9464 | valid ppl 2.5765 | learning rate 5.0000
+ | end of split 59 / 62 | epoch 5 | time: 1585.15s | valid loss 0.9463 | valid ppl 2.5762 | learning rate 5.0000
+ | end of split 60 / 62 | epoch 5 | time: 1585.41s | valid loss 0.9473 | valid ppl 2.5788 | learning rate 5.0000
+ | end of split 61 / 62 | epoch 5 | time: 1586.84s | valid loss 0.9462 | valid ppl 2.5760 | learning rate 5.0000
+ | end of split 62 / 62 | epoch 5 | time: 1585.85s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
+ | end of split 1 / 62 | epoch 6 | time: 1580.81s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
+ | end of split 2 / 62 | epoch 6 | time: 1585.96s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
+ | end of split 3 / 62 | epoch 6 | time: 1586.43s | valid loss 0.9461 | valid ppl 2.5757 | learning rate 5.0000
+ | end of split 4 / 62 | epoch 6 | time: 1591.11s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
+ | end of split 5 / 62 | epoch 6 | time: 1593.60s | valid loss 0.9458 | valid ppl 2.5749 | learning rate 5.0000
+ | end of split 6 / 62 | epoch 6 | time: 1594.82s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
+ | end of split 7 / 62 | epoch 6 | time: 1599.91s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
+ | end of split 8 / 62 | epoch 6 | time: 1601.71s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
+ | end of split 9 / 62 | epoch 6 | time: 1597.62s | valid loss 0.9458 | valid ppl 2.5747 | learning rate 5.0000
+ | end of split 10 / 62 | epoch 6 | time: 1600.06s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
+ | end of split 11 / 62 | epoch 6 | time: 1596.53s | valid loss 0.9455 | valid ppl 2.5740 | learning rate 5.0000
+ | end of split 12 / 62 | epoch 6 | time: 1599.04s | valid loss 0.9456 | valid ppl 2.5745 | learning rate 5.0000
+ | end of split 13 / 62 | epoch 6 | time: 1593.55s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
+ | end of split 14 / 62 | epoch 6 | time: 1596.25s | valid loss 0.9454 | valid ppl 2.5740 | learning rate 5.0000
+ | end of split 15 / 62 | epoch 6 | time: 1595.15s | valid loss 0.9454 | valid ppl 2.5740 | learning rate 5.0000
+ | end of split 16 / 62 | epoch 6 | time: 1595.84s | valid loss 0.9454 | valid ppl 2.5738 | learning rate 5.0000
+ | end of split 17 / 62 | epoch 6 | time: 1597.05s | valid loss 0.9453 | valid ppl 2.5737 | learning rate 5.0000
+ | end of split 18 / 62 | epoch 6 | time: 1595.68s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
+ | end of split 19 / 62 | epoch 6 | time: 1595.81s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
+ | end of split 20 / 62 | epoch 6 | time: 1596.74s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
+ | end of split 21 / 62 | epoch 6 | time: 1596.50s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
+ | end of split 22 / 62 | epoch 6 | time: 1596.57s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
+ | end of split 23 / 62 | epoch 6 | time: 1597.51s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
+ | end of split 24 / 62 | epoch 6 | time: 1597.85s | valid loss 0.9453 | valid ppl 2.5735 | learning rate 5.0000
+ | end of split 25 / 62 | epoch 6 | time: 1595.58s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
+ | end of split 26 / 62 | epoch 6 | time: 1599.43s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
+ | end of split 27 / 62 | epoch 6 | time: 1625.16s | valid loss 0.9454 | valid ppl 2.5737 | learning rate 5.0000
+ | end of split 28 / 62 | epoch 6 | time: 1677.11s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
+ | end of split 29 / 62 | epoch 6 | time: 1664.87s | valid loss 0.9500 | valid ppl 2.5857 | learning rate 5.0000
+ | end of split 30 / 62 | epoch 6 | time: 1610.42s | valid loss 0.9491 | valid ppl 2.5834 | learning rate 5.0000
+ | end of split 31 / 62 | epoch 6 | time: 1613.54s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
+ | end of split 32 / 62 | epoch 6 | time: 1616.62s | valid loss 0.9463 | valid ppl 2.5762 | learning rate 5.0000
+ | end of split 33 / 62 | epoch 6 | time: 1619.63s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
+ | end of split 34 / 62 | epoch 6 | time: 1617.77s | valid loss 0.9452 | valid ppl 2.5735 | learning rate 5.0000
+ | end of split 35 / 62 | epoch 6 | time: 1616.49s | valid loss 0.9447 | valid ppl 2.5720 | learning rate 1.2500
+ | end of split 36 / 62 | epoch 6 | time: 1617.61s | valid loss 0.9443 | valid ppl 2.5711 | learning rate 1.2500
+ | end of split 37 / 62 | epoch 6 | time: 1619.28s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 1.2500
+ | end of split 38 / 62 | epoch 6 | time: 1620.03s | valid loss 0.9439 | valid ppl 2.5700 | learning rate 1.2500
+ | end of split 39 / 62 | epoch 6 | time: 1621.32s | valid loss 0.9438 | valid ppl 2.5698 | learning rate 1.2500
+ | end of split 40 / 62 | epoch 6 | time: 1625.63s | valid loss 0.9437 | valid ppl 2.5695 | learning rate 1.2500
+ | end of split 41 / 62 | epoch 6 | time: 1625.86s | valid loss 0.9437 | valid ppl 2.5696 | learning rate 1.2500
+ | end of split 42 / 62 | epoch 6 | time: 1625.70s | valid loss 0.9436 | valid ppl 2.5692 | learning rate 1.2500
+ | end of split 43 / 62 | epoch 6 | time: 1629.22s | valid loss 0.9436 | valid ppl 2.5691 | learning rate 1.2500
+ | end of split 44 / 62 | epoch 6 | time: 1628.58s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
+ | end of split 45 / 62 | epoch 6 | time: 870.27s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
+ | end of split 46 / 62 | epoch 6 | time: 1629.99s | valid loss 0.9434 | valid ppl 2.5688 | learning rate 1.2500
+ | end of split 47 / 62 | epoch 6 | time: 1629.90s | valid loss 0.9435 | valid ppl 2.5689 | learning rate 1.2500
+ | end of split 48 / 62 | epoch 6 | time: 1628.52s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
+ | end of split 49 / 62 | epoch 6 | time: 1631.93s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
+ | end of split 50 / 62 | epoch 6 | time: 1627.56s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
+ | end of split 51 / 62 | epoch 6 | time: 1628.79s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
+ | end of split 52 / 62 | epoch 6 | time: 1630.13s | valid loss 0.9434 | valid ppl 2.5686 | learning rate 1.2500
+ | end of split 53 / 62 | epoch 6 | time: 1630.48s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
+ | end of split 54 / 62 | epoch 6 | time: 1629.97s | valid loss 0.9432 | valid ppl 2.5681 | learning rate 1.2500
+ | end of split 55 / 62 | epoch 6 | time: 1622.82s | valid loss 0.9432 | valid ppl 2.5682 | learning rate 1.2500
+ | end of split 56 / 62 | epoch 6 | time: 1624.52s | valid loss 0.9431 | valid ppl 2.5680 | learning rate 1.2500
+ | end of split 57 / 62 | epoch 6 | time: 1626.41s | valid loss 0.9431 | valid ppl 2.5679 | learning rate 1.2500
+ | end of split 58 / 62 | epoch 6 | time: 1625.56s | valid loss 0.9434 | valid ppl 2.5686 | learning rate 1.2500
+ | end of split 59 / 62 | epoch 6 | time: 1627.15s | valid loss 0.9431 | valid ppl 2.5678 | learning rate 1.2500
+ | end of split 60 / 62 | epoch 6 | time: 1627.44s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
+ | end of split 61 / 62 | epoch 6 | time: 1627.57s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
+ | end of split 62 / 62 | epoch 6 | time: 1625.18s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
+ | end of split 1 / 62 | epoch 7 | time: 1620.40s | valid loss 0.9429 | valid ppl 2.5675 | learning rate 1.2500
+ | end of split 2 / 62 | epoch 7 | time: 1627.79s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
+ | end of split 3 / 62 | epoch 7 | time: 1627.64s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
+ | end of split 4 / 62 | epoch 7 | time: 1626.87s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
+ | end of split 5 / 62 | epoch 7 | time: 1628.51s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
+ | end of split 6 / 62 | epoch 7 | time: 1627.38s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
+ | end of split 7 / 62 | epoch 7 | time: 1624.51s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
+ | end of split 8 / 62 | epoch 7 | time: 1622.62s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
+ | end of split 9 / 62 | epoch 7 | time: 1624.24s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
+ | end of split 10 / 62 | epoch 7 | time: 1625.57s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
+ | end of split 11 / 62 | epoch 7 | time: 1625.67s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
+ | end of split 12 / 62 | epoch 7 | time: 1716.44s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
+ | end of split 13 / 62 | epoch 7 | time: 1794.58s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 14 / 62 | epoch 7 | time: 1783.52s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
+ | end of split 15 / 62 | epoch 7 | time: 1769.46s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 16 / 62 | epoch 7 | time: 1775.92s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 17 / 62 | epoch 7 | time: 1777.89s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 18 / 62 | epoch 7 | time: 1783.47s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 19 / 62 | epoch 7 | time: 1779.88s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 20 / 62 | epoch 7 | time: 1763.54s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 21 / 62 | epoch 7 | time: 1772.71s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 22 / 62 | epoch 7 | time: 1775.60s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
+ | end of split 23 / 62 | epoch 7 | time: 1782.51s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
+ | end of split 24 / 62 | epoch 7 | time: 1754.16s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
+ | end of split 25 / 62 | epoch 7 | time: 941.64s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
+ | end of split 26 / 62 | epoch 7 | time: 1763.95s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
+ | end of split 27 / 62 | epoch 7 | time: 1776.44s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
+ | end of split 28 / 62 | epoch 7 | time: 1768.74s | valid loss 0.9426 | valid ppl 2.5665 | learning rate 1.2500
+ | end of split 29 / 62 | epoch 7 | time: 1800.52s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
+ | end of split 30 / 62 | epoch 7 | time: 1815.90s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
+ | end of split 31 / 62 | epoch 7 | time: 1745.49s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
+ | end of split 32 / 62 | epoch 7 | time: 1613.56s | valid loss 0.9425 | valid ppl 2.5664 | learning rate 1.2500
+ | end of split 33 / 62 | epoch 7 | time: 1628.29s | valid loss 0.9425 | valid ppl 2.5665 | learning rate 1.2500
+ | end of split 34 / 62 | epoch 7 | time: 1624.90s | valid loss 0.9425 | valid ppl 2.5663 | learning rate 1.2500
+ | end of split 35 / 62 | epoch 7 | time: 1626.26s | valid loss 0.9425 | valid ppl 2.5664 | learning rate 1.2500
+ | end of split 36 / 62 | epoch 7 | time: 1603.86s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 1.2500
+ | end of split 37 / 62 | epoch 7 | time: 1605.85s | valid loss 0.9424 | valid ppl 2.5663 | learning rate 1.2500
+ | end of split 38 / 62 | epoch 7 | time: 1603.91s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
+ | end of split 39 / 62 | epoch 7 | time: 1605.22s | valid loss 0.9425 | valid ppl 2.5663 | learning rate 1.2500
+ | end of split 40 / 62 | epoch 7 | time: 1602.75s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
+ | end of split 41 / 62 | epoch 7 | time: 1604.28s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 1.2500
+ | end of split 42 / 62 | epoch 7 | time: 1603.89s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
+ | end of split 43 / 62 | epoch 7 | time: 1603.60s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
+ | end of split 44 / 62 | epoch 7 | time: 1606.62s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
+ | end of split 45 / 62 | epoch 7 | time: 1604.77s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 1.2500
+ | end of split 46 / 62 | epoch 7 | time: 1603.10s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.3125
+ | end of split 47 / 62 | epoch 7 | time: 1601.62s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.3125
+ | end of split 48 / 62 | epoch 7 | time: 1604.55s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.3125
+ | end of split 49 / 62 | epoch 7 | time: 1604.48s | valid loss 0.9421 | valid ppl 2.5654 | learning rate 0.3125
+ | end of split 50 / 62 | epoch 7 | time: 1603.34s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
+ | end of split 51 / 62 | epoch 7 | time: 1600.92s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
+ | end of split 52 / 62 | epoch 7 | time: 1604.70s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
+ | end of split 53 / 62 | epoch 7 | time: 1603.28s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
+ | end of split 54 / 62 | epoch 7 | time: 1610.64s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
+ | end of split 55 / 62 | epoch 7 | time: 1605.28s | valid loss 0.9421 | valid ppl 2.5652 | learning rate 0.3125
+ | end of split 56 / 62 | epoch 7 | time: 1603.78s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
+ | end of split 57 / 62 | epoch 7 | time: 1603.91s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
+ | end of split 58 / 62 | epoch 7 | time: 1605.53s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
+ | end of split 59 / 62 | epoch 7 | time: 1656.75s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
+ | end of split 60 / 62 | epoch 7 | time: 1603.18s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
+ | end of split 61 / 62 | epoch 7 | time: 1601.58s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
+ | end of split 62 / 62 | epoch 7 | time: 1602.32s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
+ | end of split 1 / 62 | epoch 8 | time: 1599.87s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
+ | end of split 2 / 62 | epoch 8 | time: 1605.15s | valid loss 0.9420 | valid ppl 2.5650 | learning rate 0.3125
+ | end of split 3 / 62 | epoch 8 | time: 1604.62s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
+ | end of split 4 / 62 | epoch 8 | time: 1604.72s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
+ | end of split 5 / 62 | epoch 8 | time: 1637.47s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
+ | end of split 6 / 62 | epoch 8 | time: 875.65s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
+ | end of split 7 / 62 | epoch 8 | time: 1638.44s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
+ | end of split 8 / 62 | epoch 8 | time: 1612.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 9 / 62 | epoch 8 | time: 1621.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 10 / 62 | epoch 8 | time: 1640.27s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 11 / 62 | epoch 8 | time: 1640.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 12 / 62 | epoch 8 | time: 1611.55s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 13 / 62 | epoch 8 | time: 1608.16s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 14 / 62 | epoch 8 | time: 1663.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 15 / 62 | epoch 8 | time: 1668.15s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 16 / 62 | epoch 8 | time: 1652.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 17 / 62 | epoch 8 | time: 1614.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
+ | end of split 18 / 62 | epoch 8 | time: 1617.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 19 / 62 | epoch 8 | time: 1628.04s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 20 / 62 | epoch 8 | time: 1624.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 21 / 62 | epoch 8 | time: 1637.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 22 / 62 | epoch 8 | time: 1634.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 23 / 62 | epoch 8 | time: 1620.99s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 24 / 62 | epoch 8 | time: 1616.31s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 25 / 62 | epoch 8 | time: 1611.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 26 / 62 | epoch 8 | time: 1605.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 27 / 62 | epoch 8 | time: 1607.65s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 28 / 62 | epoch 8 | time: 1608.79s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
+ | end of split 29 / 62 | epoch 8 | time: 1608.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 30 / 62 | epoch 8 | time: 1612.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 31 / 62 | epoch 8 | time: 1612.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 32 / 62 | epoch 8 | time: 1605.76s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 33 / 62 | epoch 8 | time: 1609.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 34 / 62 | epoch 8 | time: 1611.85s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
+ | end of split 35 / 62 | epoch 8 | time: 1620.65s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 36 / 62 | epoch 8 | time: 1619.16s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 37 / 62 | epoch 8 | time: 1604.30s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
+ | end of split 38 / 62 | epoch 8 | time: 1605.41s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
+ | end of split 39 / 62 | epoch 8 | time: 1639.13s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
+ | end of split 40 / 62 | epoch 8 | time: 1614.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 41 / 62 | epoch 8 | time: 1619.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 42 / 62 | epoch 8 | time: 1655.86s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0012
+ | end of split 43 / 62 | epoch 8 | time: 1652.46s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 44 / 62 | epoch 8 | time: 1622.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 45 / 62 | epoch 8 | time: 1623.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 46 / 62 | epoch 8 | time: 1621.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 47 / 62 | epoch 8 | time: 1619.93s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 48 / 62 | epoch 8 | time: 1626.00s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 49 / 62 | epoch 8 | time: 1619.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 50 / 62 | epoch 8 | time: 1619.02s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
+ | end of split 51 / 62 | epoch 8 | time: 1670.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 52 / 62 | epoch 8 | time: 1671.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 53 / 62 | epoch 8 | time: 1675.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 54 / 62 | epoch 8 | time: 1674.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 55 / 62 | epoch 8 | time: 1662.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 56 / 62 | epoch 8 | time: 1656.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 57 / 62 | epoch 8 | time: 1656.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 58 / 62 | epoch 8 | time: 1656.95s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 59 / 62 | epoch 8 | time: 1650.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 60 / 62 | epoch 8 | time: 1621.00s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 61 / 62 | epoch 8 | time: 1621.93s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
+ | end of split 62 / 62 | epoch 8 | time: 1619.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
+ | end of split 1 / 62 | epoch 9 | time: 1615.83s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
+ | end of split 2 / 62 | epoch 9 | time: 1665.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
+ | end of split 3 / 62 | epoch 9 | time: 1619.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
+ | end of split 4 / 62 | epoch 9 | time: 1618.02s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
+ | end of split 5 / 62 | epoch 9 | time: 1615.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
+ | end of split 6 / 62 | epoch 9 | time: 1617.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
503
+ | end of split 7 / 62 | epoch 9 | time: 1613.72s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
504
+ | end of split 8 / 62 | epoch 9 | time: 1617.41s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
505
+ | end of split 9 / 62 | epoch 9 | time: 1609.69s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
506
+ | end of split 10 / 62 | epoch 9 | time: 1608.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
507
+ | end of split 11 / 62 | epoch 9 | time: 1619.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
508
+ | end of split 12 / 62 | epoch 9 | time: 1616.51s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
509
+ | end of split 13 / 62 | epoch 9 | time: 1611.11s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
510
+ | end of split 14 / 62 | epoch 9 | time: 1609.59s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
511
+ | end of split 15 / 62 | epoch 9 | time: 1609.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
512
+ | end of split 16 / 62 | epoch 9 | time: 1609.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
513
+ | end of split 17 / 62 | epoch 9 | time: 1610.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
514
+ | end of split 18 / 62 | epoch 9 | time: 1605.49s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
515
+ | end of split 19 / 62 | epoch 9 | time: 1609.29s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
516
+ | end of split 20 / 62 | epoch 9 | time: 1610.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
517
+ | end of split 21 / 62 | epoch 9 | time: 1610.08s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
518
+ | end of split 22 / 62 | epoch 9 | time: 1609.18s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
519
+ | end of split 23 / 62 | epoch 9 | time: 1608.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
520
+ | end of split 24 / 62 | epoch 9 | time: 1609.79s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
521
+ | end of split 25 / 62 | epoch 9 | time: 1608.82s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
522
+ | end of split 26 / 62 | epoch 9 | time: 1609.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
523
+ | end of split 27 / 62 | epoch 9 | time: 1611.33s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
524
+ | end of split 28 / 62 | epoch 9 | time: 1612.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
525
+ | end of split 29 / 62 | epoch 9 | time: 1611.11s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
526
+ | end of split 30 / 62 | epoch 9 | time: 1612.06s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
527
+ | end of split 31 / 62 | epoch 9 | time: 1609.92s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
528
+ | end of split 32 / 62 | epoch 9 | time: 1606.74s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
529
+ | end of split 33 / 62 | epoch 9 | time: 1609.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
530
+ | end of split 34 / 62 | epoch 9 | time: 1610.44s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
531
+ | end of split 35 / 62 | epoch 9 | time: 1613.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
532
+ | end of split 36 / 62 | epoch 9 | time: 1614.50s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
533
+ | end of split 37 / 62 | epoch 9 | time: 1612.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
534
+ | end of split 38 / 62 | epoch 9 | time: 1614.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
535
+ | end of split 39 / 62 | epoch 9 | time: 1616.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
536
+ | end of split 40 / 62 | epoch 9 | time: 1618.87s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
537
+ | end of split 41 / 62 | epoch 9 | time: 1616.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
538
+ | end of split 42 / 62 | epoch 9 | time: 1590.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
539
+ | end of split 43 / 62 | epoch 9 | time: 1588.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
540
+ | end of split 44 / 62 | epoch 9 | time: 1587.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
541
+ | end of split 45 / 62 | epoch 9 | time: 1588.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
542
+ | end of split 46 / 62 | epoch 9 | time: 1599.34s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
543
+ | end of split 47 / 62 | epoch 9 | time: 1601.18s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
544
+ | end of split 48 / 62 | epoch 9 | time: 1601.25s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
545
+ | end of split 49 / 62 | epoch 9 | time: 1602.68s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
546
+ | end of split 50 / 62 | epoch 9 | time: 1601.60s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
547
+ | end of split 51 / 62 | epoch 9 | time: 855.74s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
548
+ | end of split 52 / 62 | epoch 9 | time: 1601.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
549
+ | end of split 53 / 62 | epoch 9 | time: 1600.52s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
550
+ | end of split 54 / 62 | epoch 9 | time: 1596.97s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
551
+ | end of split 55 / 62 | epoch 9 | time: 1594.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
552
+ | end of split 56 / 62 | epoch 9 | time: 1587.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
553
+ | end of split 57 / 62 | epoch 9 | time: 1603.26s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
554
+ | end of split 58 / 62 | epoch 9 | time: 1616.94s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
555
+ | end of split 59 / 62 | epoch 9 | time: 1616.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
556
+ | end of split 60 / 62 | epoch 9 | time: 1618.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
557
+ | end of split 61 / 62 | epoch 9 | time: 1617.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
558
+ | end of split 62 / 62 | epoch 9 | time: 1618.64s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
559
+ | end of split 1 / 62 | epoch 10 | time: 1611.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
560
+ | end of split 2 / 62 | epoch 10 | time: 1613.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
561
+ | end of split 3 / 62 | epoch 10 | time: 1612.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
562
+ | end of split 4 / 62 | epoch 10 | time: 1616.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
563
+ | end of split 5 / 62 | epoch 10 | time: 1614.61s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
564
+ | end of split 6 / 62 | epoch 10 | time: 1616.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
565
+ | end of split 7 / 62 | epoch 10 | time: 1614.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
566
+ | end of split 8 / 62 | epoch 10 | time: 1616.10s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
567
+ | end of split 9 / 62 | epoch 10 | time: 1617.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
568
+ | end of split 10 / 62 | epoch 10 | time: 1616.50s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
569
+ | end of split 11 / 62 | epoch 10 | time: 1614.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
570
+ | end of split 12 / 62 | epoch 10 | time: 1616.48s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
571
+ | end of split 13 / 62 | epoch 10 | time: 1614.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
572
+ | end of split 14 / 62 | epoch 10 | time: 1616.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
573
+ | end of split 15 / 62 | epoch 10 | time: 1617.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
574
+ | end of split 16 / 62 | epoch 10 | time: 1617.48s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
575
+ | end of split 17 / 62 | epoch 10 | time: 1617.70s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
576
+ | end of split 18 / 62 | epoch 10 | time: 1616.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
577
+ | end of split 19 / 62 | epoch 10 | time: 1615.61s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
578
+ | end of split 20 / 62 | epoch 10 | time: 1616.89s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
579
+ | end of split 21 / 62 | epoch 10 | time: 1617.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
580
+ | end of split 22 / 62 | epoch 10 | time: 1615.66s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
581
+ | end of split 23 / 62 | epoch 10 | time: 1617.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
582
+ | end of split 24 / 62 | epoch 10 | time: 1619.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
583
+ | end of split 25 / 62 | epoch 10 | time: 1621.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
584
+ | end of split 26 / 62 | epoch 10 | time: 1619.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
585
+ | end of split 27 / 62 | epoch 10 | time: 1620.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
586
+ | end of split 28 / 62 | epoch 10 | time: 1622.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
587
+ | end of split 29 / 62 | epoch 10 | time: 1624.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
588
+ | end of split 30 / 62 | epoch 10 | time: 1625.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
589
+ | end of split 31 / 62 | epoch 10 | time: 1621.94s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
590
+ | end of split 32 / 62 | epoch 10 | time: 1628.25s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
591
+ | end of split 33 / 62 | epoch 10 | time: 1629.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
592
+ | end of split 34 / 62 | epoch 10 | time: 1630.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
593
+ | end of split 35 / 62 | epoch 10 | time: 1631.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
594
+ | end of split 36 / 62 | epoch 10 | time: 1631.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
595
+ | end of split 37 / 62 | epoch 10 | time: 1634.08s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
596
+ | end of split 38 / 62 | epoch 10 | time: 1634.51s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
597
+ | end of split 39 / 62 | epoch 10 | time: 1634.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
598
+ | end of split 40 / 62 | epoch 10 | time: 1631.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
599
+ TEST: valid loss 0.9404 | valid ppl 2.5611
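
The logged values are consistent with perplexity being the exponential of the validation loss (exp(0.9419) ≈ 2.5648). The sketch below parses one per-split line and recomputes the perplexity from the loss. The regular expression, the `parse_split_line` helper, and the field names are illustrative assumptions based only on the line layout shown in this log; they are not part of flair.

```python
import math
import re

# Pattern for the per-split lines in the log above.
# The field layout is assumed from those lines, not taken from flair itself.
LINE_RE = re.compile(
    r"end of split\s+(?P<split>\d+)\s*/\s*(?P<total>\d+)"
    r"\s*\|\s*epoch\s+(?P<epoch>\d+)"
    r"\s*\|\s*time:\s*(?P<time>[\d.]+)s"
    r"\s*\|\s*valid loss\s+(?P<loss>[\d.]+)"
    r"\s*\|\s*valid ppl\s+(?P<ppl>[\d.]+)"
    r"\s*\|\s*learning rate\s+(?P<lr>[\d.]+)"
)


def parse_split_line(line):
    """Return the fields of one per-split log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "split": int(match["split"]),
        "total": int(match["total"]),
        "epoch": int(match["epoch"]),
        "time_s": float(match["time"]),
        "loss": float(match["loss"]),
        "ppl": float(match["ppl"]),
        "lr": float(match["lr"]),
    }


sample = (
    "| end of split 62 / 62 | epoch 9 | time: 1618.64s | valid loss 0.9419 "
    "| valid ppl 2.5648 | learning rate 0.0000"
)
record = parse_split_line(sample)

# Recompute perplexity from the (rounded) logged loss and compare with the logged value.
recomputed_ppl = math.exp(record["loss"])
print(record)
print(f"exp(loss) = {recomputed_ppl:.4f}, logged ppl = {record['ppl']:.4f}")
```

Because the log rounds both values to four decimals, the recomputed perplexity can differ from the logged one in the last digit. The learning rate is also printed with four decimals, so `0.0000` most likely means a small rate that rounds to zero rather than an exact zero.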
pipeline.py ADDED
@@ -0,0 +1,22 @@
+ from typing import List, Dict
+ from flair.models.language_model import LanguageModel
+
+
+ class PreTrainedPipeline:
+     def __init__(self, path=""):
+         from huggingface_hub import hf_hub_download
+
+         self.model = LanguageModel.load_language_model(
+             hf_hub_download(repo_id="dchaplinsky/flair-uk-forward-large", filename="best-lm.pt")
+         )
+
+     def __call__(self, inputs: str) -> List[Dict]:
+         """
+         Args:
+             inputs (:obj:`str`):
+                 a string containing some text
+         Return:
+             A :obj:`list` containing one :obj:`dict` with the key "generated_text"
+         """
+         inputs = inputs.strip()
+         return [{"generated_text": self.model.generate_text(inputs, temperature=0.5)[0]}]
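
The call contract of `PreTrainedPipeline` (strip the input, generate, wrap the text in a one-element list of dicts) can be exercised without downloading the model. The sketch below is a minimal illustration under stated assumptions: `StubLanguageModel` and `SketchPipeline` are hypothetical names introduced here, and the stub assumes that `generate_text` returns a `(text, score)` pair, which is what the `[0]` index in the pipeline implies.

```python
from typing import Dict, List, Tuple


class StubLanguageModel:
    """Hypothetical stand-in for the flair language model; not part of flair."""

    def generate_text(self, prefix: str, temperature: float = 1.0) -> Tuple[str, float]:
        # Assumed return shape: (generated text, score). The pipeline keeps only index 0.
        return prefix + " ...", 0.0


class SketchPipeline:
    """Mirrors the __call__ logic of PreTrainedPipeline, with the model injected."""

    def __init__(self, model):
        self.model = model

    def __call__(self, inputs: str) -> List[Dict]:
        # Same steps as the pipeline above: strip whitespace, generate, wrap the text.
        inputs = inputs.strip()
        return [{"generated_text": self.model.generate_text(inputs, temperature=0.5)[0]}]


pipeline = SketchPipeline(StubLanguageModel())
result = pipeline("  Росія зазнає поразки \n")
print(result)
```

Injecting the model this way is only a testing convenience for the sketch; the pipeline in this commit always loads `best-lm.pt` from the Hub in its constructor.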
requirements.txt ADDED
@@ -0,0 +1 @@
+ flair