Dmitry Chaplinsky committed on
Commit
9c2cbc6
1 Parent(s): 8a60af4
Files changed (6)
  1. README.md +62 -0
  2. best-lm.pt +3 -0
  3. flair_dictionary.pkl +3 -0
  4. loss.txt +456 -0
  5. pipeline.py +22 -0
  6. requirements.txt +1 -0
README.md CHANGED
@@ -1,3 +1,65 @@
1
  ---
2
+ language:
3
+ - uk
4
+ tags:
5
+ - text2text-generation
6
+ - flair
7
+ library_name: generic
8
  license: mit
9
+ metrics:
10
+ - perplexity
11
+ datasets:
12
+ - ubertext2.0
13
+ widget:
14
+ - text: "Росія зазнає поразки"
15
+ - text: "Достеменно відомо, що Україна перемагає"
16
  ---
17
+
18
+ # Ukrainian flair embeddings (backward, large)
19
+
20
+ Trained for 8 epochs on texts from UberText 2.0 and a corpus of scraped Ukrainian texts from Stefan Schweter (54 GB in total).
21
+
22
+ This is the **backward** version of the embeddings. The forward version is available [here](https://huggingface.co/lang-uk/flair-uk-forward-large/).
23
+
24
+ The character dictionary used for training is in the `flair_dictionary.pkl` file.
25
+
26
+ The model parameters are:
27
+ ```python
28
+ is_forward_lm=False,
29
+ hidden_size=2048,
30
+ sequence_length=250,
31
+ mini_batch_size=1024,
32
+ max_epochs=30
33
+ ```
34
+
35
+ For smaller flair embeddings for Ukrainian, see [uk-backward](https://huggingface.co/lang-uk/flair-uk-backward).
36
+
37
+ For more information on flair embeddings, see [the article](https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) or the paper below:
38
+
39
+ ```bibtex
40
+ @inproceedings{akbik2018coling,
41
+ title={Contextual String Embeddings for Sequence Labeling},
42
+ author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
43
+ booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
44
+ pages = {1638--1649},
45
+ year = {2018}
46
+ }
47
+ ```
48
+
49
+ For more information on UberText 2.0 please see:
50
+ ```bibtex
51
+ @inproceedings{chaplynskyi-2023-introducing,
52
+ title = "Introducing {U}ber{T}ext 2.0: A Corpus of {M}odern {U}krainian at Scale",
53
+ author = "Chaplynskyi, Dmytro",
54
+ booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
55
+ month = may,
56
+ year = "2023",
57
+ address = "Dubrovnik, Croatia",
58
+ publisher = "Association for Computational Linguistics",
59
+ url = "https://aclanthology.org/2023.unlp-1.1",
60
+ pages = "1--10",
61
+ abstract = "This paper addresses the need for massive corpora for a low-resource language and presents the publicly available UberText 2.0 corpus for the Ukrainian language and discusses the methodology of its construction. While the collection and maintenance of such a corpus is more of a data extraction and data engineering task, the corpus itself provides a solid foundation for natural language processing tasks. It can enable the creation of contemporary language models and word embeddings, resulting in a better performance of numerous downstream tasks for the Ukrainian language. In addition, the paper and software developed can be used as a guidance and model solution for other low-resource languages. The resulting corpus is available for download on the project page. It has 3.274 billion tokens, consists of 8.59 million texts and takes up 32 gigabytes of space.",
62
+ }
63
+ ```
64
+
65
+ Copyright: [Dmytro Chaplynskyi](https://twitter.com/dchaplinsky), [lang-uk](https://lang.org.ua) project, 2023
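The README above describes a backward character-level language model. As a rough illustration of what "backward" means, the sketch below builds next-character training pairs from a text, reversing the character stream when `is_forward_lm` is false. It is a simplified, standard-library illustration and not flair's implementation; the helper `make_lm_pairs` is a hypothetical name, and only the `is_forward_lm` and `sequence_length` parameter names are taken from the README.

```python
def make_lm_pairs(text, is_forward_lm, sequence_length):
    """Split text into (input, target) windows for next-character prediction.

    Hypothetical helper for illustration only; not part of flair.
    """
    # A backward model sees the same characters in reverse order.
    stream = text if is_forward_lm else text[::-1]
    pairs = []
    for start in range(0, len(stream) - 1, sequence_length):
        inputs = stream[start:start + sequence_length]
        # The target window is the input window shifted by one character.
        targets = stream[start + 1:start + 1 + sequence_length]
        # Drop any trailing input character that has no target.
        inputs = inputs[:len(targets)]
        pairs.append((inputs, targets))
    return pairs


sample = "Україна"
forward_pairs = make_lm_pairs(sample, is_forward_lm=True, sequence_length=250)
backward_pairs = make_lm_pairs(sample, is_forward_lm=False, sequence_length=250)
print(forward_pairs)
print(backward_pairs)
```

With the forward flag the model learns to predict each character from what precedes it; with the backward flag it predicts each character from what follows it, which is why the two directions are usually published as a pair.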
best-lm.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:42c6ff804b8c6e381764467a736df4d8f37f72b606ca6f8ed689f57cb1d4c3dc
3
+ size 78734687
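The three added lines above are a Git LFS pointer, not the model weights themselves: the repository stores only the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the best-lm.pt diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:42c6ff804b8c6e381764467a736df4d8f37f72b606ca6f8ed689f57cb1d4c3dc
size 78734687
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size_bytes"])
```

The `size` field shows that the actual `best-lm.pt` object is about 78.7 MB, even though the diff adds only three lines.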
flair_dictionary.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2125c32d2db5fb79676a8a6f087b19e9c3b788cb19b87073423e31e176d1fe24
3
+ size 11900
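The `loss.txt` file added by this commit records a `valid loss` and a `valid ppl` for every training split, and the README metadata lists `perplexity` as the metric. The logged numbers are consistent with perplexity being the exponential of the validation loss. The sketch below parses a few lines copied from that log and recomputes the perplexity; the regular expression and the helper `parse_log_line` are assumptions made for this illustration, based only on the line format visible in the log.

```python
import math
import re

# Matches one log line of the form:
# "| end of split 1 / 62 | epoch 1 | time: ...s | valid loss 1.4195 | valid ppl 4.1349 | ..."
LOG_LINE = re.compile(
    r"end of split\s+(?P<split>\d+)\s*/\s*(?P<total>\d+)\s*\|\s*epoch\s+(?P<epoch>\d+)"
    r".*?valid loss\s+(?P<loss>\d+\.\d+)\s*\|\s*valid ppl\s+(?P<ppl>\d+\.\d+)"
)


def parse_log_line(line):
    """Return split, epoch, loss and perplexity of one log line, or None."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    return {
        "split": int(match["split"]),
        "epoch": int(match["epoch"]),
        "loss": float(match["loss"]),
        "ppl": float(match["ppl"]),
    }


# Lines copied verbatim from loss.txt.
sample_lines = [
    "| end of split 1 / 62 | epoch 1 | time: 1583.48s | valid loss 1.4195 | valid ppl 4.1349 | learning rate 20.0000",
    "| end of split 62 / 62 | epoch 1 | time: 869.09s | valid loss 0.9974 | valid ppl 2.7112 | learning rate 20.0000",
    "| end of split 60 / 62 | epoch 4 | time: 1574.14s | valid loss 0.9572 | valid ppl 2.6045 | learning rate 20.0000",
]

records = [parse_log_line(line) for line in sample_lines]
for record in records:
    recomputed = math.exp(record["loss"])
    print(record["epoch"], record["split"], record["ppl"], round(recomputed, 4))
```

Because the log rounds the loss to four decimals, the recomputed perplexity can differ from the logged one in the last digit.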
loss.txt ADDED
@@ -0,0 +1,456 @@
1
+ | end of split 1 / 62 | epoch 1 | time: 1583.48s | valid loss 1.4195 | valid ppl 4.1349 | learning rate 20.0000
2
+ | end of split 2 / 62 | epoch 1 | time: 1586.99s | valid loss 1.2706 | valid ppl 3.5628 | learning rate 20.0000
3
+ | end of split 3 / 62 | epoch 1 | time: 1587.17s | valid loss 1.2056 | valid ppl 3.3386 | learning rate 20.0000
4
+ | end of split 4 / 62 | epoch 1 | time: 1588.13s | valid loss 1.1661 | valid ppl 3.2093 | learning rate 20.0000
5
+ | end of split 5 / 62 | epoch 1 | time: 1588.33s | valid loss 1.1408 | valid ppl 3.1294 | learning rate 20.0000
6
+ | end of split 6 / 62 | epoch 1 | time: 1587.62s | valid loss 1.1212 | valid ppl 3.0685 | learning rate 20.0000
7
+ | end of split 7 / 62 | epoch 1 | time: 1587.56s | valid loss 1.1058 | valid ppl 3.0217 | learning rate 20.0000
8
+ | end of split 8 / 62 | epoch 1 | time: 1588.13s | valid loss 1.0983 | valid ppl 2.9990 | learning rate 20.0000
9
+ | end of split 9 / 62 | epoch 1 | time: 1586.70s | valid loss 1.0876 | valid ppl 2.9671 | learning rate 20.0000
10
+ | end of split 10 / 62 | epoch 1 | time: 1585.61s | valid loss 1.0829 | valid ppl 2.9534 | learning rate 20.0000
11
+ | end of split 11 / 62 | epoch 1 | time: 1585.50s | valid loss 1.0744 | valid ppl 2.9282 | learning rate 20.0000
12
+ | end of split 12 / 62 | epoch 1 | time: 1583.26s | valid loss 1.0666 | valid ppl 2.9055 | learning rate 20.0000
13
+ | end of split 13 / 62 | epoch 1 | time: 1584.36s | valid loss 1.0616 | valid ppl 2.8911 | learning rate 20.0000
14
+ | end of split 14 / 62 | epoch 1 | time: 1585.50s | valid loss 1.0568 | valid ppl 2.8771 | learning rate 20.0000
15
+ | end of split 15 / 62 | epoch 1 | time: 1586.30s | valid loss 1.1435 | valid ppl 3.1378 | learning rate 20.0000
16
+ | end of split 16 / 62 | epoch 1 | time: 1590.72s | valid loss 1.0505 | valid ppl 2.8592 | learning rate 20.0000
17
+ | end of split 17 / 62 | epoch 1 | time: 1617.21s | valid loss 1.0468 | valid ppl 2.8484 | learning rate 20.0000
18
+ | end of split 18 / 62 | epoch 1 | time: 1606.50s | valid loss 1.0429 | valid ppl 2.8374 | learning rate 20.0000
19
+ | end of split 19 / 62 | epoch 1 | time: 1600.44s | valid loss 1.0395 | valid ppl 2.8278 | learning rate 20.0000
20
+ | end of split 20 / 62 | epoch 1 | time: 1593.91s | valid loss 1.0392 | valid ppl 2.8268 | learning rate 20.0000
21
+ | end of split 21 / 62 | epoch 1 | time: 1607.71s | valid loss 1.0325 | valid ppl 2.8081 | learning rate 20.0000
22
+ | end of split 22 / 62 | epoch 1 | time: 1603.04s | valid loss 1.0321 | valid ppl 2.8070 | learning rate 20.0000
23
+ | end of split 23 / 62 | epoch 1 | time: 1602.89s | valid loss 1.0292 | valid ppl 2.7988 | learning rate 20.0000
24
+ | end of split 24 / 62 | epoch 1 | time: 1606.15s | valid loss 1.0284 | valid ppl 2.7965 | learning rate 20.0000
25
+ | end of split 25 / 62 | epoch 1 | time: 1583.05s | valid loss 1.0251 | valid ppl 2.7874 | learning rate 20.0000
26
+ | end of split 26 / 62 | epoch 1 | time: 1580.57s | valid loss 1.0232 | valid ppl 2.7820 | learning rate 20.0000
27
+ | end of split 27 / 62 | epoch 1 | time: 1578.17s | valid loss 1.0218 | valid ppl 2.7783 | learning rate 20.0000
28
+ | end of split 28 / 62 | epoch 1 | time: 1577.71s | valid loss 1.0200 | valid ppl 2.7732 | learning rate 20.0000
29
+ | end of split 29 / 62 | epoch 1 | time: 1577.12s | valid loss 1.0258 | valid ppl 2.7895 | learning rate 20.0000
30
+ | end of split 30 / 62 | epoch 1 | time: 1577.09s | valid loss 1.0195 | valid ppl 2.7719 | learning rate 20.0000
31
+ | end of split 31 / 62 | epoch 1 | time: 1575.70s | valid loss 1.0191 | valid ppl 2.7706 | learning rate 20.0000
32
+ | end of split 32 / 62 | epoch 1 | time: 1576.02s | valid loss 1.0141 | valid ppl 2.7570 | learning rate 20.0000
33
+ | end of split 33 / 62 | epoch 1 | time: 1575.11s | valid loss 1.0111 | valid ppl 2.7486 | learning rate 20.0000
34
+ | end of split 34 / 62 | epoch 1 | time: 1574.68s | valid loss 1.0315 | valid ppl 2.8053 | learning rate 20.0000
35
+ | end of split 35 / 62 | epoch 1 | time: 1575.54s | valid loss 1.0103 | valid ppl 2.7463 | learning rate 20.0000
36
+ | end of split 36 / 62 | epoch 1 | time: 1578.17s | valid loss 1.0089 | valid ppl 2.7425 | learning rate 20.0000
37
+ | end of split 37 / 62 | epoch 1 | time: 1581.60s | valid loss 1.0098 | valid ppl 2.7450 | learning rate 20.0000
38
+ | end of split 38 / 62 | epoch 1 | time: 1590.23s | valid loss 1.0059 | valid ppl 2.7345 | learning rate 20.0000
39
+ | end of split 39 / 62 | epoch 1 | time: 1591.84s | valid loss 1.0313 | valid ppl 2.8048 | learning rate 20.0000
40
+ | end of split 40 / 62 | epoch 1 | time: 1592.79s | valid loss 1.0059 | valid ppl 2.7344 | learning rate 20.0000
41
+ | end of split 41 / 62 | epoch 1 | time: 1591.62s | valid loss 1.0026 | valid ppl 2.7253 | learning rate 20.0000
42
+ | end of split 42 / 62 | epoch 1 | time: 1611.75s | valid loss 1.0035 | valid ppl 2.7277 | learning rate 20.0000
43
+ | end of split 43 / 62 | epoch 1 | time: 1618.56s | valid loss 1.0010 | valid ppl 2.7210 | learning rate 20.0000
44
+ | end of split 44 / 62 | epoch 1 | time: 1623.11s | valid loss 1.0031 | valid ppl 2.7267 | learning rate 20.0000
45
+ | end of split 45 / 62 | epoch 1 | time: 1624.39s | valid loss 0.9990 | valid ppl 2.7156 | learning rate 20.0000
46
+ | end of split 46 / 62 | epoch 1 | time: 1627.72s | valid loss 0.9990 | valid ppl 2.7157 | learning rate 20.0000
47
+ | end of split 47 / 62 | epoch 1 | time: 1627.58s | valid loss 1.0122 | valid ppl 2.7516 | learning rate 20.0000
48
+ | end of split 48 / 62 | epoch 1 | time: 1626.44s | valid loss 0.9964 | valid ppl 2.7084 | learning rate 20.0000
49
+ | end of split 49 / 62 | epoch 1 | time: 1625.87s | valid loss 0.9977 | valid ppl 2.7120 | learning rate 20.0000
50
+ | end of split 50 / 62 | epoch 1 | time: 1626.88s | valid loss 0.9963 | valid ppl 2.7082 | learning rate 20.0000
51
+ | end of split 51 / 62 | epoch 1 | time: 1629.08s | valid loss 0.9958 | valid ppl 2.7069 | learning rate 20.0000
52
+ | end of split 52 / 62 | epoch 1 | time: 1629.12s | valid loss 1.0030 | valid ppl 2.7264 | learning rate 20.0000
53
+ | end of split 53 / 62 | epoch 1 | time: 1628.87s | valid loss 0.9934 | valid ppl 2.7005 | learning rate 20.0000
54
+ | end of split 54 / 62 | epoch 1 | time: 1629.78s | valid loss 0.9930 | valid ppl 2.6994 | learning rate 20.0000
55
+ | end of split 55 / 62 | epoch 1 | time: 1628.40s | valid loss 0.9921 | valid ppl 2.6968 | learning rate 20.0000
56
+ | end of split 56 / 62 | epoch 1 | time: 1626.37s | valid loss 0.9927 | valid ppl 2.6984 | learning rate 20.0000
57
+ | end of split 57 / 62 | epoch 1 | time: 1627.36s | valid loss 0.9918 | valid ppl 2.6961 | learning rate 20.0000
58
+ | end of split 58 / 62 | epoch 1 | time: 1625.21s | valid loss 0.9900 | valid ppl 2.6912 | learning rate 20.0000
59
+ | end of split 59 / 62 | epoch 1 | time: 1626.91s | valid loss 0.9888 | valid ppl 2.6880 | learning rate 20.0000
60
+ | end of split 60 / 62 | epoch 1 | time: 1627.73s | valid loss 0.9964 | valid ppl 2.7086 | learning rate 20.0000
61
+ | end of split 61 / 62 | epoch 1 | time: 1626.02s | valid loss 0.9890 | valid ppl 2.6886 | learning rate 20.0000
62
+ | end of split 62 / 62 | epoch 1 | time: 869.09s | valid loss 0.9974 | valid ppl 2.7112 | learning rate 20.0000
63
+ | end of split 1 / 62 | epoch 2 | time: 1622.25s | valid loss 0.9901 | valid ppl 2.6916 | learning rate 20.0000
64
+ | end of split 2 / 62 | epoch 2 | time: 1625.45s | valid loss 0.9873 | valid ppl 2.6839 | learning rate 20.0000
65
+ | end of split 3 / 62 | epoch 2 | time: 1623.22s | valid loss 0.9864 | valid ppl 2.6816 | learning rate 20.0000
66
+ | end of split 4 / 62 | epoch 2 | time: 1623.07s | valid loss 0.9877 | valid ppl 2.6851 | learning rate 20.0000
67
+ | end of split 5 / 62 | epoch 2 | time: 1620.60s | valid loss 1.0115 | valid ppl 2.7496 | learning rate 20.0000
68
+ | end of split 6 / 62 | epoch 2 | time: 1622.51s | valid loss 0.9890 | valid ppl 2.6887 | learning rate 20.0000
69
+ | end of split 7 / 62 | epoch 2 | time: 1620.37s | valid loss 0.9862 | valid ppl 2.6811 | learning rate 20.0000
70
+ | end of split 8 / 62 | epoch 2 | time: 1620.70s | valid loss 0.9869 | valid ppl 2.6828 | learning rate 20.0000
71
+ | end of split 9 / 62 | epoch 2 | time: 1619.16s | valid loss 0.9861 | valid ppl 2.6808 | learning rate 20.0000
72
+ | end of split 10 / 62 | epoch 2 | time: 1617.83s | valid loss 0.9867 | valid ppl 2.6822 | learning rate 20.0000
73
+ | end of split 11 / 62 | epoch 2 | time: 1618.28s | valid loss 1.0056 | valid ppl 2.7335 | learning rate 20.0000
74
+ | end of split 12 / 62 | epoch 2 | time: 1615.81s | valid loss 0.9829 | valid ppl 2.6723 | learning rate 20.0000
75
+ | end of split 13 / 62 | epoch 2 | time: 1615.59s | valid loss 0.9849 | valid ppl 2.6776 | learning rate 20.0000
76
+ | end of split 14 / 62 | epoch 2 | time: 1616.05s | valid loss 0.9907 | valid ppl 2.6930 | learning rate 20.0000
77
+ | end of split 15 / 62 | epoch 2 | time: 863.11s | valid loss 0.9904 | valid ppl 2.6922 | learning rate 20.0000
78
+ | end of split 16 / 62 | epoch 2 | time: 1614.44s | valid loss 0.9823 | valid ppl 2.6705 | learning rate 20.0000
79
+ | end of split 17 / 62 | epoch 2 | time: 1612.68s | valid loss 0.9824 | valid ppl 2.6708 | learning rate 20.0000
80
+ | end of split 18 / 62 | epoch 2 | time: 1608.56s | valid loss 0.9810 | valid ppl 2.6670 | learning rate 20.0000
81
+ | end of split 19 / 62 | epoch 2 | time: 1585.34s | valid loss 0.9799 | valid ppl 2.6641 | learning rate 20.0000
82
+ | end of split 20 / 62 | epoch 2 | time: 1582.65s | valid loss 0.9801 | valid ppl 2.6647 | learning rate 20.0000
83
+ | end of split 21 / 62 | epoch 2 | time: 1581.78s | valid loss 0.9804 | valid ppl 2.6656 | learning rate 20.0000
84
+ | end of split 22 / 62 | epoch 2 | time: 1583.27s | valid loss 0.9791 | valid ppl 2.6620 | learning rate 20.0000
85
+ | end of split 23 / 62 | epoch 2 | time: 1580.74s | valid loss 0.9780 | valid ppl 2.6590 | learning rate 20.0000
86
+ | end of split 24 / 62 | epoch 2 | time: 1581.13s | valid loss 0.9782 | valid ppl 2.6597 | learning rate 20.0000
87
+ | end of split 25 / 62 | epoch 2 | time: 1580.34s | valid loss 0.9795 | valid ppl 2.6631 | learning rate 20.0000
88
+ | end of split 26 / 62 | epoch 2 | time: 1580.35s | valid loss 0.9782 | valid ppl 2.6597 | learning rate 20.0000
89
+ | end of split 27 / 62 | epoch 2 | time: 1579.55s | valid loss 0.9780 | valid ppl 2.6592 | learning rate 20.0000
90
+ | end of split 28 / 62 | epoch 2 | time: 1583.05s | valid loss 0.9850 | valid ppl 2.6778 | learning rate 20.0000
91
+ | end of split 29 / 62 | epoch 2 | time: 1580.68s | valid loss 0.9822 | valid ppl 2.6702 | learning rate 20.0000
92
+ | end of split 30 / 62 | epoch 2 | time: 1577.58s | valid loss 0.9923 | valid ppl 2.6973 | learning rate 20.0000
93
+ | end of split 31 / 62 | epoch 2 | time: 1581.85s | valid loss 0.9764 | valid ppl 2.6550 | learning rate 20.0000
94
+ | end of split 32 / 62 | epoch 2 | time: 1585.87s | valid loss 0.9760 | valid ppl 2.6537 | learning rate 20.0000
95
+ | end of split 33 / 62 | epoch 2 | time: 1588.93s | valid loss 0.9758 | valid ppl 2.6533 | learning rate 20.0000
96
+ | end of split 34 / 62 | epoch 2 | time: 1590.44s | valid loss 0.9759 | valid ppl 2.6536 | learning rate 20.0000
97
+ | end of split 35 / 62 | epoch 2 | time: 1592.53s | valid loss 0.9758 | valid ppl 2.6532 | learning rate 20.0000
98
+ | end of split 36 / 62 | epoch 2 | time: 1594.13s | valid loss 0.9758 | valid ppl 2.6532 | learning rate 20.0000
99
+ | end of split 37 / 62 | epoch 2 | time: 1592.90s | valid loss 0.9737 | valid ppl 2.6476 | learning rate 20.0000
100
+ | end of split 38 / 62 | epoch 2 | time: 1594.82s | valid loss 0.9736 | valid ppl 2.6474 | learning rate 20.0000
101
+ | end of split 39 / 62 | epoch 2 | time: 1596.77s | valid loss 0.9754 | valid ppl 2.6521 | learning rate 20.0000
102
+ | end of split 40 / 62 | epoch 2 | time: 1599.71s | valid loss 0.9753 | valid ppl 2.6520 | learning rate 20.0000
103
+ | end of split 41 / 62 | epoch 2 | time: 1603.63s | valid loss 0.9745 | valid ppl 2.6498 | learning rate 20.0000
104
+ | end of split 42 / 62 | epoch 2 | time: 1608.89s | valid loss 0.9734 | valid ppl 2.6470 | learning rate 20.0000
105
+ | end of split 43 / 62 | epoch 2 | time: 1609.09s | valid loss 0.9725 | valid ppl 2.6445 | learning rate 20.0000
106
+ | end of split 44 / 62 | epoch 2 | time: 1602.88s | valid loss 0.9728 | valid ppl 2.6454 | learning rate 20.0000
107
+ | end of split 45 / 62 | epoch 2 | time: 1598.34s | valid loss 0.9721 | valid ppl 2.6434 | learning rate 20.0000
108
+ | end of split 46 / 62 | epoch 2 | time: 1600.19s | valid loss 0.9719 | valid ppl 2.6430 | learning rate 20.0000
109
+ | end of split 47 / 62 | epoch 2 | time: 1601.67s | valid loss 0.9719 | valid ppl 2.6431 | learning rate 20.0000
110
+ | end of split 48 / 62 | epoch 2 | time: 1605.64s | valid loss 0.9719 | valid ppl 2.6428 | learning rate 20.0000
111
+ | end of split 49 / 62 | epoch 2 | time: 1604.96s | valid loss 0.9710 | valid ppl 2.6406 | learning rate 20.0000
112
+ | end of split 50 / 62 | epoch 2 | time: 1603.96s | valid loss 0.9715 | valid ppl 2.6420 | learning rate 20.0000
113
+ | end of split 51 / 62 | epoch 2 | time: 1609.00s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
114
+ | end of split 52 / 62 | epoch 2 | time: 1609.47s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
115
+ | end of split 53 / 62 | epoch 2 | time: 1607.14s | valid loss 0.9725 | valid ppl 2.6447 | learning rate 20.0000
116
+ | end of split 54 / 62 | epoch 2 | time: 1606.27s | valid loss 0.9706 | valid ppl 2.6396 | learning rate 20.0000
117
+ | end of split 55 / 62 | epoch 2 | time: 1607.85s | valid loss 0.9706 | valid ppl 2.6395 | learning rate 20.0000
118
+ | end of split 56 / 62 | epoch 2 | time: 1607.99s | valid loss 0.9727 | valid ppl 2.6451 | learning rate 20.0000
119
+ | end of split 57 / 62 | epoch 2 | time: 1609.15s | valid loss 0.9696 | valid ppl 2.6368 | learning rate 20.0000
120
+ | end of split 58 / 62 | epoch 2 | time: 1606.21s | valid loss 0.9691 | valid ppl 2.6355 | learning rate 20.0000
121
+ | end of split 59 / 62 | epoch 2 | time: 1606.97s | valid loss 0.9684 | valid ppl 2.6337 | learning rate 20.0000
122
+ | end of split 60 / 62 | epoch 2 | time: 1605.30s | valid loss 0.9686 | valid ppl 2.6341 | learning rate 20.0000
123
+ | end of split 61 / 62 | epoch 2 | time: 1606.09s | valid loss 0.9678 | valid ppl 2.6322 | learning rate 20.0000
124
+ | end of split 62 / 62 | epoch 2 | time: 1604.24s | valid loss 0.9692 | valid ppl 2.6359 | learning rate 20.0000
125
+ | end of split 1 / 62 | epoch 3 | time: 1595.63s | valid loss 0.9704 | valid ppl 2.6389 | learning rate 20.0000
126
+ | end of split 2 / 62 | epoch 3 | time: 1599.02s | valid loss 0.9697 | valid ppl 2.6373 | learning rate 20.0000
127
+ | end of split 3 / 62 | epoch 3 | time: 1599.83s | valid loss 0.9676 | valid ppl 2.6315 | learning rate 20.0000
128
+ | end of split 4 / 62 | epoch 3 | time: 1601.68s | valid loss 0.9684 | valid ppl 2.6337 | learning rate 20.0000
129
+ | end of split 5 / 62 | epoch 3 | time: 1600.81s | valid loss 0.9697 | valid ppl 2.6372 | learning rate 20.0000
130
+ | end of split 6 / 62 | epoch 3 | time: 1601.85s | valid loss 0.9692 | valid ppl 2.6359 | learning rate 20.0000
131
+ | end of split 7 / 62 | epoch 3 | time: 1599.16s | valid loss 0.9675 | valid ppl 2.6314 | learning rate 20.0000
132
+ | end of split 8 / 62 | epoch 3 | time: 1599.83s | valid loss 0.9686 | valid ppl 2.6342 | learning rate 20.0000
133
+ | end of split 9 / 62 | epoch 3 | time: 1587.43s | valid loss 0.9669 | valid ppl 2.6298 | learning rate 20.0000
134
+ | end of split 10 / 62 | epoch 3 | time: 1588.81s | valid loss 0.9677 | valid ppl 2.6318 | learning rate 20.0000
135
+ | end of split 11 / 62 | epoch 3 | time: 1590.43s | valid loss 0.9673 | valid ppl 2.6307 | learning rate 20.0000
136
+ | end of split 12 / 62 | epoch 3 | time: 1592.90s | valid loss 0.9668 | valid ppl 2.6296 | learning rate 20.0000
137
+ | end of split 13 / 62 | epoch 3 | time: 1594.36s | valid loss 0.9676 | valid ppl 2.6317 | learning rate 20.0000
138
+ | end of split 14 / 62 | epoch 3 | time: 1595.81s | valid loss 0.9652 | valid ppl 2.6254 | learning rate 20.0000
139
+ | end of split 15 / 62 | epoch 3 | time: 1596.70s | valid loss 0.9659 | valid ppl 2.6271 | learning rate 20.0000
140
+ | end of split 16 / 62 | epoch 3 | time: 1591.94s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
141
+ | end of split 17 / 62 | epoch 3 | time: 1584.49s | valid loss 0.9656 | valid ppl 2.6262 | learning rate 20.0000
142
+ | end of split 18 / 62 | epoch 3 | time: 1585.57s | valid loss 0.9649 | valid ppl 2.6245 | learning rate 20.0000
143
+ | end of split 19 / 62 | epoch 3 | time: 1579.95s | valid loss 0.9650 | valid ppl 2.6248 | learning rate 20.0000
144
+ | end of split 20 / 62 | epoch 3 | time: 843.60s | valid loss 0.9738 | valid ppl 2.6480 | learning rate 20.0000
145
+ | end of split 21 / 62 | epoch 3 | time: 1580.19s | valid loss 0.9780 | valid ppl 2.6592 | learning rate 20.0000
146
+ | end of split 22 / 62 | epoch 3 | time: 1582.17s | valid loss 1.0091 | valid ppl 2.7433 | learning rate 20.0000
147
+ | end of split 23 / 62 | epoch 3 | time: 1582.31s | valid loss 0.9639 | valid ppl 2.6220 | learning rate 20.0000
148
+ | end of split 24 / 62 | epoch 3 | time: 1582.57s | valid loss 0.9828 | valid ppl 2.6720 | learning rate 20.0000
149
+ | end of split 25 / 62 | epoch 3 | time: 1582.46s | valid loss 0.9636 | valid ppl 2.6210 | learning rate 20.0000
150
+ | end of split 26 / 62 | epoch 3 | time: 1585.02s | valid loss 0.9653 | valid ppl 2.6255 | learning rate 20.0000
151
+ | end of split 27 / 62 | epoch 3 | time: 1584.48s | valid loss 0.9638 | valid ppl 2.6216 | learning rate 20.0000
152
+ | end of split 28 / 62 | epoch 3 | time: 1585.97s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000
153
+ | end of split 29 / 62 | epoch 3 | time: 1588.62s | valid loss 0.9630 | valid ppl 2.6195 | learning rate 20.0000
154
+ | end of split 30 / 62 | epoch 3 | time: 1605.99s | valid loss 0.9626 | valid ppl 2.6184 | learning rate 20.0000
155
+ | end of split 31 / 62 | epoch 3 | time: 1627.59s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
156
+ | end of split 32 / 62 | epoch 3 | time: 1600.91s | valid loss 0.9649 | valid ppl 2.6246 | learning rate 20.0000
157
+ | end of split 33 / 62 | epoch 3 | time: 1607.37s | valid loss 0.9635 | valid ppl 2.6208 | learning rate 20.0000
158
+ | end of split 34 / 62 | epoch 3 | time: 1605.43s | valid loss 0.9619 | valid ppl 2.6166 | learning rate 20.0000
159
+ | end of split 35 / 62 | epoch 3 | time: 1606.13s | valid loss 0.9621 | valid ppl 2.6173 | learning rate 20.0000
160
+ | end of split 36 / 62 | epoch 3 | time: 1604.60s | valid loss 0.9622 | valid ppl 2.6175 | learning rate 20.0000
161
+ | end of split 37 / 62 | epoch 3 | time: 1606.96s | valid loss 0.9620 | valid ppl 2.6170 | learning rate 20.0000
162
+ | end of split 38 / 62 | epoch 3 | time: 1604.31s | valid loss 0.9615 | valid ppl 2.6157 | learning rate 20.0000
163
+ | end of split 39 / 62 | epoch 3 | time: 1603.46s | valid loss 0.9618 | valid ppl 2.6165 | learning rate 20.0000
164
+ | end of split 40 / 62 | epoch 3 | time: 1602.53s | valid loss 0.9613 | valid ppl 2.6151 | learning rate 20.0000
165
+ | end of split 41 / 62 | epoch 3 | time: 1602.02s | valid loss 0.9613 | valid ppl 2.6151 | learning rate 20.0000
166
+ | end of split 42 / 62 | epoch 3 | time: 1601.02s | valid loss 0.9618 | valid ppl 2.6165 | learning rate 20.0000
167
+ | end of split 43 / 62 | epoch 3 | time: 1602.13s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
168
+ | end of split 44 / 62 | epoch 3 | time: 1605.32s | valid loss 0.9612 | valid ppl 2.6149 | learning rate 20.0000
169
+ | end of split 45 / 62 | epoch 3 | time: 1607.04s | valid loss 0.9808 | valid ppl 2.6667 | learning rate 20.0000
170
+ | end of split 46 / 62 | epoch 3 | time: 1600.96s | valid loss 0.9597 | valid ppl 2.6108 | learning rate 20.0000
171
+ | end of split 47 / 62 | epoch 3 | time: 1602.97s | valid loss 0.9597 | valid ppl 2.6109 | learning rate 20.0000
172
+ | end of split 48 / 62 | epoch 3 | time: 1600.73s | valid loss 0.9657 | valid ppl 2.6267 | learning rate 20.0000
173
+ | end of split 49 / 62 | epoch 3 | time: 1601.65s | valid loss 0.9614 | valid ppl 2.6154 | learning rate 20.0000
174
+ | end of split 50 / 62 | epoch 3 | time: 1601.78s | valid loss 0.9603 | valid ppl 2.6124 | learning rate 20.0000
175
+ | end of split 51 / 62 | epoch 3 | time: 1601.02s | valid loss 0.9593 | valid ppl 2.6098 | learning rate 20.0000
176
+ | end of split 52 / 62 | epoch 3 | time: 1600.92s | valid loss 0.9607 | valid ppl 2.6136 | learning rate 20.0000
177
+ | end of split 53 / 62 | epoch 3 | time: 1601.95s | valid loss 0.9604 | valid ppl 2.6127 | learning rate 20.0000
178
+ | end of split 54 / 62 | epoch 3 | time: 1600.51s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
179
+ | end of split 55 / 62 | epoch 3 | time: 1599.14s | valid loss 0.9588 | valid ppl 2.6086 | learning rate 20.0000
180
+ | end of split 56 / 62 | epoch 3 | time: 1599.72s | valid loss 0.9602 | valid ppl 2.6123 | learning rate 20.0000
181
+ | end of split 57 / 62 | epoch 3 | time: 1597.65s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
182
+ | end of split 58 / 62 | epoch 3 | time: 1598.97s | valid loss 0.9593 | valid ppl 2.6098 | learning rate 20.0000
183
+ | end of split 59 / 62 | epoch 3 | time: 1601.81s | valid loss 0.9589 | valid ppl 2.6089 | learning rate 20.0000
184
+ | end of split 60 / 62 | epoch 3 | time: 1600.21s | valid loss 0.9599 | valid ppl 2.6115 | learning rate 20.0000
185
+ | end of split 61 / 62 | epoch 3 | time: 1598.25s | valid loss 0.9595 | valid ppl 2.6103 | learning rate 20.0000
186
+ | end of split 62 / 62 | epoch 3 | time: 1600.01s | valid loss 0.9584 | valid ppl 2.6075 | learning rate 20.0000
187
+ | end of split 1 / 62 | epoch 4 | time: 1595.62s | valid loss 0.9586 | valid ppl 2.6081 | learning rate 20.0000
188
+ | end of split 2 / 62 | epoch 4 | time: 1593.94s | valid loss 0.9598 | valid ppl 2.6110 | learning rate 20.0000
189
+ | end of split 3 / 62 | epoch 4 | time: 1595.86s | valid loss 0.9592 | valid ppl 2.6096 | learning rate 20.0000
190
+ | end of split 4 / 62 | epoch 4 | time: 852.38s | valid loss 0.9646 | valid ppl 2.6237 | learning rate 20.0000
191
+ | end of split 5 / 62 | epoch 4 | time: 1596.43s | valid loss 0.9590 | valid ppl 2.6091 | learning rate 20.0000
192
+ | end of split 6 / 62 | epoch 4 | time: 1594.53s | valid loss 0.9584 | valid ppl 2.6075 | learning rate 20.0000
193
+ | end of split 7 / 62 | epoch 4 | time: 1594.97s | valid loss 0.9573 | valid ppl 2.6046 | learning rate 20.0000
194
+ | end of split 8 / 62 | epoch 4 | time: 1594.23s | valid loss 0.9579 | valid ppl 2.6062 | learning rate 20.0000
195
+ | end of split 9 / 62 | epoch 4 | time: 1594.24s | valid loss 0.9580 | valid ppl 2.6065 | learning rate 20.0000
196
+ | end of split 10 / 62 | epoch 4 | time: 1591.80s | valid loss 0.9578 | valid ppl 2.6059 | learning rate 20.0000
197
+ | end of split 11 / 62 | epoch 4 | time: 1580.82s | valid loss 0.9572 | valid ppl 2.6043 | learning rate 20.0000
198
+ | end of split 12 / 62 | epoch 4 | time: 1578.60s | valid loss 0.9580 | valid ppl 2.6064 | learning rate 20.0000
199
+ | end of split 13 / 62 | epoch 4 | time: 1580.22s | valid loss 0.9585 | valid ppl 2.6079 | learning rate 20.0000
200
+ | end of split 14 / 62 | epoch 4 | time: 1578.77s | valid loss 0.9627 | valid ppl 2.6189 | learning rate 20.0000
201
+ | end of split 15 / 62 | epoch 4 | time: 1579.05s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
202
+ | end of split 16 / 62 | epoch 4 | time: 1577.56s | valid loss 0.9568 | valid ppl 2.6034 | learning rate 20.0000
203
+ | end of split 17 / 62 | epoch 4 | time: 1578.26s | valid loss 0.9572 | valid ppl 2.6044 | learning rate 20.0000
204
+ | end of split 18 / 62 | epoch 4 | time: 1579.21s | valid loss 0.9566 | valid ppl 2.6027 | learning rate 20.0000
205
+ | end of split 19 / 62 | epoch 4 | time: 1578.77s | valid loss 0.9567 | valid ppl 2.6030 | learning rate 20.0000
206
+ | end of split 20 / 62 | epoch 4 | time: 1576.14s | valid loss 0.9584 | valid ppl 2.6076 | learning rate 20.0000
207
+ | end of split 21 / 62 | epoch 4 | time: 1576.68s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
208
+ | end of split 22 / 62 | epoch 4 | time: 1576.80s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
209
+ | end of split 23 / 62 | epoch 4 | time: 1576.23s | valid loss 0.9744 | valid ppl 2.6496 | learning rate 20.0000
210
+ | end of split 24 / 62 | epoch 4 | time: 1575.49s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
211
+ | end of split 25 / 62 | epoch 4 | time: 1577.44s | valid loss 0.9555 | valid ppl 2.6000 | learning rate 20.0000
212
+ | end of split 26 / 62 | epoch 4 | time: 1577.10s | valid loss 0.9564 | valid ppl 2.6024 | learning rate 20.0000
213
+ | end of split 27 / 62 | epoch 4 | time: 1576.83s | valid loss 0.9560 | valid ppl 2.6012 | learning rate 20.0000
214
+ | end of split 28 / 62 | epoch 4 | time: 1588.94s | valid loss 0.9567 | valid ppl 2.6031 | learning rate 20.0000
215
+ | end of split 29 / 62 | epoch 4 | time: 1591.83s | valid loss 0.9554 | valid ppl 2.5996 | learning rate 20.0000
216
+ | end of split 30 / 62 | epoch 4 | time: 1603.93s | valid loss 0.9554 | valid ppl 2.5997 | learning rate 20.0000
217
+ | end of split 31 / 62 | epoch 4 | time: 1595.76s | valid loss 0.9549 | valid ppl 2.5985 | learning rate 20.0000
218
+ | end of split 32 / 62 | epoch 4 | time: 1711.81s | valid loss 0.9561 | valid ppl 2.6015 | learning rate 20.0000
219
+ | end of split 33 / 62 | epoch 4 | time: 1577.07s | valid loss 0.9577 | valid ppl 2.6058 | learning rate 20.0000
220
+ | end of split 34 / 62 | epoch 4 | time: 1576.41s | valid loss 0.9546 | valid ppl 2.5978 | learning rate 20.0000
221
+ | end of split 35 / 62 | epoch 4 | time: 1577.72s | valid loss 0.9552 | valid ppl 2.5991 | learning rate 20.0000
222
+ | end of split 36 / 62 | epoch 4 | time: 1577.03s | valid loss 0.9553 | valid ppl 2.5995 | learning rate 20.0000
223
+ | end of split 37 / 62 | epoch 4 | time: 1578.71s | valid loss 0.9544 | valid ppl 2.5972 | learning rate 20.0000
224
+ | end of split 38 / 62 | epoch 4 | time: 1577.03s | valid loss 0.9559 | valid ppl 2.6011 | learning rate 20.0000
225
+ | end of split 39 / 62 | epoch 4 | time: 1630.11s | valid loss 0.9540 | valid ppl 2.5962 | learning rate 20.0000
226
+ | end of split 40 / 62 | epoch 4 | time: 1579.09s | valid loss 0.9558 | valid ppl 2.6007 | learning rate 20.0000
227
+ | end of split 41 / 62 | epoch 4 | time: 1578.58s | valid loss 0.9538 | valid ppl 2.5956 | learning rate 20.0000
228
+ | end of split 42 / 62 | epoch 4 | time: 1579.44s | valid loss 0.9541 | valid ppl 2.5964 | learning rate 20.0000
229
+ | end of split 43 / 62 | epoch 4 | time: 1577.04s | valid loss 0.9544 | valid ppl 2.5971 | learning rate 20.0000
230
+ | end of split 44 / 62 | epoch 4 | time: 1576.88s | valid loss 0.9544 | valid ppl 2.5972 | learning rate 20.0000
231
+ | end of split 45 / 62 | epoch 4 | time: 1578.62s | valid loss 0.9600 | valid ppl 2.6116 | learning rate 20.0000
232
+ | end of split 46 / 62 | epoch 4 | time: 1577.25s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
233
+ | end of split 47 / 62 | epoch 4 | time: 1577.78s | valid loss 0.9554 | valid ppl 2.5996 | learning rate 20.0000
234
+ | end of split 48 / 62 | epoch 4 | time: 1577.99s | valid loss 0.9545 | valid ppl 2.5974 | learning rate 20.0000
235
+ | end of split 49 / 62 | epoch 4 | time: 1575.73s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 20.0000
236
+ | end of split 50 / 62 | epoch 4 | time: 1574.23s | valid loss 0.9535 | valid ppl 2.5947 | learning rate 20.0000
237
+ | end of split 51 / 62 | epoch 4 | time: 1575.99s | valid loss 0.9623 | valid ppl 2.6176 | learning rate 20.0000
238
+ | end of split 52 / 62 | epoch 4 | time: 1575.37s | valid loss 0.9954 | valid ppl 2.7058 | learning rate 20.0000
239
+ | end of split 53 / 62 | epoch 4 | time: 1574.08s | valid loss 0.9561 | valid ppl 2.6014 | learning rate 20.0000
240
+ | end of split 54 / 62 | epoch 4 | time: 1575.32s | valid loss 0.9543 | valid ppl 2.5968 | learning rate 20.0000
241
+ | end of split 55 / 62 | epoch 4 | time: 1575.06s | valid loss 0.9541 | valid ppl 2.5962 | learning rate 20.0000
242
+ | end of split 56 / 62 | epoch 4 | time: 1575.80s | valid loss 0.9713 | valid ppl 2.6413 | learning rate 20.0000
243
+ | end of split 57 / 62 | epoch 4 | time: 1577.19s | valid loss 0.9556 | valid ppl 2.6003 | learning rate 20.0000
244
+ | end of split 58 / 62 | epoch 4 | time: 1576.21s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
245
+ | end of split 59 / 62 | epoch 4 | time: 1577.08s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
246
+ | end of split 60 / 62 | epoch 4 | time: 1574.14s | valid loss 0.9572 | valid ppl 2.6045 | learning rate 20.0000
247
+ | end of split 61 / 62 | epoch 4 | time: 1571.90s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 20.0000
248
+ | end of split 62 / 62 | epoch 4 | time: 1572.26s | valid loss 0.9482 | valid ppl 2.5811 | learning rate 5.0000
249
+ | end of split 1 / 62 | epoch 5 | time: 1570.96s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
250
+ | end of split 2 / 62 | epoch 5 | time: 1573.43s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
251
+ | end of split 3 / 62 | epoch 5 | time: 1573.08s | valid loss 0.9472 | valid ppl 2.5786 | learning rate 5.0000
252
+ | end of split 4 / 62 | epoch 5 | time: 1572.80s | valid loss 0.9474 | valid ppl 2.5789 | learning rate 5.0000
253
+ | end of split 5 / 62 | epoch 5 | time: 1572.50s | valid loss 0.9477 | valid ppl 2.5798 | learning rate 5.0000
254
+ | end of split 6 / 62 | epoch 5 | time: 1574.27s | valid loss 0.9469 | valid ppl 2.5777 | learning rate 5.0000
255
+ | end of split 7 / 62 | epoch 5 | time: 1575.64s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
256
+ | end of split 8 / 62 | epoch 5 | time: 1577.81s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
257
+ | end of split 9 / 62 | epoch 5 | time: 1578.61s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
258
+ | end of split 10 / 62 | epoch 5 | time: 1580.32s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
259
+ | end of split 11 / 62 | epoch 5 | time: 1581.85s | valid loss 0.9467 | valid ppl 2.5771 | learning rate 5.0000
260
+ | end of split 12 / 62 | epoch 5 | time: 1582.22s | valid loss 0.9466 | valid ppl 2.5769 | learning rate 5.0000
261
+ | end of split 13 / 62 | epoch 5 | time: 1581.45s | valid loss 0.9466 | valid ppl 2.5769 | learning rate 5.0000
262
+ | end of split 14 / 62 | epoch 5 | time: 1579.73s | valid loss 0.9466 | valid ppl 2.5770 | learning rate 5.0000
263
+ | end of split 15 / 62 | epoch 5 | time: 1581.60s | valid loss 0.9466 | valid ppl 2.5768 | learning rate 5.0000
264
+ | end of split 16 / 62 | epoch 5 | time: 1577.02s | valid loss 0.9463 | valid ppl 2.5761 | learning rate 5.0000
265
+ | end of split 17 / 62 | epoch 5 | time: 1576.46s | valid loss 0.9465 | valid ppl 2.5768 | learning rate 5.0000
266
+ | end of split 18 / 62 | epoch 5 | time: 1577.82s | valid loss 0.9472 | valid ppl 2.5785 | learning rate 5.0000
267
+ | end of split 19 / 62 | epoch 5 | time: 1579.10s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
268
+ | end of split 20 / 62 | epoch 5 | time: 1579.00s | valid loss 0.9462 | valid ppl 2.5760 | learning rate 5.0000
269
+ | end of split 21 / 62 | epoch 5 | time: 1579.61s | valid loss 0.9461 | valid ppl 2.5757 | learning rate 5.0000
270
+ | end of split 22 / 62 | epoch 5 | time: 1580.98s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
271
+ | end of split 23 / 62 | epoch 5 | time: 1581.08s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
272
+ | end of split 24 / 62 | epoch 5 | time: 1581.18s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
273
+ | end of split 25 / 62 | epoch 5 | time: 1579.63s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
274
+ | end of split 26 / 62 | epoch 5 | time: 1584.07s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
275
+ | end of split 27 / 62 | epoch 5 | time: 1595.88s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
276
+ | end of split 28 / 62 | epoch 5 | time: 1594.85s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
277
+ | end of split 29 / 62 | epoch 5 | time: 1592.49s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
278
+ | end of split 30 / 62 | epoch 5 | time: 1592.88s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
279
+ | end of split 31 / 62 | epoch 5 | time: 1595.11s | valid loss 0.9458 | valid ppl 2.5747 | learning rate 5.0000
280
+ | end of split 32 / 62 | epoch 5 | time: 1596.27s | valid loss 0.9458 | valid ppl 2.5748 | learning rate 5.0000
281
+ | end of split 33 / 62 | epoch 5 | time: 1593.21s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
282
+ | end of split 34 / 62 | epoch 5 | time: 1594.40s | valid loss 0.9457 | valid ppl 2.5746 | learning rate 5.0000
283
+ | end of split 35 / 62 | epoch 5 | time: 1590.87s | valid loss 0.9455 | valid ppl 2.5741 | learning rate 5.0000
284
+ | end of split 36 / 62 | epoch 5 | time: 1593.79s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
285
+ | end of split 37 / 62 | epoch 5 | time: 1591.50s | valid loss 0.9456 | valid ppl 2.5745 | learning rate 5.0000
286
+ | end of split 38 / 62 | epoch 5 | time: 1589.49s | valid loss 0.9457 | valid ppl 2.5745 | learning rate 5.0000
287
+ | end of split 39 / 62 | epoch 5 | time: 1590.75s | valid loss 0.9480 | valid ppl 2.5806 | learning rate 5.0000
288
+ | end of split 40 / 62 | epoch 5 | time: 1590.43s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
289
+ | end of split 41 / 62 | epoch 5 | time: 1590.08s | valid loss 0.9455 | valid ppl 2.5741 | learning rate 5.0000
290
+ | end of split 42 / 62 | epoch 5 | time: 1589.48s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
291
+ | end of split 43 / 62 | epoch 5 | time: 1587.62s | valid loss 0.9457 | valid ppl 2.5745 | learning rate 5.0000
292
+ | end of split 44 / 62 | epoch 5 | time: 1586.79s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
293
+ | end of split 45 / 62 | epoch 5 | time: 1585.86s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
294
+ | end of split 46 / 62 | epoch 5 | time: 1586.95s | valid loss 0.9454 | valid ppl 2.5738 | learning rate 5.0000
295
+ | end of split 47 / 62 | epoch 5 | time: 1587.96s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
296
+ | end of split 48 / 62 | epoch 5 | time: 1587.28s | valid loss 0.9455 | valid ppl 2.5740 | learning rate 5.0000
297
+ | end of split 49 / 62 | epoch 5 | time: 1587.77s | valid loss 0.9451 | valid ppl 2.5732 | learning rate 5.0000
298
+ | end of split 50 / 62 | epoch 5 | time: 1586.98s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
299
+ | end of split 51 / 62 | epoch 5 | time: 1585.51s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
300
+ | end of split 52 / 62 | epoch 5 | time: 1586.57s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
301
+ | end of split 53 / 62 | epoch 5 | time: 1586.75s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
302
+ | end of split 54 / 62 | epoch 5 | time: 846.84s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
303
+ | end of split 55 / 62 | epoch 5 | time: 1583.94s | valid loss 0.9451 | valid ppl 2.5730 | learning rate 5.0000
304
+ | end of split 56 / 62 | epoch 5 | time: 1585.75s | valid loss 0.9451 | valid ppl 2.5732 | learning rate 5.0000
305
+ | end of split 57 / 62 | epoch 5 | time: 1585.81s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
306
+ | end of split 58 / 62 | epoch 5 | time: 1586.18s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
307
+ | end of split 59 / 62 | epoch 5 | time: 1586.85s | valid loss 0.9449 | valid ppl 2.5725 | learning rate 5.0000
308
+ | end of split 60 / 62 | epoch 5 | time: 1591.84s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
309
+ | end of split 61 / 62 | epoch 5 | time: 1592.74s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
310
+ | end of split 62 / 62 | epoch 5 | time: 1595.38s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
311
+ | end of split 1 / 62 | epoch 6 | time: 1594.09s | valid loss 0.9448 | valid ppl 2.5724 | learning rate 5.0000
312
+ | end of split 2 / 62 | epoch 6 | time: 1598.24s | valid loss 0.9448 | valid ppl 2.5723 | learning rate 5.0000
313
+ | end of split 3 / 62 | epoch 6 | time: 1598.85s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
314
+ | end of split 4 / 62 | epoch 6 | time: 1593.37s | valid loss 0.9448 | valid ppl 2.5723 | learning rate 5.0000
315
+ | end of split 5 / 62 | epoch 6 | time: 1586.31s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
316
+ | end of split 6 / 62 | epoch 6 | time: 1586.36s | valid loss 0.9446 | valid ppl 2.5718 | learning rate 5.0000
317
+ | end of split 7 / 62 | epoch 6 | time: 1584.08s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
318
+ | end of split 8 / 62 | epoch 6 | time: 1584.49s | valid loss 0.9445 | valid ppl 2.5716 | learning rate 5.0000
319
+ | end of split 9 / 62 | epoch 6 | time: 1583.63s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
320
+ | end of split 10 / 62 | epoch 6 | time: 1582.25s | valid loss 0.9446 | valid ppl 2.5718 | learning rate 5.0000
321
+ | end of split 11 / 62 | epoch 6 | time: 1583.67s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
322
+ | end of split 12 / 62 | epoch 6 | time: 1592.91s | valid loss 0.9445 | valid ppl 2.5715 | learning rate 5.0000
323
+ | end of split 13 / 62 | epoch 6 | time: 1591.67s | valid loss 0.9445 | valid ppl 2.5716 | learning rate 5.0000
324
+ | end of split 14 / 62 | epoch 6 | time: 1593.32s | valid loss 0.9444 | valid ppl 2.5712 | learning rate 5.0000
325
+ | end of split 15 / 62 | epoch 6 | time: 1595.18s | valid loss 0.9444 | valid ppl 2.5714 | learning rate 5.0000
326
+ | end of split 16 / 62 | epoch 6 | time: 1595.10s | valid loss 0.9447 | valid ppl 2.5719 | learning rate 5.0000
327
+ | end of split 17 / 62 | epoch 6 | time: 1595.70s | valid loss 0.9444 | valid ppl 2.5711 | learning rate 5.0000
328
+ | end of split 18 / 62 | epoch 6 | time: 1593.68s | valid loss 0.9444 | valid ppl 2.5713 | learning rate 5.0000
329
+ | end of split 19 / 62 | epoch 6 | time: 1595.28s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
330
+ | end of split 20 / 62 | epoch 6 | time: 1595.01s | valid loss 0.9475 | valid ppl 2.5793 | learning rate 5.0000
331
+ | end of split 21 / 62 | epoch 6 | time: 1594.95s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
332
+ | end of split 22 / 62 | epoch 6 | time: 1595.46s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
333
+ | end of split 23 / 62 | epoch 6 | time: 1597.41s | valid loss 0.9442 | valid ppl 2.5708 | learning rate 5.0000
334
+ | end of split 24 / 62 | epoch 6 | time: 1597.13s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
335
+ | end of split 25 / 62 | epoch 6 | time: 1595.18s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
336
+ | end of split 26 / 62 | epoch 6 | time: 1594.01s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
337
+ | end of split 27 / 62 | epoch 6 | time: 1594.84s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
338
+ | end of split 28 / 62 | epoch 6 | time: 1592.94s | valid loss 0.9441 | valid ppl 2.5705 | learning rate 5.0000
339
+ | end of split 29 / 62 | epoch 6 | time: 1591.38s | valid loss 0.9443 | valid ppl 2.5711 | learning rate 5.0000
340
+ | end of split 30 / 62 | epoch 6 | time: 1590.34s | valid loss 0.9442 | valid ppl 2.5707 | learning rate 5.0000
341
+ | end of split 31 / 62 | epoch 6 | time: 1592.84s | valid loss 0.9441 | valid ppl 2.5704 | learning rate 5.0000
342
+ | end of split 32 / 62 | epoch 6 | time: 1589.97s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
343
+ | end of split 33 / 62 | epoch 6 | time: 1589.48s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
344
+ | end of split 34 / 62 | epoch 6 | time: 1590.99s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
345
+ | end of split 35 / 62 | epoch 6 | time: 1587.27s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
346
+ | end of split 36 / 62 | epoch 6 | time: 1589.43s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
347
+ | end of split 37 / 62 | epoch 6 | time: 1590.89s | valid loss 0.9431 | valid ppl 2.5680 | learning rate 1.2500
348
+ | end of split 38 / 62 | epoch 6 | time: 1591.30s | valid loss 0.9431 | valid ppl 2.5679 | learning rate 1.2500
349
+ | end of split 39 / 62 | epoch 6 | time: 1587.59s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
350
+ | end of split 40 / 62 | epoch 6 | time: 1589.99s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
351
+ | end of split 41 / 62 | epoch 6 | time: 848.87s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
352
+ | end of split 42 / 62 | epoch 6 | time: 1589.92s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
353
+ | end of split 43 / 62 | epoch 6 | time: 1588.08s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
354
+ | end of split 44 / 62 | epoch 6 | time: 1586.96s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
355
+ | end of split 45 / 62 | epoch 6 | time: 1587.55s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
356
+ | end of split 46 / 62 | epoch 6 | time: 1586.69s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
357
+ | end of split 47 / 62 | epoch 6 | time: 1587.20s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
358
+ | end of split 48 / 62 | epoch 6 | time: 1587.64s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
359
+ | end of split 49 / 62 | epoch 6 | time: 1579.53s | valid loss 0.9427 | valid ppl 2.5670 | learning rate 1.2500
360
+ | end of split 50 / 62 | epoch 6 | time: 1577.89s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
361
+ | end of split 51 / 62 | epoch 6 | time: 1574.78s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
362
+ | end of split 52 / 62 | epoch 6 | time: 1575.34s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
363
+ | end of split 53 / 62 | epoch 6 | time: 1574.50s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
364
+ | end of split 54 / 62 | epoch 6 | time: 1578.06s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
365
+ | end of split 55 / 62 | epoch 6 | time: 1577.22s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
366
+ | end of split 56 / 62 | epoch 6 | time: 1577.40s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
367
+ | end of split 57 / 62 | epoch 6 | time: 1579.42s | valid loss 0.9426 | valid ppl 2.5668 | learning rate 1.2500
368
+ | end of split 58 / 62 | epoch 6 | time: 1575.45s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
369
+ | end of split 59 / 62 | epoch 6 | time: 1577.22s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
370
+ | end of split 60 / 62 | epoch 6 | time: 1582.29s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
371
+ | end of split 61 / 62 | epoch 6 | time: 1588.61s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
372
+ | end of split 62 / 62 | epoch 6 | time: 1588.70s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
373
+ | end of split 1 / 62 | epoch 7 | time: 1584.79s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
374
+ | end of split 2 / 62 | epoch 7 | time: 1588.80s | valid loss 0.9426 | valid ppl 2.5665 | learning rate 1.2500
375
+ | end of split 3 / 62 | epoch 7 | time: 1589.28s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
376
+ | end of split 4 / 62 | epoch 7 | time: 1589.32s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
377
+ | end of split 5 / 62 | epoch 7 | time: 1591.86s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 0.3125
378
+ | end of split 6 / 62 | epoch 7 | time: 1590.36s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 0.3125
379
+ | end of split 7 / 62 | epoch 7 | time: 1590.53s | valid loss 0.9423 | valid ppl 2.5659 | learning rate 0.3125
380
+ | end of split 8 / 62 | epoch 7 | time: 1589.81s | valid loss 0.9423 | valid ppl 2.5659 | learning rate 0.3125
381
+ | end of split 9 / 62 | epoch 7 | time: 1590.82s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
382
+ | end of split 10 / 62 | epoch 7 | time: 1591.41s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
383
+ | end of split 11 / 62 | epoch 7 | time: 1592.90s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
384
+ | end of split 12 / 62 | epoch 7 | time: 1594.52s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
385
+ | end of split 13 / 62 | epoch 7 | time: 1592.98s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
386
+ | end of split 14 / 62 | epoch 7 | time: 1591.85s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
387
+ | end of split 15 / 62 | epoch 7 | time: 1593.69s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
388
+ | end of split 16 / 62 | epoch 7 | time: 850.92s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
389
+ | end of split 17 / 62 | epoch 7 | time: 1591.86s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
390
+ | end of split 18 / 62 | epoch 7 | time: 1591.87s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
391
+ | end of split 19 / 62 | epoch 7 | time: 1590.77s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
392
+ | end of split 20 / 62 | epoch 7 | time: 1592.50s | valid loss 0.9422 | valid ppl 2.5657 | learning rate 0.3125
393
+ | end of split 21 / 62 | epoch 7 | time: 1590.69s | valid loss 0.9422 | valid ppl 2.5657 | learning rate 0.0781
394
+ | end of split 22 / 62 | epoch 7 | time: 1588.52s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
395
+ | end of split 23 / 62 | epoch 7 | time: 1591.35s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
396
+ | end of split 24 / 62 | epoch 7 | time: 1592.13s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
397
+ | end of split 25 / 62 | epoch 7 | time: 1590.33s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
398
+ | end of split 26 / 62 | epoch 7 | time: 1593.30s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
399
+ | end of split 27 / 62 | epoch 7 | time: 1591.57s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
400
+ | end of split 28 / 62 | epoch 7 | time: 1590.85s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
401
+ | end of split 29 / 62 | epoch 7 | time: 1591.07s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
402
+ | end of split 30 / 62 | epoch 7 | time: 1589.17s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
403
+ | end of split 31 / 62 | epoch 7 | time: 1590.29s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
404
+ | end of split 32 / 62 | epoch 7 | time: 1588.94s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0195
405
+ | end of split 33 / 62 | epoch 7 | time: 1589.33s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0195
406
+ | end of split 34 / 62 | epoch 7 | time: 1588.78s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
407
+ | end of split 35 / 62 | epoch 7 | time: 1589.30s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
408
+ | end of split 36 / 62 | epoch 7 | time: 1587.55s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
409
+ | end of split 37 / 62 | epoch 7 | time: 1586.43s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
410
+ | end of split 38 / 62 | epoch 7 | time: 1586.62s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
411
+ | end of split 39 / 62 | epoch 7 | time: 1586.33s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
412
+ | end of split 40 / 62 | epoch 7 | time: 1586.73s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
413
+ | end of split 41 / 62 | epoch 7 | time: 1584.33s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
414
+ | end of split 42 / 62 | epoch 7 | time: 1585.00s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
415
+ | end of split 43 / 62 | epoch 7 | time: 1588.09s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
416
+ | end of split 44 / 62 | epoch 7 | time: 1590.56s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
417
+ | end of split 45 / 62 | epoch 7 | time: 1590.53s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0195
418
+ | end of split 46 / 62 | epoch 7 | time: 1595.27s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
419
+ | end of split 47 / 62 | epoch 7 | time: 1599.33s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
420
+ | end of split 48 / 62 | epoch 7 | time: 1598.60s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
421
+ | end of split 49 / 62 | epoch 7 | time: 1598.68s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
422
+ | end of split 50 / 62 | epoch 7 | time: 1600.25s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
423
+ | end of split 51 / 62 | epoch 7 | time: 1597.95s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
424
+ | end of split 52 / 62 | epoch 7 | time: 1598.75s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
425
+ | end of split 53 / 62 | epoch 7 | time: 1599.63s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
426
+ | end of split 54 / 62 | epoch 7 | time: 1594.92s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
427
+ | end of split 55 / 62 | epoch 7 | time: 1595.71s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
428
+ | end of split 56 / 62 | epoch 7 | time: 1597.02s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
429
+ | end of split 57 / 62 | epoch 7 | time: 1594.59s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
430
+ | end of split 58 / 62 | epoch 7 | time: 1593.96s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
431
+ | end of split 59 / 62 | epoch 7 | time: 1594.96s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
432
+ | end of split 60 / 62 | epoch 7 | time: 1594.10s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
433
+ | end of split 61 / 62 | epoch 7 | time: 1595.45s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
434
+ | end of split 62 / 62 | epoch 7 | time: 1597.03s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
435
+ | end of split 1 / 62 | epoch 8 | time: 1593.02s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
436
+ | end of split 2 / 62 | epoch 8 | time: 1598.16s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
437
+ | end of split 3 / 62 | epoch 8 | time: 1598.24s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
438
+ | end of split 4 / 62 | epoch 8 | time: 1600.33s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
439
+ | end of split 5 / 62 | epoch 8 | time: 1598.80s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
440
+ | end of split 6 / 62 | epoch 8 | time: 1599.19s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
441
+ | end of split 7 / 62 | epoch 8 | time: 1599.86s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
442
+ | end of split 8 / 62 | epoch 8 | time: 1597.82s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
443
+ | end of split 9 / 62 | epoch 8 | time: 1597.89s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
444
+ | end of split 10 / 62 | epoch 8 | time: 1596.89s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
445
+ | end of split 11 / 62 | epoch 8 | time: 1596.65s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
446
+ | end of split 12 / 62 | epoch 8 | time: 1593.04s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
447
+ | end of split 13 / 62 | epoch 8 | time: 1584.13s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
448
+ | end of split 14 / 62 | epoch 8 | time: 1581.93s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
449
+ | end of split 15 / 62 | epoch 8 | time: 1579.07s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
450
+ | end of split 16 / 62 | epoch 8 | time: 1580.06s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
451
+ | end of split 17 / 62 | epoch 8 | time: 1580.03s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
452
+ | end of split 18 / 62 | epoch 8 | time: 1580.61s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
453
+ | end of split 19 / 62 | epoch 8 | time: 1579.19s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
454
+ | end of split 20 / 62 | epoch 8 | time: 1579.59s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
455
+ | end of split 21 / 62 | epoch 8 | time: 1577.85s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
456
+ TEST: valid loss 0.9407 | valid ppl 2.5618
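The log above can be checked mechanically. The sketch below is not part of this commit; it parses a few lines copied from the log, confirms that each `valid ppl` is `exp(valid loss)` up to rounding, and lists the points where the learning rate changes. The names `SAMPLE_LOG`, `parse_log`, and `learning_rate_drops` are illustrative, and the factor of 0.25 between successive learning rates is only what the logged values show, not a setting confirmed by the training configuration.

```python
import math
import re

# Lines copied from the log above, with the leading "+ " diff marker dropped.
SAMPLE_LOG = """\
| end of split 61 / 62 | epoch 4 | time: 1571.90s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 20.0000
| end of split 62 / 62 | epoch 4 | time: 1572.26s | valid loss 0.9482 | valid ppl 2.5811 | learning rate 5.0000
| end of split 35 / 62 | epoch 6 | time: 1587.27s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 36 / 62 | epoch 6 | time: 1589.43s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
| end of split 3 / 62 | epoch 7 | time: 1589.28s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
"""

LINE_RE = re.compile(
    r"end of split\s+(?P<split>\d+)\s*/\s*\d+\s*\|\s*epoch\s+(?P<epoch>\d+)"
    r".*?valid loss\s+(?P<loss>[\d.]+)\s*\|\s*valid ppl\s+(?P<ppl>[\d.]+)"
    r"\s*\|\s*learning rate\s+(?P<lr>[\d.]+)"
)


def parse_log(text):
    """Turn every 'end of split' line into a dict of numbers; skip other lines."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        records.append({
            "epoch": int(match["epoch"]),
            "split": int(match["split"]),
            "loss": float(match["loss"]),
            "ppl": float(match["ppl"]),
            "lr": float(match["lr"]),
        })
    return records


def learning_rate_drops(records):
    """Return (epoch, split, old_lr, new_lr) wherever the learning rate changes."""
    drops = []
    for prev, cur in zip(records, records[1:]):
        if cur["lr"] != prev["lr"]:
            drops.append((cur["epoch"], cur["split"], prev["lr"], cur["lr"]))
    return drops


records = parse_log(SAMPLE_LOG)

# Perplexity is the exponential of the loss; allow for 4-decimal rounding.
ppl_errors = [abs(math.exp(r["loss"]) - r["ppl"]) for r in records]

drops = learning_rate_drops(records)
ratios = [new_lr / old_lr for _, _, old_lr, new_lr in drops]

print(max(ppl_errors))
print(drops)
print(ratios)
```

Running the same functions over the full `loss.txt` (after stripping the `+ ` prefix) should give the complete list of learning-rate changes; the final `TEST:` line has a different shape and is skipped by the regular expression.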
pipeline.py ADDED
@@ -0,0 +1,22 @@
+ from typing import List, Dict
+ from flair.models.language_model import LanguageModel
+
+
+ class PreTrainedPipeline:
+     def __init__(self, path=""):
+         from huggingface_hub import hf_hub_download
+
+         self.model = LanguageModel.load_language_model(
+             hf_hub_download(repo_id="dchaplinsky/flair-uk-backward-large", filename="best-lm.pt")
+         )
+
+     def __call__(self, inputs: str) -> List[Dict]:
+         """
+         Args:
+             inputs (:obj:`str`):
+                 a string containing some text
+         Return:
+             A :obj:`list` with one :obj:`dict` whose "generated_text" key holds the generated text
+         """
+         inputs = inputs.strip()
+         return [{"generated_text": self.model.generate_text(inputs, temperature=0.5)[0]}]
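The call contract of `PreTrainedPipeline` can be exercised without downloading `best-lm.pt`. The following is a minimal sketch, not the code in this commit: `FakeLanguageModel` and `SketchPipeline` are hypothetical names, and the stub's `(text, score)` return value is an assumption inferred from the `[0]` index that `pipeline.py` applies to the result of `generate_text`.

```python
from typing import Dict, List, Tuple


class FakeLanguageModel:
    """Hypothetical stand-in for a loaded flair LanguageModel (no download)."""

    def generate_text(self, prefix: str = "\n", temperature: float = 1.0) -> Tuple[str, float]:
        # Assumption: the real method returns a (text, score) pair,
        # which is why pipeline.py takes element [0].
        return prefix + " ...", 0.0


class SketchPipeline:
    """Same input/output shape as PreTrainedPipeline, with the model injected."""

    def __init__(self, model):
        self.model = model

    def __call__(self, inputs: str) -> List[Dict]:
        # Strip surrounding whitespace before using the text as a prefix.
        inputs = inputs.strip()
        text, _score = self.model.generate_text(inputs, temperature=0.5)
        return [{"generated_text": text}]


pipeline = SketchPipeline(FakeLanguageModel())
result = pipeline("  Росія зазнає поразки \n")
print(result)
```

Injecting the model keeps the wrapper logic (whitespace stripping, fixed temperature, single-item list result) testable on its own; with the real class, the model comes from `hf_hub_download` as shown in `pipeline.py`.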
requirements.txt ADDED
@@ -0,0 +1 @@
 
 
+ flair