Dmitry Chaplinsky committed 9c2cbc6 (parent: 8a60af4)

Release

Files changed:
- README.md +62 -0
- best-lm.pt +3 -0
- flair_dictionary.pkl +3 -0
- loss.txt +456 -0
- pipeline.py +22 -0
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,3 +1,65 @@
---
language:
- uk
tags:
- text2text-generation
- flair
library_name: generic
license: mit
metrics:
- perplexity
datasets:
- ubertext2.0
widget:
- text: "Росія зазнає поразки"
- text: "Достеменно відомо, що Україна перемагає"
---

# Ukrainian Flair embeddings (backward, large)

Trained for 8 epochs on texts from UberText 2.0 and a corpus of scraped Ukrainian texts from Stefan Schweter (54 GB in total).

This is the **backward** version of the embeddings. You can find the forward version [here](https://huggingface.co/lang-uk/flair-uk-forward-large/).

The character dictionary used for training is stored in the `flair_dictionary.pkl` file.
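
A minimal sketch for inspecting this dictionary with only the Python standard library. It assumes the pickle holds a plain dict with `idx2item` and `item2idx` keys whose items are UTF-8-encoded bytes, which is how Flair's `Dictionary.save` is believed to store it; the helper `load_char_dictionary` is hypothetical. Unpickling can run arbitrary code, so only load files you trust.

```python
from __future__ import annotations

import pickle
from pathlib import Path


def load_char_dictionary(path: str | Path) -> list[str]:
    """Return the dictionary's characters in index order.

    Assumed layout (hypothetical, based on Flair's Dictionary.save):
    {"idx2item": [b"a", b"b", ...], "item2idx": {b"a": 0, ...}}
    """
    # pickle.load can execute arbitrary code: only open trusted files.
    with open(path, "rb") as f:
        mappings = pickle.load(f)
    # Items are expected as UTF-8 bytes; pass through any already-decoded str.
    return [
        item.decode("utf-8") if isinstance(item, bytes) else item
        for item in mappings["idx2item"]
    ]
```

Under that assumption, `load_char_dictionary("flair_dictionary.pkl")` returns the character inventory in the order of the model's input indices.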

The model parameters are:
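```python
# Training settings. is_forward_lm and hidden_size configure Flair's LanguageModel;
# sequence_length, mini_batch_size and max_epochs are passed to LanguageModelTrainer.train().
# is_forward_lm is False because this is the backward model.
is_forward_lm = False
hidden_size = 2048
sequence_length = 250
mini_batch_size = 1024
max_epochs = 30
```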
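
`sequence_length` and `mini_batch_size` control how the character stream is cut up during training. The sketch below shows the usual scheme, in the style of the classic PyTorch word-language-model example (an assumption; Flair's exact internals may differ): each input window is paired with the same window shifted by one character, and a backward model reads the reversed stream, so it learns to predict the preceding character. `make_windows` is a hypothetical helper.

```python
from __future__ import annotations


def make_windows(text: str, sequence_length: int, forward: bool = True) -> list[tuple[str, str]]:
    """Split a character stream into non-overlapping (input, target) windows.

    The target is the input shifted by one character: target[i] is the
    character the model must predict after reading input[: i + 1].
    A backward model is trained on the reversed stream.
    """
    stream = text if forward else text[::-1]
    windows = []
    # Step by sequence_length; the last window may be shorter.
    for i in range(0, len(stream) - 1, sequence_length):
        seq_len = min(sequence_length, len(stream) - 1 - i)
        inputs = stream[i : i + seq_len]
        targets = stream[i + 1 : i + 1 + seq_len]
        windows.append((inputs, targets))
    return windows


print(make_windows("abcdef", 4))                 # [('abcd', 'bcde'), ('e', 'f')]
print(make_windows("abcdef", 4, forward=False))  # [('fedc', 'edcb'), ('b', 'a')]
```

With the settings above, each window holds at most 250 characters, so one mini-batch of 1024 parallel windows covers at most 250 × 1024 = 256,000 characters.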

For smaller Flair embeddings for the Ukrainian language, see [uk-backward](https://huggingface.co/lang-uk/flair-uk-backward).

For more information on Flair embeddings, see [the article](https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) or the paper below:

```bibtex
@inproceedings{akbik2018coling,
  title = {Contextual String Embeddings for Sequence Labeling},
  author = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages = {1638--1649},
  year = {2018}
}
```

For more information on UberText 2.0, see:
```bibtex
@inproceedings{chaplynskyi-2023-introducing,
  title = "Introducing {U}ber{T}ext 2.0: A Corpus of {M}odern {U}krainian at Scale",
  author = "Chaplynskyi, Dmytro",
  booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.unlp-1.1",
  pages = "1--10",
  abstract = "This paper addresses the need for massive corpora for a low-resource language and presents the publicly available UberText 2.0 corpus for the Ukrainian language and discusses the methodology of its construction. While the collection and maintenance of such a corpus is more of a data extraction and data engineering task, the corpus itself provides a solid foundation for natural language processing tasks. It can enable the creation of contemporary language models and word embeddings, resulting in a better performance of numerous downstream tasks for the Ukrainian language. In addition, the paper and software developed can be used as a guidance and model solution for other low-resource languages. The resulting corpus is available for download on the project page. It has 3.274 billion tokens, consists of 8.59 million texts and takes up 32 gigabytes of space.",
}
```

Copyright: [Dmytro Chaplynskyi](https://twitter.com/dchaplinsky), [lang-uk](https://lang.org.ua) project, 2023

best-lm.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42c6ff804b8c6e381764467a736df4d8f37f72b606ca6f8ed689f57cb1d4c3dc
size 78734687
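
The `version`/`oid`/`size` lines above are a Git LFS pointer: the Git history holds only this stub, while the checkpoint itself (78,734,687 bytes, about 78.7 MB) is stored via LFS. A stdlib sketch for checking a downloaded file against such a pointer; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS or Hugging Face API.

```python
from __future__ import annotations

import hashlib
import os


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: the byte size must match exactly.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so a large checkpoint need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For instance, `matches_pointer("best-lm.pt", pointer_text)`, with `pointer_text` holding the three pointer lines above, should return `True` for an intact download. The same check applies to `flair_dictionary.pkl`.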
flair_dictionary.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2125c32d2db5fb79676a8a6f087b19e9c3b788cb19b87073423e31e176d1fe24
size 11900
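
Each line of `loss.txt` is the trainer's validation report at the end of one of the 62 splits of an epoch. The reported perplexity is the exponential of the validation loss (for example, e^1.4195 ≈ 4.135, logged as 4.1349 in the first line), which a short parser can confirm. A stdlib sketch; the regular expression assumes exactly the line layout shown below, and the helper names are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections.abc import Iterable

# Matches lines such as:
# | end of split 1 / 62 | epoch 1 | time: 1583.48s | valid loss 1.4195 | valid ppl 4.1349 | learning rate 20.0000
LINE_RE = re.compile(
    r"end of split (?P<split>\d+) / (?P<n_splits>\d+) \| epoch (?P<epoch>\d+) \| "
    r"time: (?P<time>[\d.]+)s \| valid loss (?P<loss>[\d.]+) \| "
    r"valid ppl (?P<ppl>[\d.]+) \| learning rate (?P<lr>[\d.]+)"
)


def parse_loss_log(lines: Iterable[str]) -> list[dict]:
    """Turn validation-report lines into dicts; other lines are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a validation report
        records.append({
            "epoch": int(m["epoch"]),
            "split": int(m["split"]),
            "time_s": float(m["time"]),
            "loss": float(m["loss"]),
            "ppl": float(m["ppl"]),
            "lr": float(m["lr"]),
        })
    return records


def best_record(records: list[dict]) -> dict:
    """Return the report with the lowest validation loss."""
    return min(records, key=lambda r: r["loss"])


def ppl_consistent(record: dict, tol: float = 1e-3) -> bool:
    """Check ppl == exp(loss), allowing for the 4-decimal rounding in the log."""
    return math.isclose(math.exp(record["loss"]), record["ppl"], rel_tol=tol)
```

To scan the whole log, pass an open file: `best_record(parse_loss_log(open("loss.txt", encoding="utf-8")))`. Whether `best-lm.pt` was saved at that lowest-loss split is an assumption; the log itself does not record it.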
loss.txt
ADDED
@@ -0,0 +1,456 @@
| end of split 1 / 62 | epoch 1 | time: 1583.48s | valid loss 1.4195 | valid ppl 4.1349 | learning rate 20.0000
| end of split 2 / 62 | epoch 1 | time: 1586.99s | valid loss 1.2706 | valid ppl 3.5628 | learning rate 20.0000
| end of split 3 / 62 | epoch 1 | time: 1587.17s | valid loss 1.2056 | valid ppl 3.3386 | learning rate 20.0000
| end of split 4 / 62 | epoch 1 | time: 1588.13s | valid loss 1.1661 | valid ppl 3.2093 | learning rate 20.0000
| end of split 5 / 62 | epoch 1 | time: 1588.33s | valid loss 1.1408 | valid ppl 3.1294 | learning rate 20.0000
| end of split 6 / 62 | epoch 1 | time: 1587.62s | valid loss 1.1212 | valid ppl 3.0685 | learning rate 20.0000
| end of split 7 / 62 | epoch 1 | time: 1587.56s | valid loss 1.1058 | valid ppl 3.0217 | learning rate 20.0000
| end of split 8 / 62 | epoch 1 | time: 1588.13s | valid loss 1.0983 | valid ppl 2.9990 | learning rate 20.0000
| end of split 9 / 62 | epoch 1 | time: 1586.70s | valid loss 1.0876 | valid ppl 2.9671 | learning rate 20.0000
| end of split 10 / 62 | epoch 1 | time: 1585.61s | valid loss 1.0829 | valid ppl 2.9534 | learning rate 20.0000
| end of split 11 / 62 | epoch 1 | time: 1585.50s | valid loss 1.0744 | valid ppl 2.9282 | learning rate 20.0000
| end of split 12 / 62 | epoch 1 | time: 1583.26s | valid loss 1.0666 | valid ppl 2.9055 | learning rate 20.0000
| end of split 13 / 62 | epoch 1 | time: 1584.36s | valid loss 1.0616 | valid ppl 2.8911 | learning rate 20.0000
| end of split 14 / 62 | epoch 1 | time: 1585.50s | valid loss 1.0568 | valid ppl 2.8771 | learning rate 20.0000
| end of split 15 / 62 | epoch 1 | time: 1586.30s | valid loss 1.1435 | valid ppl 3.1378 | learning rate 20.0000
| end of split 16 / 62 | epoch 1 | time: 1590.72s | valid loss 1.0505 | valid ppl 2.8592 | learning rate 20.0000
| end of split 17 / 62 | epoch 1 | time: 1617.21s | valid loss 1.0468 | valid ppl 2.8484 | learning rate 20.0000
| end of split 18 / 62 | epoch 1 | time: 1606.50s | valid loss 1.0429 | valid ppl 2.8374 | learning rate 20.0000
| end of split 19 / 62 | epoch 1 | time: 1600.44s | valid loss 1.0395 | valid ppl 2.8278 | learning rate 20.0000
| end of split 20 / 62 | epoch 1 | time: 1593.91s | valid loss 1.0392 | valid ppl 2.8268 | learning rate 20.0000
| end of split 21 / 62 | epoch 1 | time: 1607.71s | valid loss 1.0325 | valid ppl 2.8081 | learning rate 20.0000
| end of split 22 / 62 | epoch 1 | time: 1603.04s | valid loss 1.0321 | valid ppl 2.8070 | learning rate 20.0000
| end of split 23 / 62 | epoch 1 | time: 1602.89s | valid loss 1.0292 | valid ppl 2.7988 | learning rate 20.0000
| end of split 24 / 62 | epoch 1 | time: 1606.15s | valid loss 1.0284 | valid ppl 2.7965 | learning rate 20.0000
| end of split 25 / 62 | epoch 1 | time: 1583.05s | valid loss 1.0251 | valid ppl 2.7874 | learning rate 20.0000
| end of split 26 / 62 | epoch 1 | time: 1580.57s | valid loss 1.0232 | valid ppl 2.7820 | learning rate 20.0000
| end of split 27 / 62 | epoch 1 | time: 1578.17s | valid loss 1.0218 | valid ppl 2.7783 | learning rate 20.0000
| end of split 28 / 62 | epoch 1 | time: 1577.71s | valid loss 1.0200 | valid ppl 2.7732 | learning rate 20.0000
| end of split 29 / 62 | epoch 1 | time: 1577.12s | valid loss 1.0258 | valid ppl 2.7895 | learning rate 20.0000
| end of split 30 / 62 | epoch 1 | time: 1577.09s | valid loss 1.0195 | valid ppl 2.7719 | learning rate 20.0000
| end of split 31 / 62 | epoch 1 | time: 1575.70s | valid loss 1.0191 | valid ppl 2.7706 | learning rate 20.0000
| end of split 32 / 62 | epoch 1 | time: 1576.02s | valid loss 1.0141 | valid ppl 2.7570 | learning rate 20.0000
| end of split 33 / 62 | epoch 1 | time: 1575.11s | valid loss 1.0111 | valid ppl 2.7486 | learning rate 20.0000
| end of split 34 / 62 | epoch 1 | time: 1574.68s | valid loss 1.0315 | valid ppl 2.8053 | learning rate 20.0000
| end of split 35 / 62 | epoch 1 | time: 1575.54s | valid loss 1.0103 | valid ppl 2.7463 | learning rate 20.0000
| end of split 36 / 62 | epoch 1 | time: 1578.17s | valid loss 1.0089 | valid ppl 2.7425 | learning rate 20.0000
| end of split 37 / 62 | epoch 1 | time: 1581.60s | valid loss 1.0098 | valid ppl 2.7450 | learning rate 20.0000
| end of split 38 / 62 | epoch 1 | time: 1590.23s | valid loss 1.0059 | valid ppl 2.7345 | learning rate 20.0000
| end of split 39 / 62 | epoch 1 | time: 1591.84s | valid loss 1.0313 | valid ppl 2.8048 | learning rate 20.0000
| end of split 40 / 62 | epoch 1 | time: 1592.79s | valid loss 1.0059 | valid ppl 2.7344 | learning rate 20.0000
| end of split 41 / 62 | epoch 1 | time: 1591.62s | valid loss 1.0026 | valid ppl 2.7253 | learning rate 20.0000
| end of split 42 / 62 | epoch 1 | time: 1611.75s | valid loss 1.0035 | valid ppl 2.7277 | learning rate 20.0000
| end of split 43 / 62 | epoch 1 | time: 1618.56s | valid loss 1.0010 | valid ppl 2.7210 | learning rate 20.0000
| end of split 44 / 62 | epoch 1 | time: 1623.11s | valid loss 1.0031 | valid ppl 2.7267 | learning rate 20.0000
| end of split 45 / 62 | epoch 1 | time: 1624.39s | valid loss 0.9990 | valid ppl 2.7156 | learning rate 20.0000
| end of split 46 / 62 | epoch 1 | time: 1627.72s | valid loss 0.9990 | valid ppl 2.7157 | learning rate 20.0000
| end of split 47 / 62 | epoch 1 | time: 1627.58s | valid loss 1.0122 | valid ppl 2.7516 | learning rate 20.0000
| end of split 48 / 62 | epoch 1 | time: 1626.44s | valid loss 0.9964 | valid ppl 2.7084 | learning rate 20.0000
| end of split 49 / 62 | epoch 1 | time: 1625.87s | valid loss 0.9977 | valid ppl 2.7120 | learning rate 20.0000
| end of split 50 / 62 | epoch 1 | time: 1626.88s | valid loss 0.9963 | valid ppl 2.7082 | learning rate 20.0000
| end of split 51 / 62 | epoch 1 | time: 1629.08s | valid loss 0.9958 | valid ppl 2.7069 | learning rate 20.0000
| end of split 52 / 62 | epoch 1 | time: 1629.12s | valid loss 1.0030 | valid ppl 2.7264 | learning rate 20.0000
| end of split 53 / 62 | epoch 1 | time: 1628.87s | valid loss 0.9934 | valid ppl 2.7005 | learning rate 20.0000
| end of split 54 / 62 | epoch 1 | time: 1629.78s | valid loss 0.9930 | valid ppl 2.6994 | learning rate 20.0000
| end of split 55 / 62 | epoch 1 | time: 1628.40s | valid loss 0.9921 | valid ppl 2.6968 | learning rate 20.0000
| end of split 56 / 62 | epoch 1 | time: 1626.37s | valid loss 0.9927 | valid ppl 2.6984 | learning rate 20.0000
| end of split 57 / 62 | epoch 1 | time: 1627.36s | valid loss 0.9918 | valid ppl 2.6961 | learning rate 20.0000
| end of split 58 / 62 | epoch 1 | time: 1625.21s | valid loss 0.9900 | valid ppl 2.6912 | learning rate 20.0000
| end of split 59 / 62 | epoch 1 | time: 1626.91s | valid loss 0.9888 | valid ppl 2.6880 | learning rate 20.0000
| end of split 60 / 62 | epoch 1 | time: 1627.73s | valid loss 0.9964 | valid ppl 2.7086 | learning rate 20.0000
| end of split 61 / 62 | epoch 1 | time: 1626.02s | valid loss 0.9890 | valid ppl 2.6886 | learning rate 20.0000
| end of split 62 / 62 | epoch 1 | time: 869.09s | valid loss 0.9974 | valid ppl 2.7112 | learning rate 20.0000
| end of split 1 / 62 | epoch 2 | time: 1622.25s | valid loss 0.9901 | valid ppl 2.6916 | learning rate 20.0000
| end of split 2 / 62 | epoch 2 | time: 1625.45s | valid loss 0.9873 | valid ppl 2.6839 | learning rate 20.0000
| end of split 3 / 62 | epoch 2 | time: 1623.22s | valid loss 0.9864 | valid ppl 2.6816 | learning rate 20.0000
| end of split 4 / 62 | epoch 2 | time: 1623.07s | valid loss 0.9877 | valid ppl 2.6851 | learning rate 20.0000
| end of split 5 / 62 | epoch 2 | time: 1620.60s | valid loss 1.0115 | valid ppl 2.7496 | learning rate 20.0000
| end of split 6 / 62 | epoch 2 | time: 1622.51s | valid loss 0.9890 | valid ppl 2.6887 | learning rate 20.0000
| end of split 7 / 62 | epoch 2 | time: 1620.37s | valid loss 0.9862 | valid ppl 2.6811 | learning rate 20.0000
| end of split 8 / 62 | epoch 2 | time: 1620.70s | valid loss 0.9869 | valid ppl 2.6828 | learning rate 20.0000
| end of split 9 / 62 | epoch 2 | time: 1619.16s | valid loss 0.9861 | valid ppl 2.6808 | learning rate 20.0000
| end of split 10 / 62 | epoch 2 | time: 1617.83s | valid loss 0.9867 | valid ppl 2.6822 | learning rate 20.0000
| end of split 11 / 62 | epoch 2 | time: 1618.28s | valid loss 1.0056 | valid ppl 2.7335 | learning rate 20.0000
| end of split 12 / 62 | epoch 2 | time: 1615.81s | valid loss 0.9829 | valid ppl 2.6723 | learning rate 20.0000
| end of split 13 / 62 | epoch 2 | time: 1615.59s | valid loss 0.9849 | valid ppl 2.6776 | learning rate 20.0000
| end of split 14 / 62 | epoch 2 | time: 1616.05s | valid loss 0.9907 | valid ppl 2.6930 | learning rate 20.0000
| end of split 15 / 62 | epoch 2 | time: 863.11s | valid loss 0.9904 | valid ppl 2.6922 | learning rate 20.0000
| end of split 16 / 62 | epoch 2 | time: 1614.44s | valid loss 0.9823 | valid ppl 2.6705 | learning rate 20.0000
| end of split 17 / 62 | epoch 2 | time: 1612.68s | valid loss 0.9824 | valid ppl 2.6708 | learning rate 20.0000
| end of split 18 / 62 | epoch 2 | time: 1608.56s | valid loss 0.9810 | valid ppl 2.6670 | learning rate 20.0000
| end of split 19 / 62 | epoch 2 | time: 1585.34s | valid loss 0.9799 | valid ppl 2.6641 | learning rate 20.0000
| end of split 20 / 62 | epoch 2 | time: 1582.65s | valid loss 0.9801 | valid ppl 2.6647 | learning rate 20.0000
| end of split 21 / 62 | epoch 2 | time: 1581.78s | valid loss 0.9804 | valid ppl 2.6656 | learning rate 20.0000
| end of split 22 / 62 | epoch 2 | time: 1583.27s | valid loss 0.9791 | valid ppl 2.6620 | learning rate 20.0000
| end of split 23 / 62 | epoch 2 | time: 1580.74s | valid loss 0.9780 | valid ppl 2.6590 | learning rate 20.0000
| end of split 24 / 62 | epoch 2 | time: 1581.13s | valid loss 0.9782 | valid ppl 2.6597 | learning rate 20.0000
| end of split 25 / 62 | epoch 2 | time: 1580.34s | valid loss 0.9795 | valid ppl 2.6631 | learning rate 20.0000
| end of split 26 / 62 | epoch 2 | time: 1580.35s | valid loss 0.9782 | valid ppl 2.6597 | learning rate 20.0000
| end of split 27 / 62 | epoch 2 | time: 1579.55s | valid loss 0.9780 | valid ppl 2.6592 | learning rate 20.0000
| end of split 28 / 62 | epoch 2 | time: 1583.05s | valid loss 0.9850 | valid ppl 2.6778 | learning rate 20.0000
| end of split 29 / 62 | epoch 2 | time: 1580.68s | valid loss 0.9822 | valid ppl 2.6702 | learning rate 20.0000
| end of split 30 / 62 | epoch 2 | time: 1577.58s | valid loss 0.9923 | valid ppl 2.6973 | learning rate 20.0000
| end of split 31 / 62 | epoch 2 | time: 1581.85s | valid loss 0.9764 | valid ppl 2.6550 | learning rate 20.0000
| end of split 32 / 62 | epoch 2 | time: 1585.87s | valid loss 0.9760 | valid ppl 2.6537 | learning rate 20.0000
| end of split 33 / 62 | epoch 2 | time: 1588.93s | valid loss 0.9758 | valid ppl 2.6533 | learning rate 20.0000
| end of split 34 / 62 | epoch 2 | time: 1590.44s | valid loss 0.9759 | valid ppl 2.6536 | learning rate 20.0000
| end of split 35 / 62 | epoch 2 | time: 1592.53s | valid loss 0.9758 | valid ppl 2.6532 | learning rate 20.0000
| end of split 36 / 62 | epoch 2 | time: 1594.13s | valid loss 0.9758 | valid ppl 2.6532 | learning rate 20.0000
| end of split 37 / 62 | epoch 2 | time: 1592.90s | valid loss 0.9737 | valid ppl 2.6476 | learning rate 20.0000
| end of split 38 / 62 | epoch 2 | time: 1594.82s | valid loss 0.9736 | valid ppl 2.6474 | learning rate 20.0000
| end of split 39 / 62 | epoch 2 | time: 1596.77s | valid loss 0.9754 | valid ppl 2.6521 | learning rate 20.0000
| end of split 40 / 62 | epoch 2 | time: 1599.71s | valid loss 0.9753 | valid ppl 2.6520 | learning rate 20.0000
| end of split 41 / 62 | epoch 2 | time: 1603.63s | valid loss 0.9745 | valid ppl 2.6498 | learning rate 20.0000
| end of split 42 / 62 | epoch 2 | time: 1608.89s | valid loss 0.9734 | valid ppl 2.6470 | learning rate 20.0000
| end of split 43 / 62 | epoch 2 | time: 1609.09s | valid loss 0.9725 | valid ppl 2.6445 | learning rate 20.0000
| end of split 44 / 62 | epoch 2 | time: 1602.88s | valid loss 0.9728 | valid ppl 2.6454 | learning rate 20.0000
| end of split 45 / 62 | epoch 2 | time: 1598.34s | valid loss 0.9721 | valid ppl 2.6434 | learning rate 20.0000
| end of split 46 / 62 | epoch 2 | time: 1600.19s | valid loss 0.9719 | valid ppl 2.6430 | learning rate 20.0000
| end of split 47 / 62 | epoch 2 | time: 1601.67s | valid loss 0.9719 | valid ppl 2.6431 | learning rate 20.0000
| end of split 48 / 62 | epoch 2 | time: 1605.64s | valid loss 0.9719 | valid ppl 2.6428 | learning rate 20.0000
| end of split 49 / 62 | epoch 2 | time: 1604.96s | valid loss 0.9710 | valid ppl 2.6406 | learning rate 20.0000
| end of split 50 / 62 | epoch 2 | time: 1603.96s | valid loss 0.9715 | valid ppl 2.6420 | learning rate 20.0000
| end of split 51 / 62 | epoch 2 | time: 1609.00s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
| end of split 52 / 62 | epoch 2 | time: 1609.47s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
| end of split 53 / 62 | epoch 2 | time: 1607.14s | valid loss 0.9725 | valid ppl 2.6447 | learning rate 20.0000
| end of split 54 / 62 | epoch 2 | time: 1606.27s | valid loss 0.9706 | valid ppl 2.6396 | learning rate 20.0000
| end of split 55 / 62 | epoch 2 | time: 1607.85s | valid loss 0.9706 | valid ppl 2.6395 | learning rate 20.0000
| end of split 56 / 62 | epoch 2 | time: 1607.99s | valid loss 0.9727 | valid ppl 2.6451 | learning rate 20.0000
| end of split 57 / 62 | epoch 2 | time: 1609.15s | valid loss 0.9696 | valid ppl 2.6368 | learning rate 20.0000
| end of split 58 / 62 | epoch 2 | time: 1606.21s | valid loss 0.9691 | valid ppl 2.6355 | learning rate 20.0000
| end of split 59 / 62 | epoch 2 | time: 1606.97s | valid loss 0.9684 | valid ppl 2.6337 | learning rate 20.0000
| end of split 60 / 62 | epoch 2 | time: 1605.30s | valid loss 0.9686 | valid ppl 2.6341 | learning rate 20.0000
| end of split 61 / 62 | epoch 2 | time: 1606.09s | valid loss 0.9678 | valid ppl 2.6322 | learning rate 20.0000
| end of split 62 / 62 | epoch 2 | time: 1604.24s | valid loss 0.9692 | valid ppl 2.6359 | learning rate 20.0000
| end of split 1 / 62 | epoch 3 | time: 1595.63s | valid loss 0.9704 | valid ppl 2.6389 | learning rate 20.0000
| end of split 2 / 62 | epoch 3 | time: 1599.02s | valid loss 0.9697 | valid ppl 2.6373 | learning rate 20.0000
| end of split 3 / 62 | epoch 3 | time: 1599.83s | valid loss 0.9676 | valid ppl 2.6315 | learning rate 20.0000
| end of split 4 / 62 | epoch 3 | time: 1601.68s | valid loss 0.9684 | valid ppl 2.6337 | learning rate 20.0000
| end of split 5 / 62 | epoch 3 | time: 1600.81s | valid loss 0.9697 | valid ppl 2.6372 | learning rate 20.0000
| end of split 6 / 62 | epoch 3 | time: 1601.85s | valid loss 0.9692 | valid ppl 2.6359 | learning rate 20.0000
| end of split 7 / 62 | epoch 3 | time: 1599.16s | valid loss 0.9675 | valid ppl 2.6314 | learning rate 20.0000
| end of split 8 / 62 | epoch 3 | time: 1599.83s | valid loss 0.9686 | valid ppl 2.6342 | learning rate 20.0000
| end of split 9 / 62 | epoch 3 | time: 1587.43s | valid loss 0.9669 | valid ppl 2.6298 | learning rate 20.0000
| end of split 10 / 62 | epoch 3 | time: 1588.81s | valid loss 0.9677 | valid ppl 2.6318 | learning rate 20.0000
| end of split 11 / 62 | epoch 3 | time: 1590.43s | valid loss 0.9673 | valid ppl 2.6307 | learning rate 20.0000
| end of split 12 / 62 | epoch 3 | time: 1592.90s | valid loss 0.9668 | valid ppl 2.6296 | learning rate 20.0000
| end of split 13 / 62 | epoch 3 | time: 1594.36s | valid loss 0.9676 | valid ppl 2.6317 | learning rate 20.0000
| end of split 14 / 62 | epoch 3 | time: 1595.81s | valid loss 0.9652 | valid ppl 2.6254 | learning rate 20.0000
| end of split 15 / 62 | epoch 3 | time: 1596.70s | valid loss 0.9659 | valid ppl 2.6271 | learning rate 20.0000
| end of split 16 / 62 | epoch 3 | time: 1591.94s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
| end of split 17 / 62 | epoch 3 | time: 1584.49s | valid loss 0.9656 | valid ppl 2.6262 | learning rate 20.0000
| end of split 18 / 62 | epoch 3 | time: 1585.57s | valid loss 0.9649 | valid ppl 2.6245 | learning rate 20.0000
| end of split 19 / 62 | epoch 3 | time: 1579.95s | valid loss 0.9650 | valid ppl 2.6248 | learning rate 20.0000
| end of split 20 / 62 | epoch 3 | time: 843.60s | valid loss 0.9738 | valid ppl 2.6480 | learning rate 20.0000
| end of split 21 / 62 | epoch 3 | time: 1580.19s | valid loss 0.9780 | valid ppl 2.6592 | learning rate 20.0000
| end of split 22 / 62 | epoch 3 | time: 1582.17s | valid loss 1.0091 | valid ppl 2.7433 | learning rate 20.0000
| end of split 23 / 62 | epoch 3 | time: 1582.31s | valid loss 0.9639 | valid ppl 2.6220 | learning rate 20.0000
| end of split 24 / 62 | epoch 3 | time: 1582.57s | valid loss 0.9828 | valid ppl 2.6720 | learning rate 20.0000
| end of split 25 / 62 | epoch 3 | time: 1582.46s | valid loss 0.9636 | valid ppl 2.6210 | learning rate 20.0000
| end of split 26 / 62 | epoch 3 | time: 1585.02s | valid loss 0.9653 | valid ppl 2.6255 | learning rate 20.0000
| end of split 27 / 62 | epoch 3 | time: 1584.48s | valid loss 0.9638 | valid ppl 2.6216 | learning rate 20.0000
| end of split 28 / 62 | epoch 3 | time: 1585.97s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000
| end of split 29 / 62 | epoch 3 | time: 1588.62s | valid loss 0.9630 | valid ppl 2.6195 | learning rate 20.0000
| end of split 30 / 62 | epoch 3 | time: 1605.99s | valid loss 0.9626 | valid ppl 2.6184 | learning rate 20.0000
| end of split 31 / 62 | epoch 3 | time: 1627.59s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
| end of split 32 / 62 | epoch 3 | time: 1600.91s | valid loss 0.9649 | valid ppl 2.6246 | learning rate 20.0000
| end of split 33 / 62 | epoch 3 | time: 1607.37s | valid loss 0.9635 | valid ppl 2.6208 | learning rate 20.0000
| end of split 34 / 62 | epoch 3 | time: 1605.43s | valid loss 0.9619 | valid ppl 2.6166 | learning rate 20.0000
| end of split 35 / 62 | epoch 3 | time: 1606.13s | valid loss 0.9621 | valid ppl 2.6173 | learning rate 20.0000
| end of split 36 / 62 | epoch 3 | time: 1604.60s | valid loss 0.9622 | valid ppl 2.6175 | learning rate 20.0000
| end of split 37 / 62 | epoch 3 | time: 1606.96s | valid loss 0.9620 | valid ppl 2.6170 | learning rate 20.0000
| end of split 38 / 62 | epoch 3 | time: 1604.31s | valid loss 0.9615 | valid ppl 2.6157 | learning rate 20.0000
| end of split 39 / 62 | epoch 3 | time: 1603.46s | valid loss 0.9618 | valid ppl 2.6165 | learning rate 20.0000
| end of split 40 / 62 | epoch 3 | time: 1602.53s | valid loss 0.9613 | valid ppl 2.6151 | learning rate 20.0000
| end of split 41 / 62 | epoch 3 | time: 1602.02s | valid loss 0.9613 | valid ppl 2.6151 | learning rate 20.0000
| end of split 42 / 62 | epoch 3 | time: 1601.02s | valid loss 0.9618 | valid ppl 2.6165 | learning rate 20.0000
| end of split 43 / 62 | epoch 3 | time: 1602.13s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
| end of split 44 / 62 | epoch 3 | time: 1605.32s | valid loss 0.9612 | valid ppl 2.6149 | learning rate 20.0000
| end of split 45 / 62 | epoch 3 | time: 1607.04s | valid loss 0.9808 | valid ppl 2.6667 | learning rate 20.0000
| end of split 46 / 62 | epoch 3 | time: 1600.96s | valid loss 0.9597 | valid ppl 2.6108 | learning rate 20.0000
| end of split 47 / 62 | epoch 3 | time: 1602.97s | valid loss 0.9597 | valid ppl 2.6109 | learning rate 20.0000
| end of split 48 / 62 | epoch 3 | time: 1600.73s | valid loss 0.9657 | valid ppl 2.6267 | learning rate 20.0000
| end of split 49 / 62 | epoch 3 | time: 1601.65s | valid loss 0.9614 | valid ppl 2.6154 | learning rate 20.0000
| end of split 50 / 62 | epoch 3 | time: 1601.78s | valid loss 0.9603 | valid ppl 2.6124 | learning rate 20.0000
| end of split 51 / 62 | epoch 3 | time: 1601.02s | valid loss 0.9593 | valid ppl 2.6098 | learning rate 20.0000
| end of split 52 / 62 | epoch 3 | time: 1600.92s | valid loss 0.9607 | valid ppl 2.6136 | learning rate 20.0000
| end of split 53 / 62 | epoch 3 | time: 1601.95s | valid loss 0.9604 | valid ppl 2.6127 | learning rate 20.0000
| end of split 54 / 62 | epoch 3 | time: 1600.51s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
| end of split 55 / 62 | epoch 3 | time: 1599.14s | valid loss 0.9588 | valid ppl 2.6086 | learning rate 20.0000
| end of split 56 / 62 | epoch 3 | time: 1599.72s | valid loss 0.9602 | valid ppl 2.6123 | learning rate 20.0000
| end of split 57 / 62 | epoch 3 | time: 1597.65s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
| end of split 58 / 62 | epoch 3 | time: 1598.97s | valid loss 0.9593 | valid ppl 2.6098 | learning rate 20.0000
| end of split 59 / 62 | epoch 3 | time: 1601.81s | valid loss 0.9589 | valid ppl 2.6089 | learning rate 20.0000
| end of split 60 / 62 | epoch 3 | time: 1600.21s | valid loss 0.9599 | valid ppl 2.6115 | learning rate 20.0000
| end of split 61 / 62 | epoch 3 | time: 1598.25s | valid loss 0.9595 | valid ppl 2.6103 | learning rate 20.0000
| end of split 62 / 62 | epoch 3 | time: 1600.01s | valid loss 0.9584 | valid ppl 2.6075 | learning rate 20.0000
| end of split 1 / 62 | epoch 4 | time: 1595.62s | valid loss 0.9586 | valid ppl 2.6081 | learning rate 20.0000
| end of split 2 / 62 | epoch 4 | time: 1593.94s | valid loss 0.9598 | valid ppl 2.6110 | learning rate 20.0000
| end of split 3 / 62 | epoch 4 | time: 1595.86s | valid loss 0.9592 | valid ppl 2.6096 | learning rate 20.0000
| end of split 4 / 62 | epoch 4 | time: 852.38s | valid loss 0.9646 | valid ppl 2.6237 | learning rate 20.0000
| end of split 5 / 62 | epoch 4 | time: 1596.43s | valid loss 0.9590 | valid ppl 2.6091 | learning rate 20.0000
| end of split 6 / 62 | epoch 4 | time: 1594.53s | valid loss 0.9584 | valid ppl 2.6075 | learning rate 20.0000
| end of split 7 / 62 | epoch 4 | time: 1594.97s | valid loss 0.9573 | valid ppl 2.6046 | learning rate 20.0000
| end of split 8 / 62 | epoch 4 | time: 1594.23s | valid loss 0.9579 | valid ppl 2.6062 | learning rate 20.0000
| end of split 9 / 62 | epoch 4 | time: 1594.24s | valid loss 0.9580 | valid ppl 2.6065 | learning rate 20.0000
| end of split 10 / 62 | epoch 4 | time: 1591.80s | valid loss 0.9578 | valid ppl 2.6059 | learning rate 20.0000
| end of split 11 / 62 | epoch 4 | time: 1580.82s | valid loss 0.9572 | valid ppl 2.6043 | learning rate 20.0000
| end of split 12 / 62 | epoch 4 | time: 1578.60s | valid loss 0.9580 | valid ppl 2.6064 | learning rate 20.0000
| end of split 13 / 62 | epoch 4 | time: 1580.22s | valid loss 0.9585 | valid ppl 2.6079 | learning rate 20.0000
| end of split 14 / 62 | epoch 4 | time: 1578.77s | valid loss 0.9627 | valid ppl 2.6189 | learning rate 20.0000
| end of split 15 / 62 | epoch 4 | time: 1579.05s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
| end of split 16 / 62 | epoch 4 | time: 1577.56s | valid loss 0.9568 | valid ppl 2.6034 | learning rate 20.0000
| end of split 17 / 62 | epoch 4 | time: 1578.26s | valid loss 0.9572 | valid ppl 2.6044 | learning rate 20.0000
| end of split 18 / 62 | epoch 4 | time: 1579.21s | valid loss 0.9566 | valid ppl 2.6027 | learning rate 20.0000
| end of split 19 / 62 | epoch 4 | time: 1578.77s | valid loss 0.9567 | valid ppl 2.6030 | learning rate 20.0000
| end of split 20 / 62 | epoch 4 | time: 1576.14s | valid loss 0.9584 | valid ppl 2.6076 | learning rate 20.0000
| end of split 21 / 62 | epoch 4 | time: 1576.68s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
| end of split 22 / 62 | epoch 4 | time: 1576.80s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
| end of split 23 / 62 | epoch 4 | time: 1576.23s | valid loss 0.9744 | valid ppl 2.6496 | learning rate 20.0000
| end of split 24 / 62 | epoch 4 | time: 1575.49s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
| end of split 25 / 62 | epoch 4 | time: 1577.44s | valid loss 0.9555 | valid ppl 2.6000 | learning rate 20.0000
|
212 |
+
| end of split 26 / 62 | epoch 4 | time: 1577.10s | valid loss 0.9564 | valid ppl 2.6024 | learning rate 20.0000
|
213 |
+
| end of split 27 / 62 | epoch 4 | time: 1576.83s | valid loss 0.9560 | valid ppl 2.6012 | learning rate 20.0000
|
214 |
+
| end of split 28 / 62 | epoch 4 | time: 1588.94s | valid loss 0.9567 | valid ppl 2.6031 | learning rate 20.0000
|
215 |
+
| end of split 29 / 62 | epoch 4 | time: 1591.83s | valid loss 0.9554 | valid ppl 2.5996 | learning rate 20.0000
|
216 |
+
| end of split 30 / 62 | epoch 4 | time: 1603.93s | valid loss 0.9554 | valid ppl 2.5997 | learning rate 20.0000
|
217 |
+
| end of split 31 / 62 | epoch 4 | time: 1595.76s | valid loss 0.9549 | valid ppl 2.5985 | learning rate 20.0000
|
218 |
+
| end of split 32 / 62 | epoch 4 | time: 1711.81s | valid loss 0.9561 | valid ppl 2.6015 | learning rate 20.0000
|
219 |
+
| end of split 33 / 62 | epoch 4 | time: 1577.07s | valid loss 0.9577 | valid ppl 2.6058 | learning rate 20.0000
|
220 |
+
| end of split 34 / 62 | epoch 4 | time: 1576.41s | valid loss 0.9546 | valid ppl 2.5978 | learning rate 20.0000
|
221 |
+
| end of split 35 / 62 | epoch 4 | time: 1577.72s | valid loss 0.9552 | valid ppl 2.5991 | learning rate 20.0000
|
222 |
+
| end of split 36 / 62 | epoch 4 | time: 1577.03s | valid loss 0.9553 | valid ppl 2.5995 | learning rate 20.0000
|
223 |
+
| end of split 37 / 62 | epoch 4 | time: 1578.71s | valid loss 0.9544 | valid ppl 2.5972 | learning rate 20.0000
|
224 |
+
| end of split 38 / 62 | epoch 4 | time: 1577.03s | valid loss 0.9559 | valid ppl 2.6011 | learning rate 20.0000
|
225 |
+
| end of split 39 / 62 | epoch 4 | time: 1630.11s | valid loss 0.9540 | valid ppl 2.5962 | learning rate 20.0000
|
226 |
+
| end of split 40 / 62 | epoch 4 | time: 1579.09s | valid loss 0.9558 | valid ppl 2.6007 | learning rate 20.0000
|
227 |
+
| end of split 41 / 62 | epoch 4 | time: 1578.58s | valid loss 0.9538 | valid ppl 2.5956 | learning rate 20.0000
|
228 |
+
| end of split 42 / 62 | epoch 4 | time: 1579.44s | valid loss 0.9541 | valid ppl 2.5964 | learning rate 20.0000
|
229 |
+
| end of split 43 / 62 | epoch 4 | time: 1577.04s | valid loss 0.9544 | valid ppl 2.5971 | learning rate 20.0000
|
230 |
+
| end of split 44 / 62 | epoch 4 | time: 1576.88s | valid loss 0.9544 | valid ppl 2.5972 | learning rate 20.0000
|
231 |
+
| end of split 45 / 62 | epoch 4 | time: 1578.62s | valid loss 0.9600 | valid ppl 2.6116 | learning rate 20.0000
|
232 |
+
| end of split 46 / 62 | epoch 4 | time: 1577.25s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
|
233 |
+
| end of split 47 / 62 | epoch 4 | time: 1577.78s | valid loss 0.9554 | valid ppl 2.5996 | learning rate 20.0000
|
234 |
+
| end of split 48 / 62 | epoch 4 | time: 1577.99s | valid loss 0.9545 | valid ppl 2.5974 | learning rate 20.0000
|
235 |
+
| end of split 49 / 62 | epoch 4 | time: 1575.73s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 20.0000
|
236 |
+
| end of split 50 / 62 | epoch 4 | time: 1574.23s | valid loss 0.9535 | valid ppl 2.5947 | learning rate 20.0000
|
237 |
+
| end of split 51 / 62 | epoch 4 | time: 1575.99s | valid loss 0.9623 | valid ppl 2.6176 | learning rate 20.0000
|
238 |
+
| end of split 52 / 62 | epoch 4 | time: 1575.37s | valid loss 0.9954 | valid ppl 2.7058 | learning rate 20.0000
|
239 |
+
| end of split 53 / 62 | epoch 4 | time: 1574.08s | valid loss 0.9561 | valid ppl 2.6014 | learning rate 20.0000
|
240 |
+
| end of split 54 / 62 | epoch 4 | time: 1575.32s | valid loss 0.9543 | valid ppl 2.5968 | learning rate 20.0000
|
241 |
+
| end of split 55 / 62 | epoch 4 | time: 1575.06s | valid loss 0.9541 | valid ppl 2.5962 | learning rate 20.0000
|
242 |
+
| end of split 56 / 62 | epoch 4 | time: 1575.80s | valid loss 0.9713 | valid ppl 2.6413 | learning rate 20.0000
|
243 |
+
| end of split 57 / 62 | epoch 4 | time: 1577.19s | valid loss 0.9556 | valid ppl 2.6003 | learning rate 20.0000
|
244 |
+
| end of split 58 / 62 | epoch 4 | time: 1576.21s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
|
245 |
+
| end of split 59 / 62 | epoch 4 | time: 1577.08s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
|
246 |
+
| end of split 60 / 62 | epoch 4 | time: 1574.14s | valid loss 0.9572 | valid ppl 2.6045 | learning rate 20.0000
|
247 |
+
| end of split 61 / 62 | epoch 4 | time: 1571.90s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 20.0000
|
248 |
+
| end of split 62 / 62 | epoch 4 | time: 1572.26s | valid loss 0.9482 | valid ppl 2.5811 | learning rate 5.0000
|
249 |
+
| end of split 1 / 62 | epoch 5 | time: 1570.96s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
| end of split 2 / 62 | epoch 5 | time: 1573.43s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
| end of split 3 / 62 | epoch 5 | time: 1573.08s | valid loss 0.9472 | valid ppl 2.5786 | learning rate 5.0000
| end of split 4 / 62 | epoch 5 | time: 1572.80s | valid loss 0.9474 | valid ppl 2.5789 | learning rate 5.0000
| end of split 5 / 62 | epoch 5 | time: 1572.50s | valid loss 0.9477 | valid ppl 2.5798 | learning rate 5.0000
| end of split 6 / 62 | epoch 5 | time: 1574.27s | valid loss 0.9469 | valid ppl 2.5777 | learning rate 5.0000
| end of split 7 / 62 | epoch 5 | time: 1575.64s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
| end of split 8 / 62 | epoch 5 | time: 1577.81s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
| end of split 9 / 62 | epoch 5 | time: 1578.61s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 10 / 62 | epoch 5 | time: 1580.32s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 11 / 62 | epoch 5 | time: 1581.85s | valid loss 0.9467 | valid ppl 2.5771 | learning rate 5.0000
| end of split 12 / 62 | epoch 5 | time: 1582.22s | valid loss 0.9466 | valid ppl 2.5769 | learning rate 5.0000
| end of split 13 / 62 | epoch 5 | time: 1581.45s | valid loss 0.9466 | valid ppl 2.5769 | learning rate 5.0000
| end of split 14 / 62 | epoch 5 | time: 1579.73s | valid loss 0.9466 | valid ppl 2.5770 | learning rate 5.0000
| end of split 15 / 62 | epoch 5 | time: 1581.60s | valid loss 0.9466 | valid ppl 2.5768 | learning rate 5.0000
| end of split 16 / 62 | epoch 5 | time: 1577.02s | valid loss 0.9463 | valid ppl 2.5761 | learning rate 5.0000
| end of split 17 / 62 | epoch 5 | time: 1576.46s | valid loss 0.9465 | valid ppl 2.5768 | learning rate 5.0000
| end of split 18 / 62 | epoch 5 | time: 1577.82s | valid loss 0.9472 | valid ppl 2.5785 | learning rate 5.0000
| end of split 19 / 62 | epoch 5 | time: 1579.10s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
| end of split 20 / 62 | epoch 5 | time: 1579.00s | valid loss 0.9462 | valid ppl 2.5760 | learning rate 5.0000
| end of split 21 / 62 | epoch 5 | time: 1579.61s | valid loss 0.9461 | valid ppl 2.5757 | learning rate 5.0000
| end of split 22 / 62 | epoch 5 | time: 1580.98s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
| end of split 23 / 62 | epoch 5 | time: 1581.08s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 24 / 62 | epoch 5 | time: 1581.18s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
| end of split 25 / 62 | epoch 5 | time: 1579.63s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 26 / 62 | epoch 5 | time: 1584.07s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 27 / 62 | epoch 5 | time: 1595.88s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
| end of split 28 / 62 | epoch 5 | time: 1594.85s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 29 / 62 | epoch 5 | time: 1592.49s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 30 / 62 | epoch 5 | time: 1592.88s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
| end of split 31 / 62 | epoch 5 | time: 1595.11s | valid loss 0.9458 | valid ppl 2.5747 | learning rate 5.0000
| end of split 32 / 62 | epoch 5 | time: 1596.27s | valid loss 0.9458 | valid ppl 2.5748 | learning rate 5.0000
| end of split 33 / 62 | epoch 5 | time: 1593.21s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 34 / 62 | epoch 5 | time: 1594.40s | valid loss 0.9457 | valid ppl 2.5746 | learning rate 5.0000
| end of split 35 / 62 | epoch 5 | time: 1590.87s | valid loss 0.9455 | valid ppl 2.5741 | learning rate 5.0000
| end of split 36 / 62 | epoch 5 | time: 1593.79s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 37 / 62 | epoch 5 | time: 1591.50s | valid loss 0.9456 | valid ppl 2.5745 | learning rate 5.0000
| end of split 38 / 62 | epoch 5 | time: 1589.49s | valid loss 0.9457 | valid ppl 2.5745 | learning rate 5.0000
| end of split 39 / 62 | epoch 5 | time: 1590.75s | valid loss 0.9480 | valid ppl 2.5806 | learning rate 5.0000
| end of split 40 / 62 | epoch 5 | time: 1590.43s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
| end of split 41 / 62 | epoch 5 | time: 1590.08s | valid loss 0.9455 | valid ppl 2.5741 | learning rate 5.0000
| end of split 42 / 62 | epoch 5 | time: 1589.48s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 43 / 62 | epoch 5 | time: 1587.62s | valid loss 0.9457 | valid ppl 2.5745 | learning rate 5.0000
| end of split 44 / 62 | epoch 5 | time: 1586.79s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 45 / 62 | epoch 5 | time: 1585.86s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 46 / 62 | epoch 5 | time: 1586.95s | valid loss 0.9454 | valid ppl 2.5738 | learning rate 5.0000
| end of split 47 / 62 | epoch 5 | time: 1587.96s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 48 / 62 | epoch 5 | time: 1587.28s | valid loss 0.9455 | valid ppl 2.5740 | learning rate 5.0000
| end of split 49 / 62 | epoch 5 | time: 1587.77s | valid loss 0.9451 | valid ppl 2.5732 | learning rate 5.0000
| end of split 50 / 62 | epoch 5 | time: 1586.98s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 51 / 62 | epoch 5 | time: 1585.51s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
| end of split 52 / 62 | epoch 5 | time: 1586.57s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 53 / 62 | epoch 5 | time: 1586.75s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 54 / 62 | epoch 5 | time: 846.84s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
| end of split 55 / 62 | epoch 5 | time: 1583.94s | valid loss 0.9451 | valid ppl 2.5730 | learning rate 5.0000
| end of split 56 / 62 | epoch 5 | time: 1585.75s | valid loss 0.9451 | valid ppl 2.5732 | learning rate 5.0000
| end of split 57 / 62 | epoch 5 | time: 1585.81s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 58 / 62 | epoch 5 | time: 1586.18s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 59 / 62 | epoch 5 | time: 1586.85s | valid loss 0.9449 | valid ppl 2.5725 | learning rate 5.0000
| end of split 60 / 62 | epoch 5 | time: 1591.84s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 61 / 62 | epoch 5 | time: 1592.74s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 62 / 62 | epoch 5 | time: 1595.38s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 1 / 62 | epoch 6 | time: 1594.09s | valid loss 0.9448 | valid ppl 2.5724 | learning rate 5.0000
| end of split 2 / 62 | epoch 6 | time: 1598.24s | valid loss 0.9448 | valid ppl 2.5723 | learning rate 5.0000
| end of split 3 / 62 | epoch 6 | time: 1598.85s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
| end of split 4 / 62 | epoch 6 | time: 1593.37s | valid loss 0.9448 | valid ppl 2.5723 | learning rate 5.0000
| end of split 5 / 62 | epoch 6 | time: 1586.31s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
| end of split 6 / 62 | epoch 6 | time: 1586.36s | valid loss 0.9446 | valid ppl 2.5718 | learning rate 5.0000
| end of split 7 / 62 | epoch 6 | time: 1584.08s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 8 / 62 | epoch 6 | time: 1584.49s | valid loss 0.9445 | valid ppl 2.5716 | learning rate 5.0000
| end of split 9 / 62 | epoch 6 | time: 1583.63s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 10 / 62 | epoch 6 | time: 1582.25s | valid loss 0.9446 | valid ppl 2.5718 | learning rate 5.0000
| end of split 11 / 62 | epoch 6 | time: 1583.67s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
| end of split 12 / 62 | epoch 6 | time: 1592.91s | valid loss 0.9445 | valid ppl 2.5715 | learning rate 5.0000
| end of split 13 / 62 | epoch 6 | time: 1591.67s | valid loss 0.9445 | valid ppl 2.5716 | learning rate 5.0000
| end of split 14 / 62 | epoch 6 | time: 1593.32s | valid loss 0.9444 | valid ppl 2.5712 | learning rate 5.0000
| end of split 15 / 62 | epoch 6 | time: 1595.18s | valid loss 0.9444 | valid ppl 2.5714 | learning rate 5.0000
| end of split 16 / 62 | epoch 6 | time: 1595.10s | valid loss 0.9447 | valid ppl 2.5719 | learning rate 5.0000
| end of split 17 / 62 | epoch 6 | time: 1595.70s | valid loss 0.9444 | valid ppl 2.5711 | learning rate 5.0000
| end of split 18 / 62 | epoch 6 | time: 1593.68s | valid loss 0.9444 | valid ppl 2.5713 | learning rate 5.0000
| end of split 19 / 62 | epoch 6 | time: 1595.28s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 20 / 62 | epoch 6 | time: 1595.01s | valid loss 0.9475 | valid ppl 2.5793 | learning rate 5.0000
| end of split 21 / 62 | epoch 6 | time: 1594.95s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
| end of split 22 / 62 | epoch 6 | time: 1595.46s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 23 / 62 | epoch 6 | time: 1597.41s | valid loss 0.9442 | valid ppl 2.5708 | learning rate 5.0000
| end of split 24 / 62 | epoch 6 | time: 1597.13s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 25 / 62 | epoch 6 | time: 1595.18s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 26 / 62 | epoch 6 | time: 1594.01s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
| end of split 27 / 62 | epoch 6 | time: 1594.84s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
| end of split 28 / 62 | epoch 6 | time: 1592.94s | valid loss 0.9441 | valid ppl 2.5705 | learning rate 5.0000
| end of split 29 / 62 | epoch 6 | time: 1591.38s | valid loss 0.9443 | valid ppl 2.5711 | learning rate 5.0000
| end of split 30 / 62 | epoch 6 | time: 1590.34s | valid loss 0.9442 | valid ppl 2.5707 | learning rate 5.0000
| end of split 31 / 62 | epoch 6 | time: 1592.84s | valid loss 0.9441 | valid ppl 2.5704 | learning rate 5.0000
| end of split 32 / 62 | epoch 6 | time: 1589.97s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
| end of split 33 / 62 | epoch 6 | time: 1589.48s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
| end of split 34 / 62 | epoch 6 | time: 1590.99s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
| end of split 35 / 62 | epoch 6 | time: 1587.27s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 36 / 62 | epoch 6 | time: 1589.43s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
| end of split 37 / 62 | epoch 6 | time: 1590.89s | valid loss 0.9431 | valid ppl 2.5680 | learning rate 1.2500
| end of split 38 / 62 | epoch 6 | time: 1591.30s | valid loss 0.9431 | valid ppl 2.5679 | learning rate 1.2500
| end of split 39 / 62 | epoch 6 | time: 1587.59s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 40 / 62 | epoch 6 | time: 1589.99s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 41 / 62 | epoch 6 | time: 848.87s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
| end of split 42 / 62 | epoch 6 | time: 1589.92s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 43 / 62 | epoch 6 | time: 1588.08s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 44 / 62 | epoch 6 | time: 1586.96s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 45 / 62 | epoch 6 | time: 1587.55s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 46 / 62 | epoch 6 | time: 1586.69s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 47 / 62 | epoch 6 | time: 1587.20s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 48 / 62 | epoch 6 | time: 1587.64s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
| end of split 49 / 62 | epoch 6 | time: 1579.53s | valid loss 0.9427 | valid ppl 2.5670 | learning rate 1.2500
| end of split 50 / 62 | epoch 6 | time: 1577.89s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
| end of split 51 / 62 | epoch 6 | time: 1574.78s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 52 / 62 | epoch 6 | time: 1575.34s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
| end of split 53 / 62 | epoch 6 | time: 1574.50s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 54 / 62 | epoch 6 | time: 1578.06s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 55 / 62 | epoch 6 | time: 1577.22s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 56 / 62 | epoch 6 | time: 1577.40s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 57 / 62 | epoch 6 | time: 1579.42s | valid loss 0.9426 | valid ppl 2.5668 | learning rate 1.2500
| end of split 58 / 62 | epoch 6 | time: 1575.45s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 59 / 62 | epoch 6 | time: 1577.22s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 60 / 62 | epoch 6 | time: 1582.29s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 61 / 62 | epoch 6 | time: 1588.61s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 62 / 62 | epoch 6 | time: 1588.70s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 1 / 62 | epoch 7 | time: 1584.79s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 2 / 62 | epoch 7 | time: 1588.80s | valid loss 0.9426 | valid ppl 2.5665 | learning rate 1.2500
| end of split 3 / 62 | epoch 7 | time: 1589.28s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
| end of split 4 / 62 | epoch 7 | time: 1589.32s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
| end of split 5 / 62 | epoch 7 | time: 1591.86s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 0.3125
| end of split 6 / 62 | epoch 7 | time: 1590.36s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 0.3125
| end of split 7 / 62 | epoch 7 | time: 1590.53s | valid loss 0.9423 | valid ppl 2.5659 | learning rate 0.3125
| end of split 8 / 62 | epoch 7 | time: 1589.81s | valid loss 0.9423 | valid ppl 2.5659 | learning rate 0.3125
| end of split 9 / 62 | epoch 7 | time: 1590.82s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 10 / 62 | epoch 7 | time: 1591.41s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 11 / 62 | epoch 7 | time: 1592.90s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 12 / 62 | epoch 7 | time: 1594.52s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 13 / 62 | epoch 7 | time: 1592.98s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 14 / 62 | epoch 7 | time: 1591.85s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 15 / 62 | epoch 7 | time: 1593.69s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 16 / 62 | epoch 7 | time: 850.92s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 17 / 62 | epoch 7 | time: 1591.86s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 18 / 62 | epoch 7 | time: 1591.87s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 19 / 62 | epoch 7 | time: 1590.77s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 20 / 62 | epoch 7 | time: 1592.50s | valid loss 0.9422 | valid ppl 2.5657 | learning rate 0.3125
| end of split 21 / 62 | epoch 7 | time: 1590.69s | valid loss 0.9422 | valid ppl 2.5657 | learning rate 0.0781
| end of split 22 / 62 | epoch 7 | time: 1588.52s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 23 / 62 | epoch 7 | time: 1591.35s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 24 / 62 | epoch 7 | time: 1592.13s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 25 / 62 | epoch 7 | time: 1590.33s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 26 / 62 | epoch 7 | time: 1593.30s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 27 / 62 | epoch 7 | time: 1591.57s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 28 / 62 | epoch 7 | time: 1590.85s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 29 / 62 | epoch 7 | time: 1591.07s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 30 / 62 | epoch 7 | time: 1589.17s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 31 / 62 | epoch 7 | time: 1590.29s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 32 / 62 | epoch 7 | time: 1588.94s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0195
| end of split 33 / 62 | epoch 7 | time: 1589.33s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0195
| end of split 34 / 62 | epoch 7 | time: 1588.78s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 35 / 62 | epoch 7 | time: 1589.30s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 36 / 62 | epoch 7 | time: 1587.55s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 37 / 62 | epoch 7 | time: 1586.43s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 38 / 62 | epoch 7 | time: 1586.62s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 39 / 62 | epoch 7 | time: 1586.33s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 40 / 62 | epoch 7 | time: 1586.73s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 41 / 62 | epoch 7 | time: 1584.33s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 42 / 62 | epoch 7 | time: 1585.00s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 43 / 62 | epoch 7 | time: 1588.09s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 44 / 62 | epoch 7 | time: 1590.56s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 45 / 62 | epoch 7 | time: 1590.53s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0195
| end of split 46 / 62 | epoch 7 | time: 1595.27s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 47 / 62 | epoch 7 | time: 1599.33s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 48 / 62 | epoch 7 | time: 1598.60s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 49 / 62 | epoch 7 | time: 1598.68s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 50 / 62 | epoch 7 | time: 1600.25s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 51 / 62 | epoch 7 | time: 1597.95s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 52 / 62 | epoch 7 | time: 1598.75s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 53 / 62 | epoch 7 | time: 1599.63s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 54 / 62 | epoch 7 | time: 1594.92s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 55 / 62 | epoch 7 | time: 1595.71s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 56 / 62 | epoch 7 | time: 1597.02s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 57 / 62 | epoch 7 | time: 1594.59s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 58 / 62 | epoch 7 | time: 1593.96s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 59 / 62 | epoch 7 | time: 1594.96s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 60 / 62 | epoch 7 | time: 1594.10s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 61 / 62 | epoch 7 | time: 1595.45s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 62 / 62 | epoch 7 | time: 1597.03s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 1 / 62 | epoch 8 | time: 1593.02s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 2 / 62 | epoch 8 | time: 1598.16s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 3 / 62 | epoch 8 | time: 1598.24s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 4 / 62 | epoch 8 | time: 1600.33s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 5 / 62 | epoch 8 | time: 1598.80s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 6 / 62 | epoch 8 | time: 1599.19s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 7 / 62 | epoch 8 | time: 1599.86s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 8 / 62 | epoch 8 | time: 1597.82s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 9 / 62 | epoch 8 | time: 1597.89s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 10 / 62 | epoch 8 | time: 1596.89s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 11 / 62 | epoch 8 | time: 1596.65s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 12 / 62 | epoch 8 | time: 1593.04s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 13 / 62 | epoch 8 | time: 1584.13s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 14 / 62 | epoch 8 | time: 1581.93s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 15 / 62 | epoch 8 | time: 1579.07s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 16 / 62 | epoch 8 | time: 1580.06s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 17 / 62 | epoch 8 | time: 1580.03s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 18 / 62 | epoch 8 | time: 1580.61s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 19 / 62 | epoch 8 | time: 1579.19s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 20 / 62 | epoch 8 | time: 1579.59s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 21 / 62 | epoch 8 | time: 1577.85s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
TEST: valid loss 0.9407 | valid ppl 2.5618
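Each line of the log reports both `valid loss` and `valid ppl`; perplexity is the exponential of the per-character cross-entropy, and the learning rate drops by a factor of 4 at each change (20 → 5 → 1.25 → 0.3125, and so on). The sketch below parses a few lines copied verbatim from this log to check both relationships. The regex and the helpers `parse_line` and `distinct_learning_rates` are hypothetical, written only for the line format shown here. The 0.25 factor is inferred from the printed values; flair's `LanguageModelTrainer.train` does accept an `anneal_factor` argument, but the value used for this run is not stated. Later rates such as 0.0781 are rounded to four decimals, so only the early, exactly representable ones are compared.

```python
import math
import re

# Hypothetical pattern for the validation fields of this log's lines; the
# "learning rate" field is optional because the final TEST line omits it.
LINE_RE = re.compile(
    r"valid loss (?P<loss>[\d.]+) \| valid ppl (?P<ppl>[\d.]+)"
    r"(?: \| learning rate (?P<lr>[\d.]+))?"
)


def parse_line(line: str):
    """Return (loss, ppl, lr) for one log line; lr is None when absent."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a validation line: {line!r}")
    lr = float(m["lr"]) if m["lr"] is not None else None
    return float(m["loss"]), float(m["ppl"]), lr


def distinct_learning_rates(lines):
    """Learning rates in the order they first appear, skipping repeats."""
    rates = []
    for line in lines:
        _, _, lr = parse_line(line)
        if lr is not None and (not rates or lr != rates[-1]):
            rates.append(lr)
    return rates


# Lines copied from the log above.
log = [
    "| end of split 61 / 62 | epoch 4 | time: 1571.90s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 20.0000",
    "| end of split 62 / 62 | epoch 4 | time: 1572.26s | valid loss 0.9482 | valid ppl 2.5811 | learning rate 5.0000",
    "| end of split 36 / 62 | epoch 6 | time: 1589.43s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500",
    "| end of split 3 / 62 | epoch 7 | time: 1589.28s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125",
    "TEST: valid loss 0.9407 | valid ppl 2.5618",
]

for line in log:
    loss, ppl, _ = parse_line(line)
    # Both values are rounded to 4 decimals, so allow a small tolerance.
    assert abs(math.exp(loss) - ppl) < 1e-3, line

rates = distinct_learning_rates(log)
ratios = [b / a for a, b in zip(rates, rates[1:])]
print(rates)   # [20.0, 5.0, 1.25, 0.3125]
print(ratios)  # [0.25, 0.25, 0.25]
```

The same helpers can be run over the full `loss.txt` to plot the loss curve or to locate every annealing step.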
pipeline.py
ADDED
@@ -0,0 +1,22 @@
from typing import List, Dict
from flair.models.language_model import LanguageModel


class PreTrainedPipeline:
    def __init__(self, path=""):
        from huggingface_hub import hf_hub_download

        self.model = LanguageModel.load_language_model(
            hf_hub_download(repo_id="dchaplinsky/flair-uk-backward-large", filename="best-lm.pt")
        )

    def __call__(self, inputs: str) -> List[Dict]:
        """
        Args:
            inputs (:obj:`str`):
                a string containing some text
        Returns:
            A :obj:`list` with one :obj:`dict` holding the sampled text under ``generated_text``
        """
        inputs = inputs.strip()
        return [{"generated_text": self.model.generate_text(inputs, temperature=0.5)[0]}]
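The pipeline samples characters with `temperature=0.5`. As a general illustration of temperature sampling (not a copy of flair's internal code), the model's scores are divided by the temperature before the softmax, so a temperature below 1 concentrates probability on the most likely characters and makes the output less random. The helper names and the three character scores below are hypothetical, chosen only for the demonstration.

```python
import math
import random
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Divide scores by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {ch: z / temperature for ch, z in logits.items()}
    top = max(scaled.values())  # subtract the max to avoid overflow in exp
    weights = {ch: math.exp(z - top) for ch, z in scaled.items()}
    total = sum(weights.values())
    return {ch: w / total for ch, w in weights.items()}


def sample_char(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one character from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    chars = list(probs)
    return rng.choices(chars, weights=[probs[c] for c in chars], k=1)[0]


# Hypothetical next-character scores, for illustration only.
logits = {"а": 2.0, "о": 1.0, "и": 0.5}

p1 = softmax_with_temperature(logits, 1.0)
p05 = softmax_with_temperature(logits, 0.5)
print(round(p1["а"], 3), round(p05["а"], 3))  # 0.629 0.844

print(sample_char(logits, 0.5, random.Random(0)))
```

Halving the temperature raises the probability of the top-scoring character from about 0.63 to about 0.84 here, which is why a low temperature such as 0.5 tends to produce more conservative, repetitive continuations.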
requirements.txt
ADDED
@@ -0,0 +1 @@
flair