Commit 32b0375 by Yura Kuratov
Parent(s): a34e087
update paper links in readme

README.md CHANGED
@@ -17,11 +17,11 @@ GENA-LM (`gena-lm-bert-base-fly`) model is trained with a masked language model
 - 768 Hidden size
 - 32k Vocabulary size
 
-We pre-trained `gena-lm-bert-base-fly` using TODO(data). Pre-training was performed for 1,900,000 iterations with batch size 256 and sequence length was equal to 512 tokens. We modified Transformer to use [Pre-Layer normalization](https://arxiv.org/abs/2002.04745).
+We pre-trained `gena-lm-bert-base-fly` using TODO(data). Pre-training was performed for 1,900,000 iterations with batch size 256 and sequence length was equal to 512 tokens. We modified Transformer to use [Pre-Layer normalization](https://arxiv.org/abs/2002.04745). We upload checkpoint with the best MLM accuracy on validation set.
 
 Source code and data: https://github.com/AIRI-Institute/GENA_LM
 
-Paper: https://www.biorxiv.org/content/10.1101/2023.06.12.
+Paper: https://www.biorxiv.org/content/10.1101/2023.06.12.544594
 
 ## Examples
 
@@ -74,13 +74,14 @@ For evaluation results, see our paper: https://www.biorxiv.org/content/10.1101/2
 ```bibtex
 @article{GENA_LM,
 author = {Veniamin Fishman and Yuri Kuratov and Maxim Petrov and Aleksei Shmelev and Denis Shepelin and Nikolay Chekanov and Olga Kardymon and Mikhail Burtsev},
-title = {GENA-LM: A Family of Open-Source Foundational Models for Long
+title = {GENA-LM: A Family of Open-Source Foundational DNA Language Models for Long Sequences},
 elocation-id = {2023.06.12.544594},
 year = {2023},
 doi = {10.1101/2023.06.12.544594},
 publisher = {Cold Spring Harbor Laboratory},
-URL = {https://www.biorxiv.org/content/early/2023/
-eprint = {https://www.biorxiv.org/content/early/2023/
+URL = {https://www.biorxiv.org/content/early/2023/11/01/2023.06.12.544594},
+eprint = {https://www.biorxiv.org/content/early/2023/11/01/2023.06.12.544594.full.pdf},
 journal = {bioRxiv}
 }
+
 ```
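
Since this commit only touches paper links, a quick consistency check on the added lines may be useful when reviewing it. The sketch below is an illustration, not part of the GENA_LM repository: the helper `check_links` is a hypothetical name, and the only assumption it encodes is the pattern visible in the added lines, namely that the bioRxiv links end with the DOI (or its suffix) and that the `eprint` link is the `URL` plus `.full.pdf`. It does no network access, so it says nothing about whether the links resolve.

```python
# Values copied verbatim from the "+" lines of this commit.
DOI = "10.1101/2023.06.12.544594"
ELOCATION_ID = "2023.06.12.544594"
PAPER_LINK = "https://www.biorxiv.org/content/10.1101/2023.06.12.544594"
BIB_URL = "https://www.biorxiv.org/content/early/2023/11/01/2023.06.12.544594"
BIB_EPRINT = (
    "https://www.biorxiv.org/content/early/2023/11/01/2023.06.12.544594.full.pdf"
)


def check_links(doi, elocation_id, paper_link, bib_url, bib_eprint):
    """Hypothetical helper: return a list of inconsistencies (empty if none).

    Only string patterns are compared; nothing is fetched.
    """
    problems = []
    if not doi.endswith("/" + elocation_id):
        problems.append("doi does not end with elocation-id")
    if not paper_link.endswith("/" + doi):
        problems.append("Paper link does not end with the DOI")
    if not bib_url.endswith("/" + elocation_id):
        problems.append("URL does not end with elocation-id")
    if bib_eprint != bib_url + ".full.pdf":
        problems.append("eprint is not URL + '.full.pdf'")
    return problems


if __name__ == "__main__":
    found = check_links(DOI, ELOCATION_ID, PAPER_LINK, BIB_URL, BIB_EPRINT)
    print(found if found else "links are consistent")
```

Run against the added lines, the check reports no inconsistencies. A link missing the final DOI component would be flagged by the second rule.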