Commit 941645b by hdallatorre
Parent(s): aa82c0b

Update README.md

README.md CHANGED
@@ -35,11 +35,12 @@ pip install --upgrade git+https://github.com/huggingface/transformers.git
 
 A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
 
+```
 ⚠️ The maximum sequence length is set by default at the training length of 30,000 nucleotides, or 5001 tokens (accounting for the CLS token). However, Segment-NT has been shown to
 generalize up to sequences of 50,000 bp. In case you need to infer on sequences between 30kbp and 50kbp, make sure to change the `rescaling_factor` argument in the config
 to `num_dna_tokens_inference / max_num_tokens_nt` where `num_dna_tokens_inference` is the number of tokens at inference (i.e 6669 for a sequence of 40008 base pairs) and
 `max_num_tokens_nt` is the max number of tokens on which the backbone nucleotide-transformer was trained on, i.e `2048`.
-
+```
 ```python
 # Load model and tokenizer
 from transformers import AutoTokenizer, AutoModel
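
Reviewer note: the arithmetic behind the `rescaling_factor` formula in the warning paragraph can be checked with a small sketch. It assumes one token per 6 nucleotides plus one CLS token, which is inferred from the figures quoted in the README (30,000 nucleotides giving 5001 tokens) rather than stated there. The helper names `num_tokens` and `rescaling_factor` are hypothetical and not part of the model's API, and sequences whose length is not a multiple of 6 are rejected because their tokenization is not described in the text.

```python
def num_tokens(sequence_length_bp: int, kmer: int = 6) -> int:
    """Token count for a DNA sequence, including one CLS token.

    Assumes non-overlapping k-mers of size `kmer` (6 is inferred from the
    README figures, not confirmed by it).
    """
    if sequence_length_bp % kmer != 0:
        raise ValueError("sequence length must be a multiple of the k-mer size")
    return sequence_length_bp // kmer + 1  # +1 accounts for the CLS token


def rescaling_factor(sequence_length_bp: int, max_num_tokens_nt: int = 2048) -> float:
    """num_dna_tokens_inference / max_num_tokens_nt, as given in the README."""
    return num_tokens(sequence_length_bp) / max_num_tokens_nt


print(num_tokens(30_000))        # 5001, the default training length
print(num_tokens(40_008))        # 6669, the README's inference example
print(rescaling_factor(40_008))  # 6669 / 2048 = 3.25634765625
```

How the resulting value is passed to the model config is not shown in this hunk, so the sketch stops at the number itself.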