InstaDeepAI/segment_nt

hdallatorre committed on Mar 8

Commit b97aee7 • 1 Parent(s): caa1bc0

Update README.md

Files changed (1)

README.md +4 -4

@@ -36,10 +36,10 @@ pip install --upgrade git+https://github.com/huggingface/transformers.git
 A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
 ```
-⚠️ The maximum sequence length is set by default at the training length of 30,000 nucleotides, or 5001 tokens (accounting for the CLS token). However, Segment-NT has been
-shown to generalize up to sequences of 50,000 bp. In case you need to infer on sequences between 30kbp and 50kbp, make sure to change the `rescaling_factor` argument in
-the config to `num_dna_tokens_inference / max_num_tokens_nt` where `num_dna_tokens_inference` is the number of tokens at inference (i.e 6669 for a sequence of 40008 base
-pairs) and `max_num_tokens_nt` is the max number of tokens on which the backbone nucleotide-transformer was trained on, i.e `2048`.
 ```
 ```python
 # Load model and tokenizer

 A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
 ```
+⚠️ The maximum sequence length is set by default at the training length of 30,000 nucleotides, or 5001 tokens (accounting for the CLS token). However,
+Segment-NT-multi-species has been shown to generalize up to sequences of 50,000 bp. In case you need to infer on sequences between 30kbp and 50kbp, make sure to change
+the `rescaling_factor` argument in the config to `num_dna_tokens_inference / max_num_tokens_nt` where `num_dna_tokens_inference` is the number of tokens at inference
+(i.e 6669 for a sequence of 40008 base pairs) and `max_num_tokens_nt` is the max number of tokens on which the backbone nucleotide-transformer was trained on, i.e `2048`.
 ```
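The arithmetic in the warning above can be checked with a short sketch. It assumes 6 nucleotides per token (inferred from the stated 30,000 nucleotides mapping to 5,000 tokens plus CLS) and only handles lengths that are exact multiples of 6. The helper names `num_tokens_for` and `rescaling_factor_for` are hypothetical and are not part of the model's or the `transformers` API.

```python
# Hypothetical helpers for illustration only; not part of any library API.

MAX_NUM_TOKENS_NT = 2048   # backbone training length in tokens, as stated in the README
NUCLEOTIDES_PER_TOKEN = 6  # assumption: inferred from 30,000 nt -> 5,000 tokens (+ CLS)


def num_tokens_for(sequence_length_bp: int) -> int:
    """Token count for a sequence of the given length, including the CLS token."""
    if sequence_length_bp <= 0 or sequence_length_bp % NUCLEOTIDES_PER_TOKEN != 0:
        # The sketch only covers lengths that split cleanly into 6-nucleotide tokens.
        raise ValueError("sequence length must be a positive multiple of 6")
    return sequence_length_bp // NUCLEOTIDES_PER_TOKEN + 1


def rescaling_factor_for(sequence_length_bp: int) -> float:
    """num_dna_tokens_inference / max_num_tokens_nt, following the README formula."""
    return num_tokens_for(sequence_length_bp) / MAX_NUM_TOKENS_NT


if __name__ == "__main__":
    print(num_tokens_for(30_000))        # training length in tokens
    print(num_tokens_for(40_008))        # the README's 40,008 bp example
    print(rescaling_factor_for(40_008))  # value to use for `rescaling_factor`
```

For the README's 40,008 bp example this reproduces the 6669 tokens quoted above, giving a factor of 6669 / 2048, roughly 3.256, which is the value that would go into the `rescaling_factor` argument of the config.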
 ```python
 # Load model and tokenizer