Update README.md
README.md
CHANGED
@@ -11,9 +11,9 @@ tags:
 - promoter-prediction
 - phage
 ---
-## ProkBERT-mini-phage Model
+## ProkBERT-mini-long-phage Model
 
-This finetuned model is specifically designed for promoter identification and is based on the [ProkBERT-mini model](https://huggingface.co/neuralbioinfo/prokbert-mini).
+This finetuned model is specifically designed for promoter identification and is based on the [ProkBERT-mini-long model](https://huggingface.co/neuralbioinfo/prokbert-mini-long).
 
 For more details, refer to the [phage dataset description](https://huggingface.co/datasets/neuralbioinfo/phage-test-10k) used for training and evaluating this model.
 
@@ -37,9 +37,9 @@ The following example demonstrates how to use the ProkBERT-mini-promoter model f
 ```python
 from prokbert.prokbert_tokenizer import ProkBERTTokenizer
 from transformers import MegatronBertForSequenceClassification
-finetuned_model = "neuralbioinfo/prokbert-mini-phage"
+finetuned_model = "neuralbioinfo/prokbert-mini-long-phage"
 kmer = 6
-shift = 1
+shift = 2
 
 tok_params = {'kmer' : kmer,
               'shift' : shift}
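The snippet above is cut off by the hunk boundary; the `print(outputs)` context line in the next hunk shows where it ends. For orientation, a minimal end-to-end sketch of the same flow, assuming `ProkBERTTokenizer` accepts the k-mer settings via a `tokenization_params` argument (as in the base ProkBERT-mini card) and supports the standard Hugging Face tokenizer call convention; the DNA segment and the softmax step are illustrative, not from the card:

```python
import torch
from prokbert.prokbert_tokenizer import ProkBERTTokenizer
from transformers import MegatronBertForSequenceClassification

finetuned_model = "neuralbioinfo/prokbert-mini-long-phage"
tok_params = {'kmer': 6, 'shift': 2}  # k6s2, matching the updated card

# Assumption: constructor keyword follows the base ProkBERT-mini example.
tokenizer = ProkBERTTokenizer(tokenization_params=tok_params)
model = MegatronBertForSequenceClassification.from_pretrained(finetuned_model)

segment = "ATGAAATTTGGCCAGTCCGGAATCCGTACGTAGCATGCA"  # hypothetical DNA segment
inputs = tokenizer(segment, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Sequence-level class probabilities from the classification head's logits.
probs = torch.softmax(outputs.logits, dim=-1)
print(outputs)
print(probs)
```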
@@ -61,18 +61,19 @@ print(outputs)
 **Architecture:**
 
 ...
-**Tokenizer:** The model uses a 6-mer tokenizer with a shift of 1 (k6s1), specifically designed to handle DNA sequences efficiently.
+**Tokenizer:** The model uses a 6-mer tokenizer with a shift of 2 (k6s2), specifically designed to handle DNA sequences efficiently.
 
 **Parameters:**
 
 | Parameter            | Description                          |
 |----------------------|--------------------------------------|
-| Model Size |
-| Max. Context Size |
+| Model Size           | 26.6 million parameters              |
+| Max. Context Size    | 4096 bp                              |
 | Training Data        | 206.65 billion nucleotides           |
 | Layers               | 6                                    |
 | Attention Heads      | 6                                    |
 
+
 ### Intended Use
 
 **Intended Use Cases:** ProkBERT-mini-phage is intended for bioinformatics researchers and practitioners focusing on genomic sequence analysis, including:
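The k6s2 label in the updated Tokenizer line is shorthand for overlapping 6-mers sampled every 2 bases. A dependency-free sketch of that windowing (the helper name here is ours, not the library's):

```python
def kmer_tokens(seq: str, kmer: int = 6, shift: int = 2) -> list[str]:
    """Overlapping k-mers taken every `shift` bases (k6s2 by default)."""
    return [seq[i:i + kmer] for i in range(0, len(seq) - kmer + 1, shift)]

print(kmer_tokens("ATGAAATTTGGC"))
# ['ATGAAA', 'GAAATT', 'AATTTG', 'TTTGGC']
```

Because each token advances two bases rather than one, a fixed token budget spans roughly twice as many nucleotides, which lines up with the 4096 bp maximum context in the table.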