Add example usage
README.md
CHANGED
@@ -8,3 +8,26 @@ The model has been trained for a total of 17 epochs.
The loss curve is shown:

*(loss curve plot)*
## Example Usage

```python
import torch
from transformers import PreTrainedTokenizerFast, BertForMaskedLM

model = BertForMaskedLM.from_pretrained("LofiAmazon/BarcodeBERT-Entire-BOLD")
model.eval()

tokenizer = PreTrainedTokenizerFast.from_pretrained("LofiAmazon/BarcodeBERT-Entire-BOLD")

# The DNA sequence you want to embed.
# Insert a space after every 4 characters.
# The sequence may also contain unknown characters that are not A, C, T, or G.
# The maximum DNA sequence length (not counting spaces) is 660 characters.
dna_sequence = "AACA ATGT ATTT A-T- TTCG CCCT TGTG AATT TATT ..."

inputs = tokenizer(dna_sequence, return_tensors="pt")

# Obtain a DNA embedding, which is a vector of length 768.
# The embedding is a representation of this DNA sequence in the model's latent space.
# Hidden states are only returned when output_hidden_states=True is passed.
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
embedding = outputs.hidden_states[-1].mean(1).squeeze()
```
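The 4-character spacing convention above can be produced with a small helper. This is a sketch, not part of the model card; `format_dna` is a hypothetical name, and it assumes the 660-character cap from the comments above:

```python
# Hypothetical helper: split a raw DNA string into space-separated
# 4-character groups, as the tokenizer above expects.
def format_dna(sequence: str, kmer: int = 4, max_len: int = 660) -> str:
    sequence = sequence[:max_len]  # cap at 660 bases per the comment above
    return " ".join(sequence[i:i + kmer] for i in range(0, len(sequence), kmer))

print(format_dna("AACAATGTATTTA-T-TTCGCCCTTGTGAATTTATT"))
# AACA ATGT ATTT A-T- TTCG CCCT TGTG AATT TATT
```

A trailing group shorter than 4 characters is kept as-is rather than padded.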