vshulev committed 016495d (verified) · 1 parent: 74db789

Add example usage

Files changed (1): README.md (+23 -0)
@@ -8,3 +8,26 @@ The model has been trained for a total of 17 epochs.
The loss curve is shown below:

![Loss curve](https://cdn-uploads.huggingface.co/production/uploads/6659d7d2f5106a7f0abeaa3d/6Ypq8hLPW3ssOToGcYHDn.png)

## Example Usage

```python
import torch
from transformers import BertForMaskedLM, PreTrainedTokenizerFast

model = BertForMaskedLM.from_pretrained("LofiAmazon/BarcodeBERT-Entire-BOLD")
model.eval()  # disable dropout for inference

tokenizer = PreTrainedTokenizerFast.from_pretrained("LofiAmazon/BarcodeBERT-Entire-BOLD")

# The DNA sequence to embed.
# Insert a space after every 4 characters.
# The sequence may also contain unknown characters other than A, C, T, G (such as "-").
# The sequence length (not counting spaces) should not exceed 660 characters.
dna_sequence = "AACA ATGT ATTT A-T- TTCG CCCT TGTG AATT TATT ..."

inputs = tokenizer(dna_sequence, return_tensors="pt")

# Obtain a DNA embedding, which is a vector of length 768 representing this
# sequence in the model's latent space. Hidden states are only returned when
# output_hidden_states=True is passed to the forward call.
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Average the last layer's token vectors over the sequence dimension.
embedding = outputs.hidden_states[-1].mean(1).squeeze()
```
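The comments in the example ask for a space after every 4 characters and at most 660 characters (spaces excluded). If your sequences arrive as unbroken strings, a small helper along these lines can do the chunking. This is a sketch: `format_dna` is a hypothetical name, not part of `transformers` or this model, and raising an error on over-long input (rather than truncating) is an assumed policy.

```python
def format_dna(sequence: str, k: int = 4, max_len: int = 660) -> str:
    """Split a DNA string into space-separated, non-overlapping k-mers.

    Hypothetical helper. Assumption: the model expects 4-character chunks
    separated by single spaces and at most 660 characters (spaces excluded).
    """
    # Remove any existing whitespace so chunking starts from a clean string.
    cleaned = "".join(sequence.split())
    if len(cleaned) > max_len:
        # Assumed policy: reject over-long input instead of silently truncating.
        raise ValueError(
            f"sequence has {len(cleaned)} characters; the maximum is {max_len}"
        )
    # Slice into consecutive chunks of length k; the last chunk may be shorter.
    chunks = [cleaned[i:i + k] for i in range(0, len(cleaned), k)]
    return " ".join(chunks)


if __name__ == "__main__":
    print(format_dna("AACAATGTATTTA-T-TTCG"))  # AACA ATGT ATTT A-T- TTCG
```

For example, `dna_sequence = format_dna(raw_sequence)`, where `raw_sequence` is your own unspaced string, produces input in the spaced format used above. If the length is not a multiple of 4, the final chunk is shorter than 4 characters; how the tokenizer treats such a partial chunk is not covered here.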