lhallee committed on
Commit
337b42c
1 Parent(s): 70ce9b6

Update README.md

Files changed (1)
  1. README.md +23 -2
README.md CHANGED
@@ -5,6 +5,8 @@ pipeline_tag: text-classification
  tags:
  - protein language model
  widget:
+ - text: "M S H S V K I Y D T C I G C T Q C V R A C P T D V L E M I P W G G C K A K Q I A S A P R T E D C V G C K R C E S A C P T D F L S V R V Y L W H E T T R S M G L A Y [SEP] M I N L P S L F V P L V G L L F P A V A M A S L F L H V E K R L L F S T K K I N"
+   example_title: "Non-interacting proteins"
  - text: "M S I N I C R D N H D P F Y R Y K M P P I Q A K V E G R G N G I K T A V L N V A D I S H A L N R P A P Y I V K Y F G F E L G A Q T S I S V D K D R Y L V N G V H E P A K L Q D V L D G F I N K F V L C G S C K N P E T E I I I T K D N D L V R D C K A C G K R T P M D L R H K L S S F I L K N P P D S V S G S K K K K K A A T A S A N V R G G G L S I S D I A Q G K S Q N A P S D G T G S S T P Q H H D E D E D E L S R Q I K A A A S T L E D I E V K D D E W A V D M S E E A I R A R A K E L E V N S E L T Q L D E Y G E W I L E Q A G E D K E N L P S D V E L Y K K A A E L D V L N D P K I G C V L A Q C L F D E D I V N E I A E H N A F F T K I L V T P E Y E K N F M G G I E R F L G L E H K D L I P L L P K I L V Q L Y N N D I I S E E E I M R F G T K S S K K F V P K E V S K K V R R A A K P F I T W L E T A E S D D D E E D D E [SEP] M S I E N L K S F D P F A D T G D D E T A T S N Y I H I R I Q Q R N G R K T L T T V Q G V P E E Y D L K R I L K V L K K D F A C N G N I V K D P E M G E I I Q L Q G D Q R A K V C E F M I S Q L G L Q K K N I K I H G F"
    example_title: "Interacting proteins"
  ---
@@ -22,12 +24,31 @@ SYNTERACT achieved unprecedented performance over vast phylogeny with 92-96% acc
  ## How to use

  ```python
+ # Imports
+ import re
+ import torch
+ import torch.nn.functional as F
+ from transformers import BertForSequenceClassification, BertTokenizer

+ model = BertForSequenceClassification.from_pretrained('lhallee/SYNTERACT') # load model
+ tokenizer = BertTokenizer.from_pretrained('lhallee/SYNTERACT') # load tokenizer
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # gather device
+ model.to(device) # move to device
+ model.eval() # put in eval mode

+ sequence_a = 'MEKSCSIGNGREQYGWGHGEQCGTQFLECVYRNASMYSVLGDLITYVVFLGATCYAILFGFRLLLSCVRIVLKVVIALFVIRLLLALGSVDITSVSYSG' # UniProt A1Z8T3
+ sequence_b = 'MRLTLLALIGVLCLACAYALDDSENNDQVVGLLDVADQGANHANDGAREARQLGGWGGGWGGRGGWGGRGGWGGRGGWGGRGGWGGGWGGRGGWGGRGGGWYGR' # UniProt A1Z8H0
+ sequence_a = ' '.join(list(re.sub(r'[UZOB]', 'X', sequence_a))) # need spaces between amino acids
+ sequence_b = ' '.join(list(re.sub(r'[UZOB]', 'X', sequence_b))) # replace rare amino acids with X
+ example = sequence_a + ' [SEP] ' + sequence_b # add SEP token

- ```
-
+ example = tokenizer(example, return_tensors='pt', padding=False).to(device) # tokenize example
+ with torch.no_grad():
+     logits = model(**example).logits.cpu().detach() # get logits from model

+ probability = F.softmax(logits, dim=-1) # use softmax to get "confidence" in the prediction
+ prediction = probability.argmax(dim=-1) # 0 for no interaction, 1 for interaction
+ ```

  ## Intended use and limitations
  We define a protein-protein interaction as physical contact that mediates chemical or conformational change, especially with non-generic function. However, because of SYNTERACT's propensity to predict false positives, we believe that it identifies plausible conformational changes caused by interactions without relevance to function. Therefore, predictions by SYNTERACT should always be taken with a grain of salt and used as a means of hypothesis generation or secondary validation.
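
Given the false-positive tendency described above, one possible way to use the model output more conservatively is to call an interaction only when the softmax probability for class 1 clears a cutoff, rather than taking the plain argmax. The following is a minimal sketch in pure Python, not part of the model card: the helper names `softmax` and `call_interaction` and the default cutoff of 0.9 are assumptions chosen for illustration, and the logits in the usage loop are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def call_interaction(logits, threshold=0.9):
    """Return (label, p_interact) for one protein pair.

    label is 1 only when the class-1 probability reaches `threshold`.
    `threshold` is a hypothetical cutoff, not a value from the model authors.
    """
    p_interact = softmax(logits)[1]  # index 1 = interaction, as in the README snippet
    label = 1 if p_interact >= threshold else 0
    return label, p_interact


# Made-up two-class logits for three pairs, purely for illustration
for pair_logits in ([2.0, -1.0], [-0.2, 0.4], [-3.0, 3.0]):
    label, p = call_interaction(pair_logits)
    print(label, round(p, 3))
```

With the README snippet, the input would be something like `logits[0].tolist()`. A pair whose class-1 probability is only slightly above 0.5 would be labeled 1 by argmax but 0 under this stricter rule, which trades some recall for fewer false positives. The appropriate cutoff would need to be chosen on held-out data.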