Simon Duerr committed on
Commit
00aa807
1 Parent(s): 0291496

add proteinmpnn

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. ProteinMPNN +0 -1
  2. ProteinMPNN/LICENSE +21 -0
  3. ProteinMPNN/README.md +111 -0
  4. ProteinMPNN/ca_model_weights/v_48_002.pt +3 -0
  5. ProteinMPNN/ca_model_weights/v_48_010.pt +3 -0
  6. ProteinMPNN/ca_model_weights/v_48_020.pt +3 -0
  7. ProteinMPNN/colab_notebooks/README.md +1 -0
  8. ProteinMPNN/colab_notebooks/ca_only_quickdemo.ipynb +0 -0
  9. ProteinMPNN/colab_notebooks/quickdemo.ipynb +0 -0
  10. ProteinMPNN/colab_notebooks/quickdemo_wAF2.ipynb +612 -0
  11. ProteinMPNN/examples/submit_example_1.sh +28 -0
  12. ProteinMPNN/examples/submit_example_2.sh +34 -0
  13. ProteinMPNN/examples/submit_example_3.sh +27 -0
  14. ProteinMPNN/examples/submit_example_3_score_only.sh +28 -0
  15. ProteinMPNN/examples/submit_example_3_score_only_from_fasta.sh +30 -0
  16. ProteinMPNN/examples/submit_example_4.sh +40 -0
  17. ProteinMPNN/examples/submit_example_4_non_fixed.sh +40 -0
  18. ProteinMPNN/examples/submit_example_5.sh +44 -0
  19. ProteinMPNN/examples/submit_example_6.sh +34 -0
  20. ProteinMPNN/examples/submit_example_7.sh +29 -0
  21. ProteinMPNN/examples/submit_example_8.sh +34 -0
  22. ProteinMPNN/examples/submit_example_pssm.sh +49 -0
  23. ProteinMPNN/helper_scripts/assign_fixed_chains.py +39 -0
  24. ProteinMPNN/helper_scripts/make_bias_AA.py +27 -0
  25. ProteinMPNN/helper_scripts/make_bias_per_res_dict.py +53 -0
  26. ProteinMPNN/helper_scripts/make_fixed_positions_dict.py +59 -0
  27. ProteinMPNN/helper_scripts/make_pos_neg_tied_positions_dict.py +73 -0
  28. ProteinMPNN/helper_scripts/make_pssm_input_dict.py +36 -0
  29. ProteinMPNN/helper_scripts/make_tied_positions_dict.py +61 -0
  30. ProteinMPNN/helper_scripts/other_tools/make_omit_AA.py +39 -0
  31. ProteinMPNN/helper_scripts/other_tools/make_pssm_dict.py +64 -0
  32. ProteinMPNN/helper_scripts/parse_multiple_chains.out +1 -0
  33. ProteinMPNN/helper_scripts/parse_multiple_chains.py +163 -0
  34. ProteinMPNN/helper_scripts/parse_multiple_chains.sh +7 -0
  35. ProteinMPNN/inputs/PDB_complexes/pdbs/3HTN.pdb +0 -0
  36. ProteinMPNN/inputs/PDB_complexes/pdbs/4YOW.pdb +0 -0
  37. ProteinMPNN/inputs/PDB_homooligomers/pdbs/4GYT.pdb +0 -0
  38. ProteinMPNN/inputs/PDB_homooligomers/pdbs/6EHB.pdb +0 -0
  39. ProteinMPNN/inputs/PDB_monomers/pdbs/5L33.pdb +0 -0
  40. ProteinMPNN/inputs/PDB_monomers/pdbs/6MRR.pdb +0 -0
  41. ProteinMPNN/inputs/PSSM_inputs/3HTN.npz +0 -0
  42. ProteinMPNN/inputs/PSSM_inputs/4YOW.npz +0 -0
  43. ProteinMPNN/outputs/example_1_outputs/parsed_pdbs.jsonl +2 -0
  44. ProteinMPNN/outputs/example_1_outputs/seqs/5L33.fa +6 -0
  45. ProteinMPNN/outputs/example_1_outputs/seqs/6MRR.fa +6 -0
  46. ProteinMPNN/outputs/example_2_outputs/assigned_pdbs.jsonl +1 -0
  47. ProteinMPNN/outputs/example_2_outputs/parsed_pdbs.jsonl +0 -0
  48. ProteinMPNN/outputs/example_2_outputs/seqs/3HTN.fa +6 -0
  49. ProteinMPNN/outputs/example_2_outputs/seqs/4YOW.fa +6 -0
  50. ProteinMPNN/outputs/example_3_outputs/seqs/3HTN.fa +6 -0
ProteinMPNN DELETED
@@ -1 +0,0 @@
1
- Subproject commit 8907e6671bfbfc92303b5f79c4b5e6ce47cdef57
 
 
ProteinMPNN/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2022 Justas Dauparas
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
ProteinMPNN/README.md ADDED
@@ -0,0 +1,111 @@
1
+ # ProteinMPNN
2
+ ![ProteinMPNN](https://docs.google.com/drawings/d/e/2PACX-1vTtnMBDOq8TpHIctUfGN8Vl32x5ISNcPKlxjcQJF2q70PlaH2uFlj2Ac4s3khnZqG1YxppdMr0iTyk-/pub?w=889&h=358)
3
+ Read [ProteinMPNN paper](https://www.biorxiv.org/content/10.1101/2022.06.03.494563v1).
4
+
5
+ To run ProteinMPNN, clone this GitHub repo and install Python>=3.0, PyTorch, and NumPy.
6
+
7
+ Full protein backbone models: `vanilla_model_weights/v_48_002.pt, v_48_010.pt, v_48_020.pt, v_48_030.pt`, `soluble_model_weights/v_48_010.pt, v_48_020.pt`.
8
+
9
+ CA only models: `ca_model_weights/v_48_002.pt, v_48_010.pt, v_48_020.pt`. Enable flag `--ca_only` to use these models.
10
+
11
+ Helper scripts: `helper_scripts` - helper functions to parse PDBs, assign which chains to design, choose which residues to fix, add AA bias, tie residues, etc.
12
+
13
+ Code organization:
14
+ * `protein_mpnn_run.py` - the main script to initialize and run the model.
15
+ * `protein_mpnn_utils.py` - utility functions for the main script.
16
+ * `examples/` - simple code examples.
17
+ * `inputs/` - input PDB files for examples
18
+ * `outputs/` - outputs from examples
19
+ * `colab_notebooks/` - Google Colab examples
20
+ * `training/` - code and data to retrain the model
21
+ -----------------------------------------------------------------------------------------------------
22
+ Input flags for `protein_mpnn_run.py`:
23
+ ```
24
+ argparser.add_argument("--suppress_print", type=int, default=0, help="0 for False, 1 for True")
25
+ argparser.add_argument("--ca_only", action="store_true", default=False, help="Parse CA-only structures and use CA-only models (default: false)")
26
+ argparser.add_argument("--path_to_model_weights", type=str, default="", help="Path to model weights folder;")
27
+ argparser.add_argument("--model_name", type=str, default="v_48_020", help="ProteinMPNN model name: v_48_002, v_48_010, v_48_020, v_48_030; v_48_010=version with 48 edges 0.10A noise")
28
+ argparser.add_argument("--use_soluble_model", action="store_true", default=False, help="Flag to load ProteinMPNN weights trained on soluble proteins only.")
29
+ argparser.add_argument("--seed", type=int, default=0, help="If set to 0 then a random seed will be picked;")
30
+ argparser.add_argument("--save_score", type=int, default=0, help="0 for False, 1 for True; save score=-log_prob to npy files")
31
+ argparser.add_argument("--path_to_fasta", type=str, default="", help="score provided input sequence in a fasta format; e.g. GGGGGG/PPPPS/WWW for chains A, B, C sorted alphabetically and separated by /")
32
+ argparser.add_argument("--save_probs", type=int, default=0, help="0 for False, 1 for True; save MPNN predicted probabilities per position")
33
+ argparser.add_argument("--score_only", type=int, default=0, help="0 for False, 1 for True; score input backbone-sequence pairs")
34
+ argparser.add_argument("--conditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)")
35
+ argparser.add_argument("--conditional_probs_only_backbone", type=int, default=0, help="0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)")
36
+ argparser.add_argument("--unconditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output unconditional probabilities p(s_i given backbone) in one forward pass")
37
+ argparser.add_argument("--backbone_noise", type=float, default=0.00, help="Standard deviation of Gaussian noise to add to backbone atoms")
38
+ argparser.add_argument("--num_seq_per_target", type=int, default=1, help="Number of sequences to generate per target")
39
+ argparser.add_argument("--batch_size", type=int, default=1, help="Batch size; can set higher for titan, quadro GPUs, reduce this if running out of GPU memory")
40
+ argparser.add_argument("--max_length", type=int, default=200000, help="Max sequence length")
41
+ argparser.add_argument("--sampling_temp", type=str, default="0.1", help="A string of temperatures, 0.2 0.25 0.5. Sampling temperature for amino acids. Suggested values 0.1, 0.15, 0.2, 0.25, 0.3. Higher values will lead to more diversity.")
42
+ argparser.add_argument("--out_folder", type=str, help="Path to a folder to output sequences, e.g. /home/out/")
43
+ argparser.add_argument("--pdb_path", type=str, default='', help="Path to a single PDB to be designed")
44
+ argparser.add_argument("--pdb_path_chains", type=str, default='', help="Define which chains need to be designed for a single PDB ")
45
+ argparser.add_argument("--jsonl_path", type=str, help="Path to a folder with parsed pdb into jsonl")
46
+ argparser.add_argument("--chain_id_jsonl",type=str, default='', help="Path to a dictionary specifying which chains need to be designed and which ones are fixed, if not specified all chains will be designed.")
47
+ argparser.add_argument("--fixed_positions_jsonl", type=str, default='', help="Path to a dictionary with fixed positions")
48
+ argparser.add_argument("--omit_AAs", type=list, default='X', help="Specify which amino acids should be omitted in the generated sequence, e.g. 'AC' would omit alanine and cysteine.")
49
+ argparser.add_argument("--bias_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies AA composition bias if needed, e.g. {A: -1.1, F: 0.7} would make A less likely and F more likely.")
50
+ argparser.add_argument("--bias_by_res_jsonl", default='', help="Path to dictionary with per position bias.")
51
+ argparser.add_argument("--omit_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies which amino acids need to be omitted from design at specific chain indices")
52
+ argparser.add_argument("--pssm_jsonl", type=str, default='', help="Path to a dictionary with pssm")
53
+ argparser.add_argument("--pssm_multi", type=float, default=0.0, help="A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions")
54
+ argparser.add_argument("--pssm_threshold", type=float, default=0.0, help="A value between -inf and +inf to restrict per position AAs")
55
+ argparser.add_argument("--pssm_log_odds_flag", type=int, default=0, help="0 for False, 1 for True")
56
+ argparser.add_argument("--pssm_bias_flag", type=int, default=0, help="0 for False, 1 for True")
57
+ argparser.add_argument("--tied_positions_jsonl", type=str, default='', help="Path to a dictionary with tied positions")
58
+
59
+ ```
60
+ -----------------------------------------------------------------------------------------------------
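As an illustration of how the flags listed above combine, the following sketch assembles a command line for a single-PDB design run. The flag names come from the argparse listing; the helper name `build_mpnn_command`, the output folder, and the space-separated format for `--pdb_path_chains` are assumptions made for this example. The sketch only builds the argument list and does not launch anything.

```python
import shlex


def build_mpnn_command(pdb_path, chains, out_folder,
                       num_seqs=2, temps=("0.1",), seed=37, batch_size=1):
    """Assemble an argv list for protein_mpnn_run.py (hypothetical helper)."""
    return [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        # Assumption: several chains are passed as one space-separated string.
        "--pdb_path_chains", " ".join(chains),
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seqs),
        # The help text describes this flag as a string of temperatures.
        "--sampling_temp", " ".join(temps),
        "--seed", str(seed),
        "--batch_size", str(batch_size),
    ]


cmd = build_mpnn_command("inputs/PDB_monomers/pdbs/5L33.pdb", ["A"], "outputs/demo/")
print(shlex.join(cmd))
```

Because the notebook in this commit computes the number of batches as `num_seq_per_target // batch_size`, choosing a sequence count that is a multiple of the batch size avoids silently generating fewer sequences than requested.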
61
+ For example to make a conda environment to run ProteinMPNN:
62
+ * `conda create --name mlfold` - this creates a conda environment called `mlfold`
63
+ * `source activate mlfold` - this activates the environment
64
+ * `conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch` - install pytorch following steps from https://pytorch.org/
65
+ -----------------------------------------------------------------------------------------------------
66
+ These examples are provided in `examples/`:
67
+ * `submit_example_1.sh` - simple monomer example
68
+ * `submit_example_2.sh` - simple multi-chain example
69
+ * `submit_example_3.sh` - directly from the .pdb path
70
+ * `submit_example_3_score_only.sh` - return score only (model's uncertainty)
71
+ * `submit_example_3_score_only_from_fasta.sh` - return score only (model's uncertainty) loading sequence from fasta files
72
+ * `submit_example_4.sh` - fix some residue positions
73
+ * `submit_example_4_non_fixed.sh` - specify which positions to design
74
+ * `submit_example_5.sh` - tie some positions together (symmetry)
75
+ * `submit_example_6.sh` - homooligomer example
76
+ * `submit_example_7.sh` - return sequence unconditional probabilities (PSSM like)
77
+ * `submit_example_8.sh` - add amino acid bias
78
+ * `submit_example_pssm.sh` - use PSSM bias when designing sequences
79
+ -----------------------------------------------------------------------------------------------------
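For the symmetry and homooligomer examples (`submit_example_5.sh`, `submit_example_6.sh`), the notebook included in this commit builds a tied-positions dictionary in which residue i of every chain is sampled together. A minimal standalone sketch of that structure follows. The function name `tie_homomer_positions` and the toy input are hypothetical, and the sketch assumes, as the notebook notes, that all chains have the same length.

```python
def tie_homomer_positions(name, chain_seqs):
    """Return {name: [{chain: [i], ...}, ...]} using 1-based residue indices.

    chain_seqs maps a chain letter to its sequence, e.g. {"A": "GGS", "B": "GGS"}.
    """
    chains = sorted(chain_seqs)
    lengths = {len(chain_seqs[c]) for c in chains}
    if len(lengths) != 1:
        raise ValueError("all homomer chains must have the same length")
    (length,) = lengths
    # One dictionary per residue position; each chain contributes a one-item list.
    tied = [{c: [i] for c in chains} for i in range(1, length + 1)]
    return {name: tied}


tied = tie_homomer_positions("toy", {"A": "GGS", "B": "GGS"})
print(tied)
```

Raising an error on unequal chain lengths is a design choice of this sketch; the notebook instead takes the length of the first chain and leaves the check to the user.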
80
+ Output example:
81
+ ```
82
+ >3HTN, score=1.1705, global_score=1.2045, fixed_chains=['B'], designed_chains=['A', 'C'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
83
+ NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER/NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER
84
+ >T=0.1, sample=1, score=0.7291, global_score=0.9330, seq_recovery=0.5736
85
+ NMYSYKKIGNKYIVSINNHTEIVKALKKFCEEKNIKSGSVNGIGSIGSVTLKFYNLETKEEELKTFNANFEISNLTGFISMHDNKVFLDLHITIGDENFSALAGHLVSAVVNGTCELIVEDFNELVSTKYNEELGLWLLDFEK/NMYSYKKIGNKYIVSINNHTDIVTAIKKFCEDKKIKSGTINGIGQVKEVTLEFRNFETGEKEEKTFKKQFTISNLTGFISTKDGKVFLDLHITFGDENFSALAGHLISAIVDGKCELIIEDYNEEINVKYNEELGLYLLDFNK
86
+ >T=0.1, sample=2, score=0.7414, global_score=0.9355, seq_recovery=0.6075
87
+ NMYKYKKIGNKYIVSINNHTEIVKAIKEFCKEKNIKSGTINGIGQVGKVTLRFYNPETKEYTEKTFNDNFEISNLTGFISTYKNEVFLHLHITFGKSDFSALAGHLLSAIVNGICELIVEDFKENLSMKYDEKTGLYLLDFEK/NMYKYKKIGNKYVVSINNHTEIVEALKAFCEDKKIKSGTVNGIGQVSKVTLKFFNIETKESKEKTFNKNFEISNLTGFISEINGEVFLHLHITIGDENFSALAGHLLSAVVNGEAILIVEDYKEKVNRKYNEELGLNLLDFNL
88
+ ```
89
+ * `score` - negative log probability of the sampled amino acids, averaged over the residues that were designed
90
+ * `global_score` - negative log probability of the sampled/fixed amino acids, averaged over all residues in all chains
91
+ * `fixed_chains` - chains that were not designed (fixed)
92
+ * `designed_chains` - chains that were redesigned
93
+ * `model_name/CA_model_name` - model name that was used to generate results, e.g. `v_48_020`
94
+ * `git_hash` - github version that was used to generate outputs
95
+ * `seed` - random seed
96
+ * `T=0.1` - temperature equal to 0.1 was used to sample sequences
97
+ * `sample` - sequence sample number (1, 2, 3, etc.)
98
+ -----------------------------------------------------------------------------------------------------
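The sample headers described above are plain `key=value` pairs separated by commas, so they can be read back with a few lines of Python. This is a sketch and not part of ProteinMPNN: the function name `parse_sample_header` is hypothetical, and it only handles the numeric fields of sample headers such as the one in the output example. List-valued fields such as `fixed_chains=['B']` on the native header line are deliberately skipped.

```python
# Numeric fields that appear in sample headers, with their converters.
CONVERTERS = {
    "T": float,
    "sample": int,
    "score": float,
    "global_score": float,
    "seq_recovery": float,
}


def parse_sample_header(line):
    """Parse a '>T=..., sample=..., score=...' header into a dict of numbers."""
    parsed = {}
    for part in line.lstrip(">").strip().split(", "):
        key, _, value = part.partition("=")
        if key in CONVERTERS:
            parsed[key] = CONVERTERS[key](value)
    return parsed


header = ">T=0.1, sample=1, score=0.7291, global_score=0.9330, seq_recovery=0.5736"
parsed = parse_sample_header(header)
print(parsed)
```

Fields that are missing from a header are simply absent from the result, which keeps the sketch usable for headers that omit `global_score`.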
99
+ ```
100
+ @article{dauparas2022robust,
101
+ title={Robust deep learning--based protein sequence design using ProteinMPNN},
102
+ author={Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
103
+ journal={Science},
104
+ volume={378},
105
+ number={6615},
106
+ pages={49--56},
107
+ year={2022},
108
+ publisher={American Association for the Advancement of Science}
109
+ }
110
+ ```
111
+ -----------------------------------------------------------------------------------------------------
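The `--sampling_temp` help text in the README says that higher temperatures lead to more diverse sequences, and the notebook in this commit describes T close to 0 as taking the argmax. The sketch below shows a standard temperature-scaled softmax, which produces this behaviour. It is a generic illustration with made-up logits, not ProteinMPNN's own sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three amino acids
sharp = softmax_with_temperature(logits, 0.1)   # low T: mass concentrates on the top logit
flat = softmax_with_temperature(logits, 10.0)   # high T: close to uniform
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At low temperature nearly all probability falls on the highest-scoring residue, so repeated samples look alike; at high temperature the distribution flattens and samples diverge.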
ProteinMPNN/ca_model_weights/v_48_002.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec038b44a987d7c8351b6ed887c82a2370d54e45e55a6bdaf508a729cef0340e
3
+ size 6624011
ProteinMPNN/ca_model_weights/v_48_010.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cdb50498d45578d20b271fa7817b8cd8bfde3875ad69dbd3f5e4b5dd3e588301
3
+ size 6624011
ProteinMPNN/ca_model_weights/v_48_020.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f28f40170e21858c5ff31ef50b6e63414ff76dc331b19f85aa8586a12031744a
3
+ size 6624011
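The three `.pt` entries above are Git LFS pointer files rather than the weights themselves: each records a spec version, a SHA-256 object id, and the size in bytes. A small sketch for reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical, and the sample text is the `v_48_020.pt` pointer shown in this diff.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f28f40170e21858c5ff31ef50b6e63414ff76dc331b19f85aa8586a12031744a
size 6624011
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])
```

If a checkout contains only these small pointer files, the actual weights have not been fetched yet, and loading them with PyTorch would fail.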
ProteinMPNN/colab_notebooks/README.md ADDED
@@ -0,0 +1 @@
1
+ <a href="https://colab.research.google.com/github/dauparas/ProteinMPNN/blob/main/colab_notebooks/quickdemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
ProteinMPNN/colab_notebooks/ca_only_quickdemo.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/colab_notebooks/quickdemo.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/colab_notebooks/quickdemo_wAF2.ipynb ADDED
@@ -0,0 +1,612 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "view-in-github",
7
+ "colab_type": "text"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/dauparas/ProteinMPNN/blob/main/colab_notebooks/quickdemo_wAF2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "AYZebfKn8gef"
17
+ },
18
+ "source": [
19
+ "#ProteinMPNN w/AF2\n",
20
+ "This notebook is intended as a quick demo, more features to come!\n",
21
+ "\n",
22
+ "Examples: \n",
23
+ "1. pdb: `6MRR`, homomer: `False`, designed_chain: `A`\n",
24
+ "2. pdb: `1X2I`, homomer: `True`, designed_chain: `A,B` \n",
25
+ " (for correct symmetric tying lengths of homomer chains should be the same)"
26
+ ]
27
+ },
28
+ {
29
+ "cell_type": "code",
30
+ "source": [
31
+ "#@title Setup ProteinMPNN\n",
32
+ "import warnings\n",
33
+ "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
34
+ "\n",
35
+ "import json, time, os, sys, glob, re\n",
36
+ "from google.colab import files\n",
37
+ "import numpy as np\n",
38
+ "\n",
39
+ "if not os.path.isdir(\"ProteinMPNN\"):\n",
40
+ " os.system(\"git clone -q https://github.com/dauparas/ProteinMPNN.git\")\n",
41
+ "\n",
42
+ "if \"/content/ProteinMPNN\" not in sys.path:\n",
43
+ " sys.path.append('/content/ProteinMPNN')\n",
44
+ "\n",
45
+ "import matplotlib.pyplot as plt\n",
46
+ "import shutil\n",
47
+ "import warnings\n",
48
+ "import torch\n",
49
+ "from torch import optim\n",
50
+ "from torch.utils.data import DataLoader\n",
51
+ "from torch.utils.data.dataset import random_split, Subset\n",
52
+ "import copy\n",
53
+ "import torch.nn as nn\n",
54
+ "import torch.nn.functional as F\n",
55
+ "import random\n",
56
+ "import os.path\n",
57
+ "from protein_mpnn_utils import loss_nll, loss_smoothed, gather_edges, gather_nodes, gather_nodes_t, cat_neighbors_nodes, _scores, _S_to_seq, tied_featurize, parse_PDB\n",
58
+ "from protein_mpnn_utils import StructureDataset, StructureDatasetPDB, ProteinMPNN\n",
59
+ "\n",
60
+ "device = torch.device(\"cpu\")\n",
61
+ "#v_48_010=version with 48 edges 0.10A noise\n",
62
+ "model_name = \"v_48_020\" #@param [\"v_48_002\", \"v_48_010\", \"v_48_020\", \"v_48_030\"]\n",
63
+ "\n",
64
+ "\n",
65
+ "backbone_noise=0.00 # Standard deviation of Gaussian noise to add to backbone atoms\n",
66
+ "\n",
67
+ "path_to_model_weights='/content/ProteinMPNN/vanilla_model_weights' \n",
68
+ "hidden_dim = 128\n",
69
+ "num_layers = 3 \n",
70
+ "model_folder_path = path_to_model_weights\n",
71
+ "if model_folder_path[-1] != '/':\n",
72
+ " model_folder_path = model_folder_path + '/'\n",
73
+ "checkpoint_path = model_folder_path + f'{model_name}.pt'\n",
74
+ "\n",
75
+ "checkpoint = torch.load(checkpoint_path, map_location=device) \n",
76
+ "print('Number of edges:', checkpoint['num_edges'])\n",
77
+ "noise_level_print = checkpoint['noise_level']\n",
78
+ "print(f'Training noise level: {noise_level_print}A')\n",
79
+ "model = ProteinMPNN(num_letters=21, node_features=hidden_dim, edge_features=hidden_dim, hidden_dim=hidden_dim, num_encoder_layers=num_layers, num_decoder_layers=num_layers, augment_eps=backbone_noise, k_neighbors=checkpoint['num_edges'])\n",
80
+ "model.to(device)\n",
81
+ "model.load_state_dict(checkpoint['model_state_dict'])\n",
82
+ "model.eval()\n",
83
+ "print(\"Model loaded\")\n",
84
+ "\n",
85
+ "def make_tied_positions_for_homomers(pdb_dict_list):\n",
86
+ " my_dict = {}\n",
87
+ " for result in pdb_dict_list:\n",
88
+ " all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...\n",
89
+ " tied_positions_list = []\n",
90
+ " chain_length = len(result[f\"seq_chain_{all_chain_list[0]}\"])\n",
91
+ " for i in range(1,chain_length+1):\n",
92
+ " temp_dict = {}\n",
93
+ " for j, chain in enumerate(all_chain_list):\n",
94
+ " temp_dict[chain] = [i] #needs to be a list\n",
95
+ " tied_positions_list.append(temp_dict)\n",
96
+ " my_dict[result['name']] = tied_positions_list\n",
97
+ " return my_dict\n",
98
+ "\n",
99
+ "#########################\n",
100
+ "def get_pdb(pdb_code=\"\"):\n",
101
+ " if pdb_code is None or pdb_code == \"\":\n",
102
+ " upload_dict = files.upload()\n",
103
+ " pdb_string = upload_dict[list(upload_dict.keys())[0]]\n",
104
+ " with open(\"tmp.pdb\",\"wb\") as out: out.write(pdb_string)\n",
105
+ " return \"tmp.pdb\"\n",
106
+ " else:\n",
107
+ " os.system(f\"wget -qnc https://files.rcsb.org/view/{pdb_code}.pdb\")\n",
108
+ " return f\"{pdb_code}.pdb\""
109
+ ],
110
+ "metadata": {
111
+ "id": "2nKSlaMlSpcf",
112
+ "cellView": "form"
113
+ },
114
+ "execution_count": null,
115
+ "outputs": []
116
+ },
117
+ {
118
+ "cell_type": "code",
119
+ "execution_count": null,
120
+ "metadata": {
121
+ "cellView": "form",
122
+ "id": "xMVlYh8Fv2of"
123
+ },
124
+ "outputs": [],
125
+ "source": [
126
+ "#@title #Run ProteinMPNN\n",
127
+ "\n",
128
+ "#@markdown #### Input Options\n",
129
+ "pdb='6MRR' #@param {type:\"string\"}\n",
130
+ "pdb = pdb.replace(\" \",\"\")\n",
131
+ "pdb_path = get_pdb(pdb)\n",
132
+ "#@markdown - pdb code (leave blank to get an upload prompt)\n",
133
+ "\n",
134
+ "homomer = False #@param {type:\"boolean\"}\n",
135
+ "designed_chain = \"A\" #@param {type:\"string\"}\n",
136
+ "fixed_chain = \"\" #@param {type:\"string\"}\n",
137
+ "\n",
138
+ "if designed_chain == \"\":\n",
139
+ " designed_chain_list = []\n",
140
+ "else:\n",
141
+ " designed_chain_list = re.sub(\"[^A-Za-z]+\",\",\", designed_chain).split(\",\")\n",
142
+ "\n",
143
+ "if fixed_chain == \"\":\n",
144
+ " fixed_chain_list = []\n",
145
+ "else:\n",
146
+ " fixed_chain_list = re.sub(\"[^A-Za-z]+\",\",\", fixed_chain).split(\",\")\n",
147
+ "\n",
148
+ "chain_list = list(set(designed_chain_list + fixed_chain_list))\n",
149
+ "\n",
150
+ "#@markdown - specify which chain(s) to design and which chain(s) to keep fixed. \n",
151
+ "#@markdown Use comma:`A,B` to specify more than one chain\n",
152
+ "\n",
153
+ "#chain = \"A\" #@param {type:\"string\"}\n",
154
+ "#pdb_path_chains = chain\n",
155
+ "##@markdown - Define which chain to redesign\n",
156
+ "\n",
157
+ "#@markdown #### Design Options\n",
158
+ "num_seqs = 8 #@param [\"1\", \"2\", \"4\", \"8\", \"16\", \"32\", \"64\"] {type:\"raw\"}\n",
159
+ "num_seq_per_target = num_seqs\n",
160
+ "\n",
161
+ "#@markdown - Sampling temperature for amino acids, T=0.0 means taking argmax, T>>1.0 means sample randomly.\n",
162
+ "sampling_temp = \"0.1\" #@param [\"0.0001\", \"0.1\", \"0.15\", \"0.2\", \"0.25\", \"0.3\", \"0.5\"]\n",
163
+ "\n",
164
+ "\n",
165
+ "\n",
166
+ "save_score=0 # 0 for False, 1 for True; save score=-log_prob to npy files\n",
167
+ "save_probs=0 # 0 for False, 1 for True; save MPNN predicted probabilities per position\n",
168
+ "score_only=0 # 0 for False, 1 for True; score input backbone-sequence pairs\n",
169
+ "conditional_probs_only=0 # 0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)\n",
170
+ "conditional_probs_only_backbone=0 # 0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)\n",
171
+ " \n",
172
+ "batch_size=1 # Batch size; can set higher for titan, quadro GPUs, reduce this if running out of GPU memory\n",
173
+ "max_length=20000 # Max sequence length\n",
174
+ " \n",
175
+ "out_folder='.' # Path to a folder to output sequences, e.g. /home/out/\n",
176
+ "jsonl_path='' # Path to a folder with parsed pdb into jsonl\n",
177
+ "omit_AAs='X' # Specify which amino acids should be omitted in the generated sequence, e.g. 'AC' would omit alanine and cysteine.\n",
178
+ " \n",
179
+ "pssm_multi=0.0 # A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions\n",
180
+ "pssm_threshold=0.0 # A value between -inf and +inf to restrict per position AAs\n",
181
+ "pssm_log_odds_flag=0 # 0 for False, 1 for True\n",
182
+ "pssm_bias_flag=0 # 0 for False, 1 for True\n",
183
+ "\n",
184
+ "\n",
185
+ "##############################################################\n",
186
+ "\n",
187
+ "folder_for_outputs = out_folder\n",
188
+ "\n",
189
+ "NUM_BATCHES = num_seq_per_target//batch_size\n",
190
+ "BATCH_COPIES = batch_size\n",
191
+ "temperatures = [float(item) for item in sampling_temp.split()]\n",
192
+ "omit_AAs_list = omit_AAs\n",
193
+ "alphabet = 'ACDEFGHIKLMNPQRSTVWYX'\n",
194
+ "\n",
195
+ "omit_AAs_np = np.array([AA in omit_AAs_list for AA in alphabet]).astype(np.float32)\n",
196
+ "\n",
197
+ "chain_id_dict = None\n",
198
+ "fixed_positions_dict = None\n",
199
+ "pssm_dict = None\n",
200
+ "omit_AA_dict = None\n",
201
+ "bias_AA_dict = None\n",
202
+ "tied_positions_dict = None\n",
203
+ "bias_by_res_dict = None\n",
204
+ "bias_AAs_np = np.zeros(len(alphabet))\n",
205
+ "\n",
206
+ "\n",
207
+ "###############################################################\n",
208
+ "pdb_dict_list = parse_PDB(pdb_path, input_chain_list=chain_list)\n",
209
+ "dataset_valid = StructureDatasetPDB(pdb_dict_list, truncate=None, max_length=max_length)\n",
210
+ "\n",
211
+ "chain_id_dict = {}\n",
212
+ "chain_id_dict[pdb_dict_list[0]['name']]= (designed_chain_list, fixed_chain_list)\n",
213
+ "\n",
214
+ "print(chain_id_dict)\n",
215
+ "for chain in chain_list:\n",
216
+ " l = len(pdb_dict_list[0][f\"seq_chain_{chain}\"])\n",
217
+ " print(f\"Length of chain {chain} is {l}\")\n",
218
+ "\n",
219
+ "if homomer:\n",
220
+ " tied_positions_dict = make_tied_positions_for_homomers(pdb_dict_list)\n",
221
+ "else:\n",
222
+ " tied_positions_dict = None\n",
223
+ "\n",
224
+ "#################################################################\n",
225
+ "sequences = []\n",
226
+ "with torch.no_grad():\n",
227
+ " print('Generating sequences...')\n",
228
+ " for ix, protein in enumerate(dataset_valid):\n",
229
+ " score_list = []\n",
230
+ " all_probs_list = []\n",
231
+ " all_log_probs_list = []\n",
232
+ " S_sample_list = []\n",
233
+ " batch_clones = [copy.deepcopy(protein) for i in range(BATCH_COPIES)]\n",
234
+ " X, S, mask, lengths, chain_M, chain_encoding_all, chain_list_list, visible_list_list, masked_list_list, masked_chain_length_list_list, chain_M_pos, omit_AA_mask, residue_idx, dihedral_mask, tied_pos_list_of_lists_list, pssm_coef, pssm_bias, pssm_log_odds_all, bias_by_res_all, tied_beta = tied_featurize(batch_clones, device, chain_id_dict, fixed_positions_dict, omit_AA_dict, tied_positions_dict, pssm_dict, bias_by_res_dict)\n",
235
+ " pssm_log_odds_mask = (pssm_log_odds_all > pssm_threshold).float() #1.0 for true, 0.0 for false\n",
236
+ " name_ = batch_clones[0]['name']\n",
237
+ "\n",
238
+ " randn_1 = torch.randn(chain_M.shape, device=X.device)\n",
239
+ " log_probs = model(X, S, mask, chain_M*chain_M_pos, residue_idx, chain_encoding_all, randn_1)\n",
240
+ " mask_for_loss = mask*chain_M*chain_M_pos\n",
241
+ " scores = _scores(S, log_probs, mask_for_loss)\n",
242
+ " native_score = scores.cpu().data.numpy()\n",
243
+ "\n",
244
+ " for temp in temperatures:\n",
245
+ " for j in range(NUM_BATCHES):\n",
246
+ " randn_2 = torch.randn(chain_M.shape, device=X.device)\n",
247
+ " if tied_positions_dict is None:\n",
248
+ " sample_dict = model.sample(X, randn_2, S, chain_M, chain_encoding_all, residue_idx, mask=mask, temperature=temp, omit_AAs_np=omit_AAs_np, bias_AAs_np=bias_AAs_np, chain_M_pos=chain_M_pos, omit_AA_mask=omit_AA_mask, pssm_coef=pssm_coef, pssm_bias=pssm_bias, pssm_multi=pssm_multi, pssm_log_odds_flag=bool(pssm_log_odds_flag), pssm_log_odds_mask=pssm_log_odds_mask, pssm_bias_flag=bool(pssm_bias_flag), bias_by_res=bias_by_res_all)\n",
249
+ " S_sample = sample_dict[\"S\"] \n",
250
+ " else:\n",
251
+ " sample_dict = model.tied_sample(X, randn_2, S, chain_M, chain_encoding_all, residue_idx, mask=mask, temperature=temp, omit_AAs_np=omit_AAs_np, bias_AAs_np=bias_AAs_np, chain_M_pos=chain_M_pos, omit_AA_mask=omit_AA_mask, pssm_coef=pssm_coef, pssm_bias=pssm_bias, pssm_multi=pssm_multi, pssm_log_odds_flag=bool(pssm_log_odds_flag), pssm_log_odds_mask=pssm_log_odds_mask, pssm_bias_flag=bool(pssm_bias_flag), tied_pos=tied_pos_list_of_lists_list[0], tied_beta=tied_beta, bias_by_res=bias_by_res_all)\n",
252
+ " # Compute scores\n",
253
+ " S_sample = sample_dict[\"S\"]\n",
254
+ " log_probs = model(X, S_sample, mask, chain_M*chain_M_pos, residue_idx, chain_encoding_all, randn_2, use_input_decoding_order=True, decoding_order=sample_dict[\"decoding_order\"])\n",
255
+ " mask_for_loss = mask*chain_M*chain_M_pos\n",
256
+ " scores = _scores(S_sample, log_probs, mask_for_loss)\n",
257
+ " scores = scores.cpu().data.numpy()\n",
258
+ " all_probs_list.append(sample_dict[\"probs\"].cpu().data.numpy())\n",
259
+ " all_log_probs_list.append(log_probs.cpu().data.numpy())\n",
260
+ " S_sample_list.append(S_sample.cpu().data.numpy())\n",
261
+ " for b_ix in range(BATCH_COPIES):\n",
262
+ " masked_chain_length_list = masked_chain_length_list_list[b_ix]\n",
263
+ " masked_list = masked_list_list[b_ix]\n",
264
+ " seq_recovery_rate = torch.sum(torch.sum(torch.nn.functional.one_hot(S[b_ix], 21)*torch.nn.functional.one_hot(S_sample[b_ix], 21),axis=-1)*mask_for_loss[b_ix])/torch.sum(mask_for_loss[b_ix])\n",
265
+ " seq = _S_to_seq(S_sample[b_ix], chain_M[b_ix])\n",
266
+ " score = scores[b_ix]\n",
267
+ " score_list.append(score)\n",
268
+ " native_seq = _S_to_seq(S[b_ix], chain_M[b_ix])\n",
269
+ " if b_ix == 0 and j==0 and temp==temperatures[0]:\n",
270
+ " start = 0\n",
271
+ " end = 0\n",
272
+ " list_of_AAs = []\n",
273
+ " for mask_l in masked_chain_length_list:\n",
274
+ " end += mask_l\n",
275
+ " list_of_AAs.append(native_seq[start:end])\n",
276
+ " start = end\n",
277
+ " native_seq = \"\".join(list(np.array(list_of_AAs)[np.argsort(masked_list)]))\n",
278
+ " l0 = 0\n",
279
+ " for mc_length in list(np.array(masked_chain_length_list)[np.argsort(masked_list)])[:-1]:\n",
280
+ " l0 += mc_length\n",
281
+ " native_seq = native_seq[:l0] + '/' + native_seq[l0:]\n",
282
+ " l0 += 1\n",
283
+ " sorted_masked_chain_letters = np.argsort(masked_list_list[0])\n",
284
+ " print_masked_chains = [masked_list_list[0][i] for i in sorted_masked_chain_letters]\n",
285
+ " sorted_visible_chain_letters = np.argsort(visible_list_list[0])\n",
286
+ " print_visible_chains = [visible_list_list[0][i] for i in sorted_visible_chain_letters]\n",
287
+ " native_score_print = np.format_float_positional(np.float32(native_score.mean()), unique=False, precision=4)\n",
288
+ " line = '>{}, score={}, fixed_chains={}, designed_chains={}, model_name={}\\n{}\\n'.format(name_, native_score_print, print_visible_chains, print_masked_chains, model_name, native_seq)\n",
289
+ " print(line.rstrip())\n",
290
+ " start = 0\n",
291
+ " end = 0\n",
292
+ " list_of_AAs = []\n",
293
+ " for mask_l in masked_chain_length_list:\n",
294
+ " end += mask_l\n",
295
+ " list_of_AAs.append(seq[start:end])\n",
296
+ " start = end\n",
297
+ "\n",
298
+ " seq = \"\".join(list(np.array(list_of_AAs)[np.argsort(masked_list)]))\n",
299
+ " l0 = 0\n",
300
+ " for mc_length in list(np.array(masked_chain_length_list)[np.argsort(masked_list)])[:-1]:\n",
301
+ " l0 += mc_length\n",
302
+ " seq = seq[:l0] + '/' + seq[l0:]\n",
303
+ " l0 += 1\n",
304
+ " score_print = np.format_float_positional(np.float32(score), unique=False, precision=4)\n",
305
+ " seq_rec_print = np.format_float_positional(np.float32(seq_recovery_rate.detach().cpu().numpy()), unique=False, precision=4)\n",
306
+ " line = '>T={}, sample={}, score={}, seq_recovery={}\\n{}\\n'.format(temp,b_ix,score_print,seq_rec_print,seq)\n",
307
+ " sequences.append(seq)\n",
308
+ " print(line.rstrip())\n",
309
+ "\n",
310
+ "\n",
311
+ "all_probs_concat = np.concatenate(all_probs_list)\n",
312
+ "all_log_probs_concat = np.concatenate(all_log_probs_list)\n",
313
+ "S_sample_concat = np.concatenate(S_sample_list)"
314
+ ]
315
+ },
316
+ {
317
+ "cell_type": "markdown",
318
+ "source": [
319
+ "# Predict with AlphaFold2 (with single-sequence input)"
320
+ ],
321
+ "metadata": {
322
+ "id": "5mQ4VLG1dPsd"
323
+ }
324
+ },
325
+ {
326
+ "cell_type": "code",
327
+ "source": [
328
+ "#@title Setup AlphaFold\n",
329
+ "\n",
330
+ "# import libraries\n",
331
+ "from IPython.utils import io\n",
332
+ "import os,sys,re\n",
333
+ "\n",
334
+ "if \"af_backprop\" not in sys.path:\n",
335
+ " import tensorflow as tf\n",
336
+ " import jax\n",
337
+ " import jax.numpy as jnp\n",
338
+ " import numpy as np\n",
339
+ " import matplotlib\n",
340
+ " from matplotlib import animation\n",
341
+ " import matplotlib.pyplot as plt\n",
342
+ " from IPython.display import HTML\n",
343
+ " import tqdm.notebook\n",
344
+ " TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'\n",
345
+ "\n",
346
+ " with io.capture_output() as captured:\n",
347
+ " # install ALPHAFOLD\n",
348
+ " if not os.path.isdir(\"af_backprop\"):\n",
349
+ " %shell git clone https://github.com/sokrypton/af_backprop.git\n",
350
+ " %shell pip -q install biopython dm-haiku ml-collections py3Dmol\n",
351
+ " %shell wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py\n",
352
+ " if not os.path.isdir(\"params\"):\n",
353
+ " %shell mkdir params\n",
354
+ " %shell curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params\n",
355
+ "\n",
356
+ " if not os.path.exists(\"MMalign\"):\n",
357
+ " # install MMalign\n",
358
+ " os.system(\"wget -qnc https://zhanggroup.org/MM-align/bin/module/MMalign.cpp\")\n",
359
+ " os.system(\"g++ -static -O3 -ffast-math -o MMalign MMalign.cpp\")\n",
360
+ "\n",
361
+ " def mmalign(pdb_a,pdb_b):\n",
362
+ " # pass to MMalign\n",
363
+ " output = os.popen(f'./MMalign {pdb_a} {pdb_b}')\n",
364
+ " # parse outputs\n",
365
+ " parse_float = lambda x: float(x.split(\"=\")[1].split()[0])\n",
366
+ " tms = []\n",
367
+ " for line in output:\n",
368
+ " line = line.rstrip()\n",
369
+ " if line.startswith(\"TM-score\"): tms.append(parse_float(line))\n",
370
+ " return tms\n",
371
+ "\n",
372
+ " # configure which device to use\n",
373
+ " try:\n",
374
+ " # check if TPU is available\n",
375
+ " import jax.tools.colab_tpu\n",
376
+ " jax.tools.colab_tpu.setup_tpu()\n",
377
+ " print('Running on TPU')\n",
378
+ " DEVICE = \"tpu\"\n",
379
+ " except:\n",
380
+ " if jax.local_devices()[0].platform == 'cpu':\n",
381
+ " print(\"WARNING: no GPU detected, will be using CPU\")\n",
382
+ " DEVICE = \"cpu\"\n",
383
+ " else:\n",
384
+ " print('Running on GPU')\n",
385
+ " DEVICE = \"gpu\"\n",
386
+ " # disable GPU on tensorflow\n",
387
+ " tf.config.set_visible_devices([], 'GPU')\n",
388
+ "\n",
389
+ " # import libraries\n",
390
+ " sys.path.append('af_backprop')\n",
391
+ " from utils import update_seq, update_aatype, get_plddt, get_pae\n",
392
+ " import colabfold as cf\n",
393
+ " from alphafold.common import protein as alphafold_protein\n",
394
+ " from alphafold.data import pipeline\n",
395
+ " from alphafold.model import data, config\n",
396
+ " from alphafold.common import residue_constants\n",
397
+ " from alphafold.model import model as alphafold_model\n",
398
+ "\n",
399
+ "# custom functions\n",
400
+ "def clear_mem():\n",
401
+ " backend = jax.lib.xla_bridge.get_backend()\n",
402
+ " for buf in backend.live_buffers(): buf.delete()\n",
403
+ "\n",
404
+ "def setup_model(max_len):\n",
405
+ " clear_mem()\n",
406
+ "\n",
407
+ " # setup model\n",
408
+ " cfg = config.model_config(\"model_3_ptm\")\n",
409
+ " cfg.model.num_recycle = 0\n",
410
+ " cfg.data.common.num_recycle = 0\n",
411
+ " cfg.data.eval.max_msa_clusters = 1\n",
412
+ " cfg.data.common.max_extra_msa = 1\n",
413
+ " cfg.data.eval.masked_msa_replace_fraction = 0\n",
414
+ " cfg.model.global_config.subbatch_size = None\n",
415
+ "\n",
416
+ " # get params\n",
417
+ " model_param = data.get_model_haiku_params(model_name=\"model_3_ptm\", data_dir=\".\")\n",
418
+ " model_runner = alphafold_model.RunModel(cfg, model_param, is_training=False, recycle_mode=\"none\")\n",
419
+ "\n",
420
+ " model_params = []\n",
421
+ " for k in [1,2,3,4,5]:\n",
422
+ " if k == 3:\n",
423
+ " model_params.append(model_param)\n",
424
+ " else:\n",
425
+ " params = data.get_model_haiku_params(model_name=f\"model_{k}_ptm\", data_dir=\".\")\n",
426
+ " model_params.append({k: params[k] for k in model_runner.params.keys()})\n",
427
+ "\n",
428
+ " seq = \"A\" * max_len\n",
429
+ " length = len(seq)\n",
430
+ " feature_dict = {\n",
431
+ " **pipeline.make_sequence_features(sequence=seq, description=\"none\", num_res=length),\n",
432
+ " **pipeline.make_msa_features(msas=[[seq]], deletion_matrices=[[[0]*length]])\n",
433
+ " }\n",
434
+ " inputs = model_runner.process_features(feature_dict,random_seed=0)\n",
435
+ "\n",
436
+ " def runner(I, params):\n",
437
+ " # update sequence\n",
438
+ " inputs = I[\"inputs\"]\n",
439
+ " inputs.update(I[\"prev\"])\n",
440
+ "\n",
441
+ " seq = jax.nn.one_hot(I[\"seq\"],20)\n",
442
+ " update_seq(seq, inputs)\n",
443
+ " update_aatype(inputs[\"target_feat\"][...,1:], inputs)\n",
444
+ "\n",
445
+ " # mask prediction\n",
446
+ " mask = jnp.arange(inputs[\"residue_index\"].shape[0]) < I[\"length\"]\n",
447
+ " inputs[\"seq_mask\"] = inputs[\"seq_mask\"].at[:].set(mask)\n",
448
+ " inputs[\"msa_mask\"] = inputs[\"msa_mask\"].at[:].set(mask)\n",
449
+ " inputs[\"residue_index\"] = jnp.where(mask, inputs[\"residue_index\"], 0)\n",
450
+ "\n",
451
+ " # get prediction\n",
452
+ " key = jax.random.PRNGKey(0)\n",
453
+ " outputs = model_runner.apply(params, key, inputs)\n",
454
+ "\n",
455
+ " prev = {\"init_msa_first_row\":outputs['representations']['msa_first_row'][None],\n",
456
+ " \"init_pair\":outputs['representations']['pair'][None],\n",
457
+ " \"init_pos\":outputs['structure_module']['final_atom_positions'][None]}\n",
458
+ " \n",
459
+ " aux = {\"final_atom_positions\":outputs[\"structure_module\"][\"final_atom_positions\"],\n",
460
+ " \"final_atom_mask\":outputs[\"structure_module\"][\"final_atom_mask\"],\n",
461
+ " \"plddt\":get_plddt(outputs),\"pae\":get_pae(outputs),\n",
462
+ " \"length\":I[\"length\"], \"seq\":I[\"seq\"], \"prev\":prev,\n",
463
+ " \"residue_idx\":inputs[\"residue_index\"][0]}\n",
464
+ " return aux\n",
465
+ "\n",
466
+ " return jax.jit(runner), model_params, {\"inputs\":inputs, \"length\":max_len}\n",
467
+ "\n",
468
+ "def save_pdb(outs, filename, Ls=None):\n",
469
+ " '''save pdb coordinates'''\n",
470
+ " p = {\"residue_index\":outs[\"residue_idx\"] + 1,\n",
471
+ " \"aatype\":outs[\"seq\"],\n",
472
+ " \"atom_positions\":outs[\"final_atom_positions\"],\n",
473
+ " \"atom_mask\":outs[\"final_atom_mask\"],\n",
474
+ " \"plddt\":outs[\"plddt\"]}\n",
475
+ " p = jax.tree_map(lambda x:x[:outs[\"length\"]], p)\n",
476
+ " b_factors = 100 * p.pop(\"plddt\")[:,None] * p[\"atom_mask\"]\n",
477
+ " p = alphafold_protein.Protein(**p,b_factors=b_factors)\n",
478
+ " pdb_lines = alphafold_protein.to_pdb(p)\n",
479
+ " with open(filename, 'w') as f:\n",
480
+ " f.write(pdb_lines)\n",
481
+ " if Ls is not None:\n",
482
+ " pdb_lines = cf.read_pdb_renum(filename, Ls)\n",
483
+ " with open(filename, 'w') as f:\n",
484
+ " f.write(pdb_lines)"
485
+ ],
486
+ "metadata": {
487
+ "cellView": "form",
488
+ "id": "4ZBUThXU7yY8"
489
+ },
490
+ "execution_count": null,
491
+ "outputs": []
492
+ },
493
+ {
494
+ "cell_type": "code",
495
+ "source": [
496
+ "#@title Run AlphaFold\n",
497
+ "num_models = 1 #@param [\"1\",\"2\",\"3\",\"4\",\"5\"] {type:\"raw\"}\n",
498
+ "num_recycles = 1 #@param [\"0\",\"1\",\"2\",\"3\"] {type:\"raw\"}\n",
499
+ "num_sequences = len(sequences)\n",
500
+ "outs = []\n",
501
+ "positions = []\n",
502
+ "plddts = []\n",
503
+ "paes = []\n",
504
+ "LS = []\n",
505
+ "\n",
506
+ "with tqdm.notebook.tqdm(total=(num_recycles + 1) * num_models * num_sequences, bar_format=TQDM_BAR_FORMAT) as pbar:\n",
507
+ " print(f\"seq_num model_num avg_pLDDT avg_pAE TMscore\")\n",
508
+ " for s,ori_sequence in enumerate(sequences):\n",
509
+ " Ls = [len(s) for s in ori_sequence.replace(\":\",\"/\").split(\"/\")]\n",
510
+ " LS.append(Ls)\n",
511
+ " sequence = re.sub(\"[^A-Z]\",\"\",ori_sequence)\n",
512
+ " length = len(sequence)\n",
513
+ "\n",
514
+ " # avoid recompiling if length within 25\n",
515
+ " if \"max_len\" not in dir() or length > max_len or (max_len - length) > 25:\n",
516
+ " max_len = length + 25\n",
517
+ " runner, params, I = setup_model(max_len)\n",
518
+ "\n",
519
+ " outs.append([])\n",
520
+ " positions.append([])\n",
521
+ " plddts.append([])\n",
522
+ " paes.append([])\n",
523
+ "\n",
524
+ " r = -1\n",
525
+ " # pad sequence to max length\n",
526
+ " seq = np.array([residue_constants.restype_order.get(aa,0) for aa in sequence])\n",
527
+ " seq = np.pad(seq,[0,max_len-length],constant_values=-1)\n",
528
+ " I[\"inputs\"]['residue_index'][:] = cf.chain_break(np.arange(max_len), Ls, length=32)\n",
529
+ " I.update({\"seq\":seq, \"length\":length})\n",
530
+ " \n",
531
+ " # for each model\n",
532
+ " for n in range(num_models):\n",
533
+ " # restart recycle\n",
534
+ " I[\"prev\"] = {'init_msa_first_row': np.zeros([1, max_len, 256]),\n",
535
+ " 'init_pair': np.zeros([1, max_len, max_len, 128]),\n",
536
+ " 'init_pos': np.zeros([1, max_len, 37, 3])}\n",
537
+ " for r in range(num_recycles + 1):\n",
538
+ " O = runner(I, params[n])\n",
539
+ " O = jax.tree_map(lambda x:np.asarray(x), O)\n",
540
+ " I[\"prev\"] = O[\"prev\"]\n",
541
+ " pbar.update(1)\n",
542
+ " \n",
543
+ " positions[-1].append(O[\"final_atom_positions\"][:length])\n",
544
+ " plddts[-1].append(O[\"plddt\"][:length])\n",
545
+ " paes[-1].append(O[\"pae\"][:length,:length])\n",
546
+ " outs[-1].append(O)\n",
547
+ " save_pdb(outs[-1][-1], f\"out_seq_{s}_model_{n}.pdb\", Ls=LS[-1])\n",
548
+ " tmscores = mmalign(pdb_path, f\"out_seq_{s}_model_{n}.pdb\")\n",
549
+ " print(f\"{s} {n}\\t{plddts[-1][-1].mean():.3}\\t{paes[-1][-1].mean():.3}\\t{tmscores[-1]:.3}\")"
550
+ ],
551
+ "metadata": {
552
+ "cellView": "form",
553
+ "id": "p2uNokqudTSH"
554
+ },
555
+ "execution_count": null,
556
+ "outputs": []
557
+ },
558
+ {
559
+ "cell_type": "code",
560
+ "source": [
561
+ "#@title Display 3D structure {run: \"auto\"}\n",
562
+ "#@markdown #### select which sequence to show (if more than one designed example)\n",
563
+ "seq_num = 0 #@param [\"0\",\"1\",\"2\",\"3\",\"4\",\"5\",\"6\",\"7\"] {type:\"raw\"}\n",
564
+ "assert seq_num < len(outs), f\"ERROR: seq_num ({seq_num}) exceeds number of designed sequences ({num_sequences})\"\n",
565
+ "model_num = 0 #@param [\"0\",\"1\",\"2\",\"3\",\"4\"] {type:\"raw\"}\n",
566
+ "assert model_num < len(outs[0]), f\"ERROR: model_num ({model_num}) exceeds number of model params used ({num_models})\"\n",
567
+ "#@markdown #### options\n",
568
+ "\n",
569
+ "color = \"confidence\" #@param [\"chain\", \"confidence\", \"rainbow\"]\n",
570
+ "if color == \"confidence\": color = \"lDDT\"\n",
571
+ "show_sidechains = False #@param {type:\"boolean\"}\n",
572
+ "show_mainchains = False #@param {type:\"boolean\"}\n",
573
+ "\n",
574
+ "v = cf.show_pdb(f\"out_seq_{seq_num}_model_{model_num}.pdb\", show_sidechains, show_mainchains, color,\n",
575
+ " color_HP=True, size=(800,480), Ls=LS[seq_num]) \n",
576
+ "v.setHoverable({}, True,\n",
577
+ " '''function(atom,viewer,event,container){if(!atom.label){atom.label=viewer.addLabel(\" \"+atom.resn+\":\"+atom.resi,{position:atom,backgroundColor:'mintcream',fontColor:'black'});}}''',\n",
578
+ " '''function(atom,viewer){if(atom.label){viewer.removeLabel(atom.label);delete atom.label;}}''')\n",
579
+ "v.show() \n",
580
+ "if color == \"lDDT\":\n",
581
+ " cf.plot_plddt_legend().show()\n",
582
+ "\n",
583
+ "# add confidence plots\n",
584
+ "cf.plot_confidence(plddts[seq_num][model_num]*100, paes[seq_num][model_num], Ls=LS[seq_num]).show()"
585
+ ],
586
+ "metadata": {
587
+ "cellView": "form",
588
+ "id": "0TNhcwok8d_w"
589
+ },
590
+ "execution_count": null,
591
+ "outputs": []
592
+ }
593
+ ],
594
+ "metadata": {
595
+ "colab": {
596
+ "name": "quickdemo_wAF2.ipynb",
597
+ "provenance": [],
598
+ "include_colab_link": true
599
+ },
600
+ "kernelspec": {
601
+ "display_name": "Python 3",
602
+ "name": "python3"
603
+ },
604
+ "language_info": {
605
+ "name": "python"
606
+ },
607
+ "accelerator": "GPU",
608
+ "gpuClass": "standard"
609
+ },
610
+ "nbformat": 4,
611
+ "nbformat_minor": 0
612
+ }
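For reference, the seq_recovery value printed by the design cell above is the fraction of designed positions at which the sampled residue matches the native one. A minimal NumPy sketch of that calculation, assuming integer-encoded sequences and a 0/1 design mask; the function name `sequence_recovery` is illustrative and not part of ProteinMPNN:

```python
import numpy as np

def sequence_recovery(native, sampled, mask):
    """Fraction of masked (designed) positions where sampled == native.

    Equivalent to summing the product of the two one-hot encodings over the
    amino-acid axis, as the notebook does, because that product is 1 only
    where the two residue indices agree.
    """
    native = np.asarray(native)
    sampled = np.asarray(sampled)
    mask = np.asarray(mask, dtype=float)
    matches = (native == sampled).astype(float)
    designed = mask.sum()
    if designed == 0:
        # No designed positions: recovery is undefined.
        return float("nan")
    return float((matches * mask).sum() / designed)

# Toy input: four residues, the last one is fixed (mask 0).
rate = sequence_recovery([0, 1, 2, 3], [0, 1, 5, 3], [1, 1, 1, 0])
```

Positions with mask 0 (fixed chains or fixed residues) do not count toward either the numerator or the denominator.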
ProteinMPNN/examples/submit_example_1.sh ADDED
@@ -0,0 +1,28 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 2
6
+ #SBATCH --output=example_1.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_monomers/pdbs/"
11
+
12
+ output_dir="../outputs/example_1_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
19
+
20
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
21
+
22
+ python ../protein_mpnn_run.py \
23
+ --jsonl_path $path_for_parsed_chains \
24
+ --out_folder $output_dir \
25
+ --num_seq_per_target 2 \
26
+ --sampling_temp "0.1" \
27
+ --seed 37 \
28
+ --batch_size 1
ProteinMPNN/examples/submit_example_2.sh ADDED
@@ -0,0 +1,34 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 2
6
+ #SBATCH --output=example_2.out
7
+
8
+ source activate mlfold
9
+
10
+
11
+ folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
12
+
13
+ output_dir="../outputs/example_2_outputs"
14
+ if [ ! -d $output_dir ]
15
+ then
16
+ mkdir -p $output_dir
17
+ fi
18
+
19
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
20
+ path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
21
+ chains_to_design="A B"
22
+
23
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
24
+
25
+ python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
26
+
27
+ python ../protein_mpnn_run.py \
28
+ --jsonl_path $path_for_parsed_chains \
29
+ --chain_id_jsonl $path_for_assigned_chains \
30
+ --out_folder $output_dir \
31
+ --num_seq_per_target 2 \
32
+ --sampling_temp "0.1" \
33
+ --seed 37 \
34
+ --batch_size 1
ProteinMPNN/examples/submit_example_3.sh ADDED
@@ -0,0 +1,27 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_3.out
7
+
8
+ source activate mlfold
9
+
10
+ path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"
11
+
12
+ output_dir="../outputs/example_3_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+ chains_to_design="A B"
19
+
20
+ python ../protein_mpnn_run.py \
21
+ --pdb_path $path_to_PDB \
22
+ --pdb_path_chains "$chains_to_design" \
23
+ --out_folder $output_dir \
24
+ --num_seq_per_target 2 \
25
+ --sampling_temp "0.1" \
26
+ --seed 37 \
27
+ --batch_size 1
ProteinMPNN/examples/submit_example_3_score_only.sh ADDED
@@ -0,0 +1,28 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_3_score_only.out
7
+
8
+ source activate mlfold
9
+
10
+ path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"
11
+
12
+ output_dir="../outputs/example_3_score_only_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+ chains_to_design="A B"
19
+
20
+ python ../protein_mpnn_run.py \
21
+ --pdb_path $path_to_PDB \
22
+ --pdb_path_chains "$chains_to_design" \
23
+ --out_folder $output_dir \
24
+ --num_seq_per_target 10 \
25
+ --sampling_temp "0.1" \
26
+ --score_only 1 \
27
+ --seed 37 \
28
+ --batch_size 1
ProteinMPNN/examples/submit_example_3_score_only_from_fasta.sh ADDED
@@ -0,0 +1,30 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_3_from_fasta.out
7
+
8
+ source activate mlfold
9
+
10
+ path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"
11
+ path_to_fasta="../outputs/example_3_outputs/seqs/3HTN.fa"
12
+
13
+ output_dir="../outputs/example_3_score_only_from_fasta_outputs"
14
+ if [ ! -d $output_dir ]
15
+ then
16
+ mkdir -p $output_dir
17
+ fi
18
+
19
+ chains_to_design="A B"
20
+
21
+ python ../protein_mpnn_run.py \
22
+ --path_to_fasta $path_to_fasta \
23
+ --pdb_path $path_to_PDB \
24
+ --pdb_path_chains "$chains_to_design" \
25
+ --out_folder $output_dir \
26
+ --num_seq_per_target 5 \
27
+ --sampling_temp "0.1" \
28
+ --score_only 1 \
29
+ --seed 13 \
30
+ --batch_size 1
ProteinMPNN/examples/submit_example_4.sh ADDED
@@ -0,0 +1,40 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_4.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
11
+
12
+ output_dir="../outputs/example_4_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+
19
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
20
+ path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
21
+ path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
22
+ chains_to_design="A C"
23
+ #Positions are 1-based within each chain (the first residue is 1); PDB residue numbering is not used for now.
24
+ fixed_positions="1 2 3 4 5 6 7 8 23 25, 10 11 12 13 14 15 16 17 18 19 20 40" #fix (do not design) residues 1-8, 23 and 25 in chain A and residues 10-20 and 40 in chain C
25
+
26
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
27
+
28
+ python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
29
+
30
+ python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$fixed_positions"
31
+
32
+ python ../protein_mpnn_run.py \
33
+ --jsonl_path $path_for_parsed_chains \
34
+ --chain_id_jsonl $path_for_assigned_chains \
35
+ --fixed_positions_jsonl $path_for_fixed_positions \
36
+ --out_folder $output_dir \
37
+ --num_seq_per_target 2 \
38
+ --sampling_temp "0.1" \
39
+ --seed 37 \
40
+ --batch_size 1
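The `chains_to_design` and `fixed_positions` strings in this script pair up by order: the first comma-separated group of positions belongs to the first chain, the second group to the second chain. A minimal sketch of that pairing, assuming 1-based positions as stated in the script comment; `parse_position_list` is a hypothetical helper, not the code of `make_fixed_positions_dict.py`, and the JSON layout that script writes is not reproduced here:

```python
def parse_position_list(chains, positions):
    """Map each chain ID to its list of 1-based positions.

    `chains` is a space-separated string such as "A C". `positions` holds one
    comma-separated group of space-separated integers per chain.
    """
    chain_ids = chains.split()
    groups = [[int(tok) for tok in group.split()] for group in positions.split(",")]
    if len(chain_ids) != len(groups):
        raise ValueError("need exactly one position group per chain")
    return dict(zip(chain_ids, groups))

fixed = parse_position_list(
    "A C",
    "1 2 3 4 5 6 7 8 23 25, 10 11 12 13 14 15 16 17 18 19 20 40",
)
```

Checking the group count up front catches the easy mistake of listing more chains than position groups.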
ProteinMPNN/examples/submit_example_4_non_fixed.sh ADDED
@@ -0,0 +1,40 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_4_non_fixed.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
11
+
12
+ output_dir="../outputs/example_4_non_fixed_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+
19
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
20
+ path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
21
+ path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
22
+ chains_to_design="A C"
23
+ #Positions are 1-based within each chain (the first residue is 1); PDB residue numbering is not used for now.
24
+ design_only_positions="1 2 3 4 5 6 7 8 9 10, 3 4 5 6 7 8" #design only these residues; use flag --specify_non_fixed
25
+
26
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
27
+
28
+ python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
29
+
30
+ python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$design_only_positions" --specify_non_fixed
31
+
32
+ python ../protein_mpnn_run.py \
33
+ --jsonl_path $path_for_parsed_chains \
34
+ --chain_id_jsonl $path_for_assigned_chains \
35
+ --fixed_positions_jsonl $path_for_fixed_positions \
36
+ --out_folder $output_dir \
37
+ --num_seq_per_target 2 \
38
+ --sampling_temp "0.1" \
39
+ --seed 37 \
40
+ --batch_size 1
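With `--specify_non_fixed`, the position list names the residues to design, so the fixed set is everything else in the chain. A minimal sketch of that complement, assuming 1-based positions and a known chain length; `fixed_from_design_only` is a hypothetical name and the chain length of 12 is made up for the example:

```python
def fixed_from_design_only(chain_length, design_only):
    """Return the 1-based positions that stay fixed when only
    `design_only` positions are redesigned."""
    design = set(design_only)
    out_of_range = sorted(p for p in design if p < 1 or p > chain_length)
    if out_of_range:
        raise ValueError(f"positions outside 1..{chain_length}: {out_of_range}")
    return [p for p in range(1, chain_length + 1) if p not in design]

# Design residues 1-10 of a hypothetical 12-residue chain; 11 and 12 stay fixed.
fixed_positions = fixed_from_design_only(12, range(1, 11))
```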
ProteinMPNN/examples/submit_example_5.sh ADDED
@@ -0,0 +1,44 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_5.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
11
+
12
+ output_dir="../outputs/example_5_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+
19
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
20
+ path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
21
+ path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
22
+ path_for_tied_positions=$output_dir"/tied_pdbs.jsonl"
23
+ chains_to_design="A C"
24
+ fixed_positions="9 10 11 12 13 14 15 16 17 18 19 20 21 22 23, 10 11 18 19 20 22"
25
+ tied_positions="1 2 3 4 5 6 7 8, 1 2 3 4 5 6 7 8" #the two lists must match in length; residue 1 in chains A and C will be sampled together, and likewise for each following pair
26
+
27
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
28
+
29
+ python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
30
+
31
+ python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$fixed_positions"
32
+
33
+ python ../helper_scripts/make_tied_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_tied_positions --chain_list "$chains_to_design" --position_list "$tied_positions"
34
+
35
+ python ../protein_mpnn_run.py \
36
+ --jsonl_path $path_for_parsed_chains \
37
+ --chain_id_jsonl $path_for_assigned_chains \
38
+ --fixed_positions_jsonl $path_for_fixed_positions \
39
+ --tied_positions_jsonl $path_for_tied_positions \
40
+ --out_folder $output_dir \
41
+ --num_seq_per_target 2 \
42
+ --sampling_temp "0.1" \
43
+ --seed 37 \
44
+ --batch_size 1
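Tied positions are paired by order: the i-th position listed for chain A is sampled together with the i-th position listed for chain C, which is why the two lists must have equal length. A minimal sketch of that pairing; `tie_positions` is a hypothetical helper, and the output shape is chosen for illustration rather than copied from `make_tied_positions_dict.py`:

```python
def tie_positions(chain_positions):
    """Group the i-th position of every chain into one tie.

    `chain_positions` maps chain ID -> list of 1-based positions. All lists
    must have the same length. Returns one dict per tie.
    """
    lengths = {len(pos) for pos in chain_positions.values()}
    if len(lengths) > 1:
        raise ValueError("all chains must list the same number of positions")
    chains = list(chain_positions)
    return [dict(zip(chains, group)) for group in zip(*chain_positions.values())]

ties = tie_positions({"A": [1, 2, 3], "C": [1, 2, 3]})
```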
ProteinMPNN/examples/submit_example_6.sh ADDED
@@ -0,0 +1,34 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 3
6
+ #SBATCH --output=example_6.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_homooligomers/pdbs/"
11
+
12
+ output_dir="../outputs/example_6_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+
19
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
20
+ path_for_tied_positions=$output_dir"/tied_pdbs.jsonl"
21
+ path_for_designed_sequences=$output_dir"/temp_0.1"
22
+
23
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
24
+
25
+ python ../helper_scripts/make_tied_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_tied_positions --homooligomer 1
26
+
27
+ python ../protein_mpnn_run.py \
28
+ --jsonl_path $path_for_parsed_chains \
29
+ --tied_positions_jsonl $path_for_tied_positions \
30
+ --out_folder $output_dir \
31
+ --num_seq_per_target 2 \
32
+ --sampling_temp "0.2" \
33
+ --seed 37 \
34
+ --batch_size 1
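For a homooligomer, every chain is a copy of the same sequence, so position i is tied across all chains. A minimal sketch of that idea, assuming all chains have the same length; `homooligomer_ties` is a hypothetical name and the output shape is illustrative, not the format written by `make_tied_positions_dict.py --homooligomer 1`:

```python
def homooligomer_ties(chain_ids, chain_length):
    """Tie position i (1-based) across every chain of a homooligomer."""
    return [{chain: i for chain in chain_ids} for i in range(1, chain_length + 1)]

# A hypothetical trimer with two residues per chain.
trimer_ties = homooligomer_ties(["A", "B", "C"], 2)
```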
ProteinMPNN/examples/submit_example_7.sh ADDED
@@ -0,0 +1,29 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 2
6
+ #SBATCH --output=example_7.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_monomers/pdbs/"
11
+
12
+ output_dir="../outputs/example_7_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
19
+
20
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
21
+
22
+ python ../protein_mpnn_run.py \
23
+ --jsonl_path $path_for_parsed_chains \
24
+ --out_folder $output_dir \
25
+ --num_seq_per_target 1 \
26
+ --sampling_temp "0.1" \
27
+ --unconditional_probs_only 1 \
28
+ --seed 37 \
29
+ --batch_size 1
ProteinMPNN/examples/submit_example_8.sh ADDED
@@ -0,0 +1,34 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 2
6
+ #SBATCH --output=example_8.out
7
+
8
+ source activate mlfold
9
+
10
+ folder_with_pdbs="../inputs/PDB_monomers/pdbs/"
11
+
12
+ output_dir="../outputs/example_8_outputs"
13
+ if [ ! -d $output_dir ]
14
+ then
15
+ mkdir -p $output_dir
16
+ fi
17
+
18
+ path_for_bias=$output_dir"/bias_pdbs.jsonl"
19
+ #Adding global polar amino acid bias (Doug Tischer)
20
+ AA_list="D E H K N Q R S T W Y"
21
+ bias_list="1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39"
22
+ python ../helper_scripts/make_bias_AA.py --output_path=$path_for_bias --AA_list="$AA_list" --bias_list="$bias_list"
23
+
24
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
25
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
26
+
27
+ python ../protein_mpnn_run.py \
28
+ --jsonl_path $path_for_parsed_chains \
29
+ --out_folder $output_dir \
30
+ --bias_AA_jsonl $path_for_bias \
31
+ --num_seq_per_target 2 \
32
+ --sampling_temp "0.1" \
33
+ --seed 37 \
34
+ --batch_size 1
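To get a feel for what a bias of 1.39 does, the sketch below assumes the bias is added to the per-residue logits before the softmax; temperature scaling, which the sampler also applies, is left out, and the two-letter alphabet is made up for the example. Under that assumption a bias of 1.39 multiplies the odds of a biased amino acid by exp(1.39), roughly fourfold:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def biased_probs(logits, bias):
    """Add a per-amino-acid bias to the logits, then normalize."""
    return softmax(np.asarray(logits, dtype=float) + np.asarray(bias, dtype=float))

# Two amino acids with equal logits; only the first one is biased.
p = biased_probs([0.0, 0.0], [1.39, 0.0])
odds_ratio = p[0] / p[1]  # equals exp(1.39) under the stated assumption
```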
ProteinMPNN/examples/submit_example_pssm.sh ADDED
@@ -0,0 +1,49 @@
1
+ #!/bin/bash
2
+ #SBATCH -p gpu
3
+ #SBATCH --mem=32g
4
+ #SBATCH --gres=gpu:rtx2080:1
5
+ #SBATCH -c 2
6
+ #SBATCH --output=example_pssm.out
7
+
8
+ source activate mlfold
9
+
10
+
11
+ #new_probabilities_using_PSSM = (1-pssm_multi*pssm_coef_gathered[:,None])*probs + pssm_multi*pssm_coef_gathered[:,None]*pssm_bias_gathered
12
+ #probs - predictions from MPNN
13
+ #pssm_bias_gathered - input PSSM bias (needs to be a probability distribution)
14
+ #pssm_multi - a number between 0.0 (no bias) and 1.0 (no MPNN), set via the --pssm_multi flag; this is a global value applied equally to all residues
15
+ #pssm_coef_gathered - a number between 0.0 (no bias) and 1.0 (no MPNN), set via ../helper_scripts/make_pssm_input_dict.py; it can be adjusted per residue, e.g. to apply the PSSM bias only to specific residues or chains
16
+
17
+
18
+
19
+ pssm_input_path="../inputs/PSSM_inputs"
20
+ folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
21
+
22
+ output_dir="../outputs/example_pssm_outputs"
23
+ if [ ! -d $output_dir ]
24
+ then
25
+ mkdir -p $output_dir
26
+ fi
27
+
28
+ path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
29
+ path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
30
+ pssm=$output_dir"/pssm.jsonl"
31
+ chains_to_design="A B"
32
+
33
+ python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
34
+
35
+ python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
36
+
37
+ python ../helper_scripts/make_pssm_input_dict.py --jsonl_input_path=$path_for_parsed_chains --PSSM_input_path=$pssm_input_path --output_path=$pssm
38
+
39
+ python ../protein_mpnn_run.py \
40
+ --jsonl_path $path_for_parsed_chains \
41
+ --chain_id_jsonl $path_for_assigned_chains \
42
+ --out_folder $output_dir \
43
+ --num_seq_per_target 2 \
44
+ --sampling_temp "0.1" \
45
+ --seed 37 \
46
+ --batch_size 1 \
47
+ --pssm_jsonl $pssm \
48
+ --pssm_multi 0.3 \
49
+ --pssm_bias_flag 1
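The mixing formula in the comment at the top of this script is a per-residue linear interpolation between the MPNN probabilities and the PSSM distribution, with weight pssm_multi * pssm_coef. A minimal NumPy sketch of that formula; the function name `mix_with_pssm` and the two-letter toy alphabet are illustrative assumptions:

```python
import numpy as np

def mix_with_pssm(probs, pssm_bias, pssm_coef, pssm_multi):
    """Blend MPNN probabilities with a PSSM, residue by residue.

    probs, pssm_bias: arrays of shape [L, n_aa], each row a probability distribution.
    pssm_coef: array of shape [L], per-residue weight in [0, 1].
    pssm_multi: global weight in [0, 1].
    """
    probs = np.asarray(probs, dtype=float)
    pssm_bias = np.asarray(pssm_bias, dtype=float)
    weight = pssm_multi * np.asarray(pssm_coef, dtype=float)[:, None]
    return (1.0 - weight) * probs + weight * pssm_bias

# Two residues; the PSSM is applied only to the first (pssm_coef = 1, then 0).
mixed = mix_with_pssm(
    probs=[[0.5, 0.5], [0.5, 0.5]],
    pssm_bias=[[1.0, 0.0], [1.0, 0.0]],
    pssm_coef=[1.0, 0.0],
    pssm_multi=0.3,
)
```

Because both inputs are probability distributions and the weights sum to one, each output row is again a probability distribution.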
ProteinMPNN/helper_scripts/assign_fixed_chains.py ADDED
@@ -0,0 +1,39 @@
+ import argparse
+
+ def main(args):
+     import json
+
+     with open(args.input_path, 'r') as json_file:
+         json_list = list(json_file)
+
+     global_designed_chain_list = []
+     if args.chain_list != '':
+         global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+     my_dict = {}
+     for json_str in json_list:
+         result = json.loads(json_str)
+         all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain'] #['A','B', 'C',...]
+         if len(global_designed_chain_list) > 0:
+             designed_chain_list = global_designed_chain_list
+         else:
+             #manually specify, e.g.
+             designed_chain_list = ["A"]
+         fixed_chain_list = [letter for letter in all_chain_list if letter not in designed_chain_list] #fix/do not redesign these chains
+         my_dict[result['name']]= (designed_chain_list, fixed_chain_list)
+
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+     argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+     argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+     argparser.add_argument("--chain_list", type=str, default='', help="List of the chains that need to be designed")
+
+     args = argparser.parse_args()
+     main(args)
+
+ # Output looks like this:
+ # {"5TTA": [["A"], ["B"]], "3LIS": [["A"], ["B"]]}
+
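Review note on assign_fixed_chains.py: the script splits every parsed structure into designed and fixed chains. The sketch below restates that split on an in-memory record. The function name `assign_chains` and the sample record are hypothetical, and unlike the script (which keeps only the last character of each key) it strips the `seq_chain_` prefix, on the assumption that multi-character chain IDs should survive intact.

```python
import json

PREFIX = "seq_chain_"

def assign_chains(parsed_line, designed=("A",)):
    """Split the chains of one parsed-PDB JSON line into (designed, fixed)."""
    record = json.loads(parsed_line)
    # Chain IDs come from keys shaped like "seq_chain_<ID>".
    all_chains = [key[len(PREFIX):] for key in record if key.startswith(PREFIX)]
    designed = list(designed)
    fixed = [chain for chain in all_chains if chain not in designed]
    return record["name"], (designed, fixed)

# Hypothetical two-chain record, shaped like one line of parsed_pdbs.jsonl.
line = json.dumps({"name": "5TTA", "seq_chain_A": "GWS", "seq_chain_B": "TEL"})
name, (designed_chains, fixed_chains) = assign_chains(line)
```

Chains that are not listed as designed end up in the fixed list, which matches the `[["A"], ["B"]]` shape in the script's example output.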
ProteinMPNN/helper_scripts/make_bias_AA.py ADDED
@@ -0,0 +1,27 @@
+ import argparse
+
+ def main(args):
+
+     import numpy as np
+     import json
+
+     bias_list = [float(item) for item in args.bias_list.split()]
+     AA_list = [str(item) for item in args.AA_list.split()]
+
+     my_dict = dict(zip(AA_list, bias_list))
+
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+     argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+     argparser.add_argument("--AA_list", type=str, default='', help="List of AAs to be biased")
+     argparser.add_argument("--bias_list", type=str, default='', help="AA bias strengths")
+
+     args = argparser.parse_args()
+     main(args)
+
+ #e.g. output
+ #{"A": -0.01, "G": 0.02}
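Review note on make_bias_AA.py: the script pairs two space-separated strings with `zip`, which silently drops unmatched entries when the lists differ in length. A minimal sketch of the same pairing with an explicit length check follows. The helper name `make_bias` is hypothetical and the check is an addition, not part of the commit.

```python
import json

def make_bias(aa_list, bias_list):
    """Pair amino-acid letters with bias strengths, both given as space-separated strings."""
    letters = aa_list.split()
    weights = [float(item) for item in bias_list.split()]
    if len(letters) != len(weights):
        # zip() would truncate silently; fail loudly instead.
        raise ValueError("AA_list and bias_list must have the same number of entries")
    return dict(zip(letters, weights))

bias = make_bias("A G", "-0.01 0.02")
encoded = json.dumps(bias)  # the text the script would write to --output_path
```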
ProteinMPNN/helper_scripts/make_bias_per_res_dict.py ADDED
@@ -0,0 +1,53 @@
+ import argparse
+
+ def main(args):
+     import glob
+     import random
+     import numpy as np
+     import json
+
+     mpnn_alphabet = 'ACDEFGHIKLMNPQRSTVWYX'
+
+     mpnn_alphabet_dict = {'A': 0,'C': 1,'D': 2,'E': 3,'F': 4,'G': 5,'H': 6,'I': 7,'K': 8,'L': 9,'M': 10,'N': 11,'P': 12,'Q': 13,'R': 14,'S': 15,'T': 16,'V': 17,'W': 18,'Y': 19,'X': 20}
+
+     with open(args.input_path, 'r') as json_file:
+         json_list = list(json_file)
+
+     my_dict = {}
+     for json_str in json_list:
+         result = json.loads(json_str)
+         all_chain_list = [item[-1:] for item in list(result) if item[:10]=='seq_chain_']
+         bias_by_res_dict = {}
+         for chain in all_chain_list:
+             chain_length = len(result[f'seq_chain_{chain}'])
+             bias_per_residue = np.zeros([chain_length, 21])
+
+
+             if chain == 'A':
+                 residues = [0, 1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
+                 amino_acids = [5, 9] #[G, L]
+                 for res in residues:
+                     for aa in amino_acids:
+                         bias_per_residue[res, aa] = 100.5
+
+             if chain == 'C':
+                 residues = [0, 1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
+                 amino_acids = range(21)[1:] #[G, L]
+                 for res in residues:
+                     for aa in amino_acids:
+                         bias_per_residue[res, aa] = -100.5
+
+             bias_by_res_dict[chain] = bias_per_residue.tolist()
+         my_dict[result['name']] = bias_by_res_dict
+
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+     argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+     argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+
+     args = argparser.parse_args()
+     main(args)
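Review note on make_bias_per_res_dict.py: the per-residue bias is a `[chain_length, 21]` table indexed by zero-based residue position and by column in the MPNN alphabet. The sketch below builds the same kind of table with plain lists and looks columns up by letter instead of by hard-coded index. `bias_matrix` and its arguments are hypothetical names. Note also that in the script the `#[G, L]` comment on the chain `C` branch looks stale, since `range(21)[1:]` selects every column except `A`.

```python
MPNN_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"

def bias_matrix(chain_length, residues, letters, value):
    """Return a chain_length x 21 table with `value` set for the given residues and letters."""
    matrix = [[0.0] * len(MPNN_ALPHABET) for _ in range(chain_length)]
    for res in residues:                      # zero-based residue indices
        for letter in letters:
            matrix[res][MPNN_ALPHABET.index(letter)] = value
    return matrix

# Favor G and L at the first and third residue of a four-residue chain.
table = bias_matrix(4, [0, 2], "GL", 100.5)
```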
ProteinMPNN/helper_scripts/make_fixed_positions_dict.py ADDED
@@ -0,0 +1,59 @@
+ import argparse
+
+ def main(args):
+     import glob
+     import random
+     import numpy as np
+     import json
+     import itertools
+
+     with open(args.input_path, 'r') as json_file:
+         json_list = list(json_file)
+
+     fixed_list = [[int(item) for item in one.split()] for one in args.position_list.split(",")]
+     global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+     my_dict = {}
+
+     if not args.specify_non_fixed:
+         for json_str in json_list:
+             result = json.loads(json_str)
+             all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+             fixed_position_dict = {}
+             for i, chain in enumerate(global_designed_chain_list):
+                 fixed_position_dict[chain] = fixed_list[i]
+             for chain in all_chain_list:
+                 if chain not in global_designed_chain_list:
+                     fixed_position_dict[chain] = []
+             my_dict[result['name']] = fixed_position_dict
+     else:
+         for json_str in json_list:
+             result = json.loads(json_str)
+             all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+             fixed_position_dict = {}
+             for chain in all_chain_list:
+                 seq_length = len(result[f'seq_chain_{chain}'])
+                 all_residue_list = (np.arange(seq_length)+1).tolist()
+                 if chain not in global_designed_chain_list:
+                     fixed_position_dict[chain] = all_residue_list
+                 else:
+                     idx = np.argwhere(np.array(global_designed_chain_list) == chain)[0][0]
+                     fixed_position_dict[chain] = list(set(all_residue_list)-set(fixed_list[idx]))
+             my_dict[result['name']] = fixed_position_dict
+
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+     #e.g. output
+     #{"5TTA": {"A": [1, 2, 3, 7, 8, 9, 22, 25, 33], "B": []}, "3LIS": {"A": [], "B": []}}
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+     argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+     argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+     argparser.add_argument("--chain_list", type=str, default='', help="List of the chains that need to be fixed")
+     argparser.add_argument("--position_list", type=str, default='', help="Position lists, e.g. 11 12 14 18, 1 2 3 4 for first chain and the second chain")
+     argparser.add_argument("--specify_non_fixed", action="store_true", default=False, help="Allows specifying just residues that need to be designed (default: false)")
+
+     args = argparser.parse_args()
+     main(args)
+
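Review note on make_fixed_positions_dict.py: with `--specify_non_fixed` the listed positions are the ones to design, so the fixed set becomes the complement over the chain's 1-based residue numbers. A minimal sketch of that rule for a single chain follows. `fixed_positions` is a hypothetical helper, and it sorts the result, whereas the script returns `list(set(...))` without sorting.

```python
def fixed_positions(seq_length, listed, specify_non_fixed=False):
    """Return the sorted 1-based positions that stay fixed on one chain."""
    if not specify_non_fixed:
        # The listed positions are the fixed ones.
        return sorted(listed)
    # The listed positions are the designable ones; fix everything else.
    everything = range(1, seq_length + 1)
    return sorted(set(everything) - set(listed))

fixed = fixed_positions(10, [2, 3, 7], specify_non_fixed=True)
```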
ProteinMPNN/helper_scripts/make_pos_neg_tied_positions_dict.py ADDED
@@ -0,0 +1,73 @@
+ import argparse
+
+ def main(args):
+
+     import glob
+     import random
+     import numpy as np
+     import json
+     import itertools
+
+     with open(args.input_path, 'r') as json_file:
+         json_list = list(json_file)
+
+     homooligomeric_state = args.homooligomer
+
+     if homooligomeric_state == 0:
+         tied_list = [[int(item) for item in one.split()] for one in args.position_list.split(",")]
+         global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+         my_dict = {}
+         for json_str in json_list:
+             result = json.loads(json_str)
+             all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+             tied_positions_list = []
+             for i, pos in enumerate(tied_list[0]):
+                 temp_dict = {}
+                 for j, chain in enumerate(global_designed_chain_list):
+                     temp_dict[chain] = [tied_list[j][i]] #needs to be a list
+                 tied_positions_list.append(temp_dict)
+             my_dict[result['name']] = tied_positions_list
+     else:
+         if args.pos_neg_chain_list:
+             chain_list_input = [[str(item) for item in one.split()] for one in args.pos_neg_chain_list.split(",")]
+             chain_betas_input = [[float(item) for item in one.split()] for one in args.pos_neg_chain_betas.split(",")]
+             chain_list_flat = [item for sublist in chain_list_input for item in sublist]
+             chain_betas_flat = [item for sublist in chain_betas_input for item in sublist]
+             chain_betas_dict = dict(zip(chain_list_flat, chain_betas_flat))
+         my_dict = {}
+         for json_str in json_list:
+             result = json.loads(json_str)
+             all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+             tied_positions_list = []
+             chain_length = len(result[f"seq_chain_{all_chain_list[0]}"])
+             for chains in chain_list_input:
+                 for i in range(1,chain_length+1):
+                     temp_dict = {}
+                     for j, chain in enumerate(chains):
+                         if args.pos_neg_chain_list and chain in chain_list_flat:
+                             temp_dict[chain] = [[i], [chain_betas_dict[chain]]]
+                         else:
+                             temp_dict[chain] = [[i], [1.0]] #first list is for residue numbers, second list is for weights for the energy, +ive and -ive design
+                     tied_positions_list.append(temp_dict)
+             my_dict[result['name']] = tied_positions_list
+
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+     argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+     argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+     argparser.add_argument("--chain_list", type=str, default='', help="List of the chains that need to be fixed")
+     argparser.add_argument("--position_list", type=str, default='', help="Position lists, e.g. 11 12 14 18, 1 2 3 4 for first chain and the second chain")
+     argparser.add_argument("--homooligomer", type=int, default=0, help="If 0 do not use, if 1 then design homooligomer")
+     argparser.add_argument("--pos_neg_chain_list", type=str, default='', help="Chain lists to be tied together")
+     argparser.add_argument("--pos_neg_chain_betas", type=str, default='', help="Chain beta list for the chain lists provided; 1.0 for the positive design, -0.1 or -0.5 for negative, 0.0 means do not use that chain info")
+
+     args = argparser.parse_args()
+     main(args)
+
+
+ #e.g. output
+ #{"5TTA": [], "3LIS": [{"A": [1], "B": [1]}, {"A": [2], "B": [2]}, {"A": [3], "B": [3]}, {"A": [4], "B": [4]}, {"A": [5], "B": [5]}, {"A": [6], "B": [6]}, {"A": [7], "B": [7]}, {"A": [8], "B": [8]}, {"A": [9], "B": [9]}, {"A": [10], "B": [10]}, {"A": [11], "B": [11]}, {"A": [12], "B": [12]}, {"A": [13], "B": [13]}, {"A": [14], "B": [14]}, {"A": [15], "B": [15]}, {"A": [16], "B": [16]}, {"A": [17], "B": [17]}, {"A": [18], "B": [18]}, {"A": [19], "B": [19]}, {"A": [20], "B": [20]}, {"A": [21], "B": [21]}, {"A": [22], "B": [22]}, {"A": [23], "B": [23]}, {"A": [24], "B": [24]}, {"A": [25], "B": [25]}, {"A": [26], "B": [26]}, {"A": [27], "B": [27]}, {"A": [28], "B": [28]}, {"A": [29], "B": [29]}, {"A": [30], "B": [30]}, {"A": [31], "B": [31]}, {"A": [32], "B": [32]}, {"A": [33], "B": [33]}, {"A": [34], "B": [34]}, {"A": [35], "B": [35]}, {"A": [36], "B": [36]}, {"A": [37], "B": [37]}, {"A": [38], "B": [38]}, {"A": [39], "B": [39]}, {"A": [40], "B": [40]}, {"A": [41], "B": [41]}, {"A": [42], "B": [42]}, {"A": [43], "B": [43]}, {"A": [44], "B": [44]}, {"A": [45], "B": [45]}, {"A": [46], "B": [46]}, {"A": [47], "B": [47]}, {"A": [48], "B": [48]}, {"A": [49], "B": [49]}, {"A": [50], "B": [50]}, {"A": [51], "B": [51]}, {"A": [52], "B": [52]}, {"A": [53], "B": [53]}, {"A": [54], "B": [54]}, {"A": [55], "B": [55]}, {"A": [56], "B": [56]}, {"A": [57], "B": [57]}, {"A": [58], "B": [58]}, {"A": [59], "B": [59]}, {"A": [60], "B": [60]}, {"A": [61], "B": [61]}, {"A": [62], "B": [62]}, {"A": [63], "B": [63]}, {"A": [64], "B": [64]}, {"A": [65], "B": [65]}, {"A": [66], "B": [66]}, {"A": [67], "B": [67]}, {"A": [68], "B": [68]}, {"A": [69], "B": [69]}, {"A": [70], "B": [70]}, {"A": [71], "B": [71]}, {"A": [72], "B": [72]}, {"A": [73], "B": [73]}, {"A": [74], "B": [74]}, {"A": [75], "B": [75]}, {"A": [76], "B": [76]}, {"A": [77], "B": [77]}, {"A": [78], "B": [78]}, {"A": [79], "B": [79]}, {"A": [80], "B": [80]}, {"A": [81], "B": [81]}, {"A": [82], "B": [82]}, {"A": [83], "B": [83]}, 
{"A": [84], "B": [84]}, {"A": [85], "B": [85]}, {"A": [86], "B": [86]}, {"A": [87], "B": [87]}, {"A": [88], "B": [88]}, {"A": [89], "B": [89]}, {"A": [90], "B": [90]}, {"A": [91], "B": [91]}, {"A": [92], "B": [92]}, {"A": [93], "B": [93]}, {"A": [94], "B": [94]}, {"A": [95], "B": [95]}, {"A": [96], "B": [96]}]}
+
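Review note on make_pos_neg_tied_positions_dict.py: in the homooligomer branch each tied entry maps a chain to `[[position], [beta]]`, where beta weights that chain's contribution (positive for positive design, negative for negative design). The sketch below reproduces that entry shape for chain groups of a known length. `tie_with_betas` is a hypothetical helper, and the default of 1.0 for chains without a beta mirrors the script's fallback branch.

```python
def tie_with_betas(chain_groups, betas, chain_length):
    """Build tied-position entries of the form {chain: [[position], [beta]]}."""
    tied = []
    for chains in chain_groups:
        for pos in range(1, chain_length + 1):   # 1-based residue numbers
            tied.append({chain: [[pos], [betas.get(chain, 1.0)]] for chain in chains})
    return tied

# Chain A is designed for, chain B is designed against, over two residues.
tied_pos_neg = tie_with_betas([["A", "B"]], {"A": 1.0, "B": -0.5}, 2)
```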
ProteinMPNN/helper_scripts/make_pssm_input_dict.py ADDED
@@ -0,0 +1,36 @@
+ import argparse
+
+ def main(args):
+     import json
+     import numpy as np
+     with open(args.jsonl_input_path, 'r') as json_file:
+         json_list = list(json_file)
+
+     my_dict = {}
+     for json_str in json_list:
+         result = json.loads(json_str)
+         all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+         path_to_PSSM = args.PSSM_input_path+"/"+result['name'] + ".npz"
+         print(path_to_PSSM)
+         pssm_input = np.load(path_to_PSSM)
+         pssm_dict = {}
+         for chain in all_chain_list:
+             pssm_dict[chain] = {}
+             pssm_dict[chain]['pssm_coef'] = pssm_input[chain+'_coef'].tolist() #[L] per position coefficient to trust PSSM; 0.0 - do not use it; 1.0 - just use PSSM only
+             pssm_dict[chain]['pssm_bias'] = pssm_input[chain+'_bias'].tolist() #[L,21] probability (sums up to 1.0 over alphabet of size 21) from PSSM
+             pssm_dict[chain]['pssm_log_odds'] = pssm_input[chain+'_odds'].tolist() #[L,21] log_odds ratios coming from PSSM; optional/not needed
+         my_dict[result['name']] = pssm_dict
+
+     #Write output to:
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+
+     argparser.add_argument("--PSSM_input_path", type=str, help="Path to PSSMs saved as npz files.")
+     argparser.add_argument("--jsonl_input_path", type=str, help="Path where to load .jsonl dictionary of parsed pdbs.")
+     argparser.add_argument("--output_path", type=str, help="Path where to save .jsonl dictionary with PSSM bias.")
+
+     args = argparser.parse_args()
+     main(args)
ProteinMPNN/helper_scripts/make_tied_positions_dict.py ADDED
@@ -0,0 +1,61 @@
+ import argparse
+
+ def main(args):
+
+     import glob
+     import random
+     import numpy as np
+     import json
+     import itertools
+
+     with open(args.input_path, 'r') as json_file:
+         json_list = list(json_file)
+
+     homooligomeric_state = args.homooligomer
+
+     if homooligomeric_state == 0:
+         tied_list = [[int(item) for item in one.split()] for one in args.position_list.split(",")]
+         global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+         my_dict = {}
+         for json_str in json_list:
+             result = json.loads(json_str)
+             all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+             tied_positions_list = []
+             for i, pos in enumerate(tied_list[0]):
+                 temp_dict = {}
+                 for j, chain in enumerate(global_designed_chain_list):
+                     temp_dict[chain] = [tied_list[j][i]] #needs to be a list
+                 tied_positions_list.append(temp_dict)
+             my_dict[result['name']] = tied_positions_list
+     else:
+         my_dict = {}
+         for json_str in json_list:
+             result = json.loads(json_str)
+             all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+             tied_positions_list = []
+             chain_length = len(result[f"seq_chain_{all_chain_list[0]}"])
+             for i in range(1,chain_length+1):
+                 temp_dict = {}
+                 for j, chain in enumerate(all_chain_list):
+                     temp_dict[chain] = [i] #needs to be a list
+                 tied_positions_list.append(temp_dict)
+             my_dict[result['name']] = tied_positions_list
+
+     with open(args.output_path, 'w') as f:
+         f.write(json.dumps(my_dict) + '\n')
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+     argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+     argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+     argparser.add_argument("--chain_list", type=str, default='', help="List of the chains that need to be fixed")
+     argparser.add_argument("--position_list", type=str, default='', help="Position lists, e.g. 11 12 14 18, 1 2 3 4 for first chain and the second chain")
+     argparser.add_argument("--homooligomer", type=int, default=0, help="If 0 do not use, if 1 then design homooligomer")
+
+     args = argparser.parse_args()
+     main(args)
+
+
+ #e.g. output
+ #{"5TTA": [], "3LIS": [{"A": [1], "B": [1]}, {"A": [2], "B": [2]}, {"A": [3], "B": [3]}, {"A": [4], "B": [4]}, {"A": [5], "B": [5]}, {"A": [6], "B": [6]}, {"A": [7], "B": [7]}, {"A": [8], "B": [8]}, {"A": [9], "B": [9]}, {"A": [10], "B": [10]}, {"A": [11], "B": [11]}, {"A": [12], "B": [12]}, {"A": [13], "B": [13]}, {"A": [14], "B": [14]}, {"A": [15], "B": [15]}, {"A": [16], "B": [16]}, {"A": [17], "B": [17]}, {"A": [18], "B": [18]}, {"A": [19], "B": [19]}, {"A": [20], "B": [20]}, {"A": [21], "B": [21]}, {"A": [22], "B": [22]}, {"A": [23], "B": [23]}, {"A": [24], "B": [24]}, {"A": [25], "B": [25]}, {"A": [26], "B": [26]}, {"A": [27], "B": [27]}, {"A": [28], "B": [28]}, {"A": [29], "B": [29]}, {"A": [30], "B": [30]}, {"A": [31], "B": [31]}, {"A": [32], "B": [32]}, {"A": [33], "B": [33]}, {"A": [34], "B": [34]}, {"A": [35], "B": [35]}, {"A": [36], "B": [36]}, {"A": [37], "B": [37]}, {"A": [38], "B": [38]}, {"A": [39], "B": [39]}, {"A": [40], "B": [40]}, {"A": [41], "B": [41]}, {"A": [42], "B": [42]}, {"A": [43], "B": [43]}, {"A": [44], "B": [44]}, {"A": [45], "B": [45]}, {"A": [46], "B": [46]}, {"A": [47], "B": [47]}, {"A": [48], "B": [48]}, {"A": [49], "B": [49]}, {"A": [50], "B": [50]}, {"A": [51], "B": [51]}, {"A": [52], "B": [52]}, {"A": [53], "B": [53]}, {"A": [54], "B": [54]}, {"A": [55], "B": [55]}, {"A": [56], "B": [56]}, {"A": [57], "B": [57]}, {"A": [58], "B": [58]}, {"A": [59], "B": [59]}, {"A": [60], "B": [60]}, {"A": [61], "B": [61]}, {"A": [62], "B": [62]}, {"A": [63], "B": [63]}, {"A": [64], "B": [64]}, {"A": [65], "B": [65]}, {"A": [66], "B": [66]}, {"A": [67], "B": [67]}, {"A": [68], "B": [68]}, {"A": [69], "B": [69]}, {"A": [70], "B": [70]}, {"A": [71], "B": [71]}, {"A": [72], "B": [72]}, {"A": [73], "B": [73]}, {"A": [74], "B": [74]}, {"A": [75], "B": [75]}, {"A": [76], "B": [76]}, {"A": [77], "B": [77]}, {"A": [78], "B": [78]}, {"A": [79], "B": [79]}, {"A": [80], "B": [80]}, {"A": [81], "B": [81]}, {"A": [82], "B": [82]}, {"A": [83], "B": [83]}, 
{"A": [84], "B": [84]}, {"A": [85], "B": [85]}, {"A": [86], "B": [86]}, {"A": [87], "B": [87]}, {"A": [88], "B": [88]}, {"A": [89], "B": [89]}, {"A": [90], "B": [90]}, {"A": [91], "B": [91]}, {"A": [92], "B": [92]}, {"A": [93], "B": [93]}, {"A": [94], "B": [94]}, {"A": [95], "B": [95]}, {"A": [96], "B": [96]}]}
+
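Review note on make_tied_positions_dict.py: with `--homooligomer 1` every residue index is tied across all chains, using the length of the first chain. A minimal sketch that reproduces the shape of the example output above follows. `tie_homooligomer` is a hypothetical helper, and it assumes all chains have the same length, as the script does.

```python
def tie_homooligomer(chain_ids, chain_length):
    """Tie residue i of every chain together, for i = 1..chain_length."""
    return [
        {chain: [pos] for chain in chain_ids}    # each value needs to be a list
        for pos in range(1, chain_length + 1)
    ]

tied = tie_homooligomer(["A", "B"], 3)
```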
ProteinMPNN/helper_scripts/other_tools/make_omit_AA.py ADDED
@@ -0,0 +1,39 @@
+ import glob
+ import random
+ import numpy as np
+ import json
+ import itertools
+
+ #MODIFY this path
+ with open('/home/justas/projects/lab_github/mpnn/data/pdbs.jsonl', 'r') as json_file:
+     json_list = list(json_file)
+
+ my_dict = {}
+ for json_str in json_list:
+     result = json.loads(json_str)
+     all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+     fixed_position_dict = {}
+     print(result['name'])
+     if result['name'] == '5TTA':
+         for chain in all_chain_list:
+             if chain == 'A':
+                 fixed_position_dict[chain] = [
+                     [[int(item) for item in list(itertools.chain(list(np.arange(1,4)), list(np.arange(7,10)), [22, 25, 33]))], 'GPL'],
+                     [[int(item) for item in list(itertools.chain([40, 41, 42, 43]))], 'WC'],
+                     [[int(item) for item in list(itertools.chain(list(np.arange(50,150))))], 'ACEFGHIKLMNRSTVWYX'],
+                     [[int(item) for item in list(itertools.chain(list(np.arange(160,200))))], 'FGHIKLPQDMNRSTVWYX']]
+             else:
+                 fixed_position_dict[chain] = []
+     else:
+         for chain in all_chain_list:
+             fixed_position_dict[chain] = []
+     my_dict[result['name']] = fixed_position_dict
+
+ #MODIFY this path
+ with open('/home/justas/projects/lab_github/mpnn/data/omit_AA.jsonl', 'w') as f:
+     f.write(json.dumps(my_dict) + '\n')
+
+
+ print('Finished')
+ #e.g. output
+ #{"5TTA": {"A": [[[1, 2, 3, 7, 8, 9, 22, 25, 33], "GPL"], [[40, 41, 42, 43], "WC"], [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149], "ACEFGHIKLMNRSTVWYX"], [[160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199], "FGHIKLPQDMNRSTVWYX"]], "B": []}, "3LIS": {"A": [], "B": []}}
ProteinMPNN/helper_scripts/other_tools/make_pssm_dict.py ADDED
@@ -0,0 +1,64 @@
+ import pandas as pd
+ import numpy as np
+
+ import glob
+ import random
+ import numpy as np
+ import json
+
+
+ def softmax(x, T):
+     return np.exp(x/T)/np.sum(np.exp(x/T), -1, keepdims=True)
+
+ def parse_pssm(path):
+     data = pd.read_csv(path, skiprows=2)
+     floats_list_list = []
+     for i in range(data.values.shape[0]):
+         str1 = data.values[i][0][4:]
+         floats_list = []
+         for item in str1.split():
+             floats_list.append(float(item))
+         floats_list_list.append(floats_list)
+     np_lines = np.array(floats_list_list)
+     return np_lines
+
+ np_lines = parse_pssm('/home/swang523/RLcage/capsid/monomersfordesign/8-16-21/pssm_rainity_final_8-16-21_int/build_0.2089_0.98_0.4653_19_2.00_0.005745.pssm')
+
+ mpnn_alphabet = 'ACDEFGHIKLMNPQRSTVWYX'
+ input_alphabet = 'ARNDCQEGHILKMFPSTWYV'
+
+ permutation_matrix = np.zeros([20,21])
+ for i in range(20):
+     letter1 = input_alphabet[i]
+     for j in range(21):
+         letter2 = mpnn_alphabet[j]
+         if letter1 == letter2:
+             permutation_matrix[i,j]=1.
+
+ pssm_log_odds = np_lines[:,:20] @ permutation_matrix
+ pssm_probs = np_lines[:,20:40] @ permutation_matrix
+
+ X_mask = np.concatenate([np.zeros([1,20]), np.ones([1,1])], -1)
+
+ def softmax(x, T):
+     return np.exp(x/T)/np.sum(np.exp(x/T), -1, keepdims=True)
+
+ #Load parsed PDBs:
+ with open('/home/justas/projects/cages/parsed/test.jsonl', 'r') as json_file:
+     json_list = list(json_file)
+
+ my_dict = {}
+ for json_str in json_list:
+     result = json.loads(json_str)
+     all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+     pssm_dict = {}
+     for chain in all_chain_list:
+         pssm_dict[chain] = {}
+         pssm_dict[chain]['pssm_coef'] = (np.ones(len(result['seq_chain_A']))).tolist() #a number between 0.0 and 1.0 specifying how much attention put to PSSM, can be adjusted later as a flag
+         pssm_dict[chain]['pssm_bias'] = (softmax(pssm_log_odds-X_mask*1e8, 1.0)).tolist() #PSSM like, [length, 21] such that sum over the last dimension adds up to 1.0
+         pssm_dict[chain]['pssm_log_odds'] = (pssm_log_odds).tolist()
+     my_dict[result['name']] = pssm_dict
+
+ #Write output to:
+ with open('/home/justas/projects/lab_github/mpnn/data/pssm_dict.jsonl', 'w') as f:
+     f.write(json.dumps(my_dict) + '\n')
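Review note on make_pssm_dict.py: the script reorders 20 PSSM columns into the 21-letter MPNN alphabet with a permutation matrix, then applies a softmax in which the `X` column is pushed to zero by subtracting a large constant. The sketch below shows the same two steps for a single row in plain Python. `reorder_row` and `masked_softmax` are hypothetical names, and masking `X` with negative infinity (rather than `1e8`) is a substitution made here for clarity.

```python
import math

MPNN_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
PSSM_ALPHABET = "ARNDCQEGHILKMFPSTWYV"

def reorder_row(row):
    """Map 20 scores in PSSM column order onto the 21-letter MPNN order (X gets 0.0)."""
    by_letter = dict(zip(PSSM_ALPHABET, row))
    return [by_letter.get(letter, 0.0) for letter in MPNN_ALPHABET]

def masked_softmax(row21, temperature=1.0):
    """Softmax over the 20 standard residues; the trailing X column gets probability 0."""
    scaled = [score / temperature for score in row21[:-1]] + [float("-inf")]
    peak = max(scaled)                                 # subtract the max for numerical stability
    weights = [math.exp(score - peak) for score in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]

pssm_row = [float(i) for i in range(20)]               # made-up log-odds, one per PSSM column
reordered = reorder_row(pssm_row)
probs = masked_softmax(reordered)
```

Looking letters up in a dictionary does the same job as the 20x21 permutation matrix, just without the matrix product.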
ProteinMPNN/helper_scripts/parse_multiple_chains.out ADDED
@@ -0,0 +1 @@
+ Successfully finished: 2 pdbs
ProteinMPNN/helper_scripts/parse_multiple_chains.py ADDED
@@ -0,0 +1,163 @@
+ import argparse
+
+ def main(args):
+
+     import numpy as np
+     import os, time, gzip, json
+     import glob
+
+     folder_with_pdbs_path = args.input_path
+     save_path = args.output_path
+     ca_only = args.ca_only
+
+     alpha_1 = list("ARNDCQEGHILKMFPSTWYV-")
+     states = len(alpha_1)
+     alpha_3 = ['ALA','ARG','ASN','ASP','CYS','GLN','GLU','GLY','HIS','ILE',
+                'LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL','GAP']
+
+     aa_1_N = {a:n for n,a in enumerate(alpha_1)}
+     aa_3_N = {a:n for n,a in enumerate(alpha_3)}
+     aa_N_1 = {n:a for n,a in enumerate(alpha_1)}
+     aa_1_3 = {a:b for a,b in zip(alpha_1,alpha_3)}
+     aa_3_1 = {b:a for a,b in zip(alpha_1,alpha_3)}
+
+     def AA_to_N(x):
+         # ["ARND"] -> [[0,1,2,3]]
+         x = np.array(x);
+         if x.ndim == 0: x = x[None]
+         return [[aa_1_N.get(a, states-1) for a in y] for y in x]
+
+     def N_to_AA(x):
+         # [[0,1,2,3]] -> ["ARND"]
+         x = np.array(x);
+         if x.ndim == 1: x = x[None]
+         return ["".join([aa_N_1.get(a,"-") for a in y]) for y in x]
+
+
+     def parse_PDB_biounits(x, atoms=['N','CA','C'], chain=None):
+         '''
+         input:  x = PDB filename
+                 atoms = atoms to extract (optional)
+         output: (length, atoms, coords=(x,y,z)), sequence
+         '''
+         xyz,seq,min_resn,max_resn = {},{},1e6,-1e6
+         for line in open(x,"rb"):
+             line = line.decode("utf-8","ignore").rstrip()
+
+             if line[:6] == "HETATM" and line[17:17+3] == "MSE":
+                 line = line.replace("HETATM","ATOM  ")
+                 line = line.replace("MSE","MET")
+
+             if line[:4] == "ATOM":
+                 ch = line[21:22]
+                 if ch == chain or chain is None:
+                     atom = line[12:12+4].strip()
+                     resi = line[17:17+3]
+                     resn = line[22:22+5].strip()
+                     x,y,z = [float(line[i:(i+8)]) for i in [30,38,46]]
+
+                     if resn[-1].isalpha():
+                         resa,resn = resn[-1],int(resn[:-1])-1
+                     else:
+                         resa,resn = "",int(resn)-1
+                     # resn = int(resn)
+                     if resn < min_resn:
+                         min_resn = resn
+                     if resn > max_resn:
+                         max_resn = resn
+                     if resn not in xyz:
+                         xyz[resn] = {}
+                     if resa not in xyz[resn]:
+                         xyz[resn][resa] = {}
+                     if resn not in seq:
+                         seq[resn] = {}
+                     if resa not in seq[resn]:
+                         seq[resn][resa] = resi
+
+                     if atom not in xyz[resn][resa]:
+                         xyz[resn][resa][atom] = np.array([x,y,z])
+
+         # convert to numpy arrays, fill in missing values
+         seq_,xyz_ = [],[]
+         try:
+             for resn in range(min_resn,max_resn+1):
+                 if resn in seq:
+                     for k in sorted(seq[resn]): seq_.append(aa_3_N.get(seq[resn][k],20))
+                 else: seq_.append(20)
+                 if resn in xyz:
+                     for k in sorted(xyz[resn]):
+                         for atom in atoms:
+                             if atom in xyz[resn][k]: xyz_.append(xyz[resn][k][atom])
+                             else: xyz_.append(np.full(3,np.nan))
+                 else:
+                     for atom in atoms: xyz_.append(np.full(3,np.nan))
+             return np.array(xyz_).reshape(-1,len(atoms),3), N_to_AA(np.array(seq_))
+         except TypeError:
+             return 'no_chain', 'no_chain'
+
+
+
+     pdb_dict_list = []
+     c = 0
+
+     if folder_with_pdbs_path[-1]!='/':
+         folder_with_pdbs_path = folder_with_pdbs_path+'/'
+
+
+     init_alphabet = ['A', 'B', 'C', 'D', 'E', 'F', 'G','H', 'I', 'J','K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T','U', 'V','W','X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g','h', 'i', 'j','k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't','u', 'v','w','x', 'y', 'z']
+     extra_alphabet = [str(item) for item in list(np.arange(300))]
+     chain_alphabet = init_alphabet + extra_alphabet
+
+     biounit_names = glob.glob(folder_with_pdbs_path+'*.pdb')
+     for biounit in biounit_names:
+         my_dict = {}
+         s = 0
+         concat_seq = ''
+         concat_N = []
+         concat_CA = []
+         concat_C = []
+         concat_O = []
+         concat_mask = []
+         coords_dict = {}
+         for letter in chain_alphabet:
+             if ca_only:
+                 sidechain_atoms = ['CA']
+             else:
+                 sidechain_atoms = ['N', 'CA', 'C', 'O']
+             xyz, seq = parse_PDB_biounits(biounit, atoms=sidechain_atoms, chain=letter)
+             if type(xyz) != str:
+                 concat_seq += seq[0]
+                 my_dict['seq_chain_'+letter]=seq[0]
+                 coords_dict_chain = {}
+                 if ca_only:
+                     coords_dict_chain['CA_chain_'+letter]=xyz.tolist()
+                 else:
+                     coords_dict_chain['N_chain_' + letter] = xyz[:, 0, :].tolist()
+                     coords_dict_chain['CA_chain_' + letter] = xyz[:, 1, :].tolist()
+                     coords_dict_chain['C_chain_' + letter] = xyz[:, 2, :].tolist()
+                     coords_dict_chain['O_chain_' + letter] = xyz[:, 3, :].tolist()
+                 my_dict['coords_chain_'+letter]=coords_dict_chain
+                 s += 1
+         fi = biounit.rfind("/")
+         my_dict['name']=biounit[(fi+1):-4]
+         my_dict['num_of_chains'] = s
+         my_dict['seq'] = concat_seq
+         if s < len(chain_alphabet):
+             pdb_dict_list.append(my_dict)
+             c+=1
+
+
+     with open(save_path, 'w') as f:
+         for entry in pdb_dict_list:
+             f.write(json.dumps(entry) + '\n')
+
+
+ if __name__ == "__main__":
+     argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+
+     argparser.add_argument("--input_path", type=str, help="Path to a folder with pdb files, e.g. /home/my_pdbs/")
+     argparser.add_argument("--output_path", type=str, help="Path where to save .jsonl dictionary of parsed pdbs")
+     argparser.add_argument("--ca_only", action="store_true", default=False, help="parse a backbone-only structure (default: false)")
+
+     args = argparser.parse_args()
+     main(args)
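Review note on parse_multiple_chains.py: `parse_PDB_biounits` reads fixed-width ATOM records by column slice and splits the residue field into a zero-based number plus an optional insertion code. The sketch below isolates that per-line step. `parse_atom_record` is a hypothetical helper, and the sample record is synthetic, assembled with format specifiers so that each field lands in the column range the script slices.

```python
def parse_atom_record(line):
    """Pull the fields used by the parser out of one fixed-width ATOM record."""
    field = line[22:27].strip()              # residue number plus optional insertion code
    if field[-1].isalpha():
        insertion, number = field[-1], int(field[:-1]) - 1
    else:
        insertion, number = "", int(field) - 1
    return {
        "atom": line[12:16].strip(),
        "residue": line[17:20],
        "chain": line[21:22],
        "index": number,                     # zero-based residue index, as in the script
        "insertion": insertion,
        "xyz": tuple(float(line[i:i + 8]) for i in (30, 38, 46)),
    }

# Synthetic record: atom N of GLY 52A on chain A.
example = (
    "ATOM  " + f"{1:>5}" + " " + f"{' N':<4}" + " " + "GLY" + " " + "A"
    + f"{52:>4}" + "A" + "   " + f"{-15.113:8.3f}{4.641:8.3f}{12.533:8.3f}"
)
parsed = parse_atom_record(example)
```

Because every field is read by position, the script rewrites `HETATM` to the six-character `ATOM  ` for selenomethionine records, which keeps the columns aligned.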
ProteinMPNN/helper_scripts/parse_multiple_chains.sh ADDED
@@ -0,0 +1,7 @@
+ #!/bin/bash
+ #SBATCH --mem=32g
+ #SBATCH -c 2
+ #SBATCH --output=parse_multiple_chains.out
+
+ source activate mlfold
+ python parse_multiple_chains.py --input_path='../PDB_complexes/pdbs/' --output_path='../PDB_complexes/parsed_pdbs.jsonl'
ProteinMPNN/inputs/PDB_complexes/pdbs/3HTN.pdb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/inputs/PDB_complexes/pdbs/4YOW.pdb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/inputs/PDB_homooligomers/pdbs/4GYT.pdb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/inputs/PDB_homooligomers/pdbs/6EHB.pdb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/inputs/PDB_monomers/pdbs/5L33.pdb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/inputs/PDB_monomers/pdbs/6MRR.pdb ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/inputs/PSSM_inputs/3HTN.npz ADDED
Binary file (148 kB). View file
 
ProteinMPNN/inputs/PSSM_inputs/4YOW.npz ADDED
Binary file (240 kB). View file
 
ProteinMPNN/outputs/example_1_outputs/parsed_pdbs.jsonl ADDED
@@ -0,0 +1,2 @@
+ {"seq_chain_A": "GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE", "coords_chain_A": {"N_chain_A": [[-15.113, 4.641, 12.533], [-13.275, 3.42, 10.93], [-10.741, 1.675, 9.445], [-7.432, 1.448, 9.871], [-5.644, -0.548, 8.854], [-7.205, -1.96, 6.899], [-7.793, 0.183, 5.237], [-5.26, 0.685, 4.162], [-4.958, -1.588, 2.516], [-7.177, -1.19, 0.885], [-6.291, 1.085, -0.528], [-4.142, -0.079, -1.945], [-5.518, -1.938, -3.517], [-7.137, -0.17, -5.019], [-5.151, 1.232, -6.404], [-4.087, -0.835, -8.02], [-6.444, -1.411, -9.542], [-6.711, 1.026, -10.874], [-4.295, 1.105, -12.41], [-4.894, -1.039, -14.08], [-7.091, -0.121, -15.482], [-9.668, -0.781, -14.596], [-12.955, -0.348, -13.741], [-15.301, -1.272, -12.622], [-15.546, -2.334, -10.044], [-16.758, -2.673, -6.77], [-16.165, -1.622, -3.469], [-16.378, -2.183, -0.036], [-15.967, -1.433, 3.409], [-16.551, -2.631, 6.58], [-16.843, -2.78, 9.914], [-14.464, -4.361, 10.3], [-14.612, -6.107, 8.198], [-13.921, -6.735, 4.835], [-15.439, -6.161, 1.804], [-15.834, -7.017, -1.536], [-17.223, -6.523, -4.647], [-17.77, -7.513, -7.955], [-18.767, -7.543, -11.345], [-17.695, -9.931, -13.085], [-14.398, -10.398, -14.406], [-10.967, -10.184, -15.446], [-8.395, -12.722, -15.59], [-6.205, -11.462, -14.409], [-7.298, -9.643, -12.523], [-8.704, -11.528, -10.949], [-6.649, -13.146, -9.873], [-5.274, -11.233, -8.368], [-7.233, -10.585, -6.439], [-7.641, -13.078, -5.203], [-5.146, -13.427, -3.951], [-5.083, -11.169, -2.282], [-7.277, -11.766, -0.57], [-6.278, -14.12, 0.645], [-4.045, -13.155, 2.06], [-5.234, -11.366, 3.89], [-7.155, -12.951, 5.25], [-5.415, -14.589, 6.647], [-4.611, -12.942, 8.681], [-6.891, -12.779, 10.155], [-9.004, -11.198, 9.278], [-12.305, -10.775, 7.928], [-13.164, -11.51, 4.65], [-15.046, -10.898, 1.956], [-15.534, -11.745, -1.147], [-17.085, -11.4, -4.207], [-17.565, -12.423, -7.43], [-19.448, -11.812, -10.016]], "CA_chain_A": [[-15.455, 3.353, 11.854], [-12.239, 3.522, 9.924], [-9.735, 0.662, 9.74], 
[-6.128, 1.8, 9.322], [-5.074, -1.624, 8.054], [-7.991, -2.219, 5.697], [-7.623, 1.317, 4.337], [-4.025, 0.475, 3.411], [-5.233, -2.549, 1.457], [-8.065, -0.527, -0.059], [-5.465, 1.902, -1.408], [-3.396, -0.941, -2.853], [-6.467, -2.459, -4.49], [-7.527, 0.902, -5.927], [-4.022, 1.506, -7.283], [-4.098, -1.901, -9.02], [-7.565, -1.214, -10.455], [-6.381, 2.179, -11.705], [-3.302, 0.671, -13.388], [-5.533, -1.961, -15.007], [-8.251, 0.462, -16.125], [-10.865, -1.492, -14.176], [-13.808, 0.635, -13.093], [-16.571, -1.877, -12.233], [-15.4, -3.01, -8.758], [-17.257, -1.959, -5.603], [-15.423, -1.976, -2.265], [-16.955, -1.707, 1.214], [-15.316, -1.837, 4.653], [-17.596, -2.568, 7.598], [-16.558, -3.484, 11.165], [-13.399, -5.329, 10.144], [-14.87, -6.942, 7.044], [-13.72, -6.182, 3.503], [-16.342, -6.715, 0.811], [-15.531, -6.62, -2.905], [-18.17, -7.079, -5.604], [-17.534, -7.115, -9.33], [-19.369, -8.381, -12.371], [-16.73, -10.481, -13.989], [-13.092, -9.792, -14.467], [-9.913, -10.939, -16.082], [-7.519, -13.504, -14.741], [-5.339, -10.574, -13.637], [-8.066, -9.217, -11.357], [-9.037, -12.676, -10.118], [-5.482, -13.557, -9.098], [-5.007, -10.2, -7.373], [-8.209, -10.695, -5.362], [-7.356, -14.301, -4.461], [-3.977, -13.177, -3.117], [-5.501, -10.237, -1.239], [-8.087, -12.495, 0.396], [-5.447, -15.036, 1.423], [-3.208, -12.371, 2.957], [-6.035, -10.78, 4.959], [-7.726, -14.004, 6.078], [-4.371, -15.076, 7.539], [-4.587, -12.019, 9.803], [-8.15, -12.851, 10.866], [-10.051, -10.372, 8.703], [-13.312, -11.38, 7.069], [-13.029, -10.997, 3.289], [-16.162, -11.398, 1.161], [-15.314, -11.41, -2.547], [-18.077, -11.972, -5.106], [-17.355, -12.096, -8.833], [-20.724, -12.228, -10.578]], "C_chain_A": [[-14.525, 3.068, 10.696], [-11.128, 2.581, 10.337], [-8.423, 1.057, 9.074], [-5.594, 0.705, 8.401], [-5.884, -1.859, 6.782], [-7.943, -1.043, 4.732], [-6.325, 1.21, 3.548], [-4.256, -0.489, 2.257], [-6.223, -2.003, 0.447], [-7.273, 0.337, -1.032], [-4.696, 1.044, -2.401], 
[-4.311, -1.534, -3.911], [-6.855, -1.387, -5.493], [-6.396, 1.25, -6.885], [-3.907, 0.448, -8.373], [-5.226, -1.69, -10.026], [-7.297, -0.052, -11.4], [-5.369, 1.804, -12.788], [-3.949, -0.183, -14.469], [-6.661, -1.332, -15.813], [-9.553, -0.226, -15.794], [-11.707, -0.553, -13.334], [-15.118, 0.051, -12.584], [-16.509, -2.642, -10.91], [-15.935, -2.111, -7.648], [-16.587, -2.519, -4.359], [-16.155, -1.378, -1.072], [-16.124, -2.248, 2.369], [-16.312, -1.604, 5.773], [-17.173, -3.42, 8.787], [-15.579, -4.644, 10.974], [-13.57, -6.281, 8.988], [-14.622, -6.128, 5.784], [-14.64, -6.92, 2.547], [-15.981, -6.143, -0.547], [-16.543, -7.305, -3.811], [-17.806, -6.597, -6.999], [-18.266, -8.076, -10.237], [-18.258, -8.767, -13.326], [-15.43, -9.727, -13.946], [-12.137, -10.73, -15.157], [-9.201, -11.812, -15.06], [-6.628, -12.616, -13.888], [-6.046, -10.077, -12.38], [-8.321, -10.379, -10.406], [-7.846, -13.1, -9.277], [-5.133, -12.518, -8.043], [-5.947, -10.345, -6.178], [-8.016, -11.971, -4.552], [-6.204, -14.094, -3.491], [-4.309, -12.212, -1.986], [-6.31, -10.947, -0.159], [-7.225, -13.41, 1.257], [-4.622, -14.283, 2.46], [-4.01, -11.836, 4.136], [-6.693, -11.844, 5.834], [-6.69, -14.56, 7.045], [-4.246, -14.212, 8.781], [-5.907, -11.982, 10.559], [-9.234, -11.922, 10.372], [-11.067, -11.248, 7.98], [-13.236, -10.706, 5.708], [-14.076, -11.68, 2.422], [-15.976, -10.869, -0.254], [-16.369, -12.137, -3.367], [-17.771, -11.49, -6.511], [-18.538, -12.681, -9.594], [-20.658, -12.365, -12.09]], "O_chain_A": [[-14.897, 2.519, 9.662], [-10.68, 2.634, 11.485], [-8.304, 0.991, 7.855], [-5.143, 0.977, 7.279], [-5.323, -1.971, 5.685], [-8.0, -1.245, 3.513], [-6.273, 1.603, 2.377], [-3.814, -0.247, 1.129], [-6.118, -2.31, -0.74], [-7.536, 0.331, -2.241], [-4.583, 1.398, -3.577], [-3.94, -1.609, -5.083], [-6.892, -1.646, -6.703], [-6.638, 1.55, -8.059], [-3.651, 0.78, -9.537], [-5.01, -1.776, -11.239], [-7.634, -0.111, -12.591], [-5.549, 2.134, -13.966], [-3.596, -0.085, 
-15.651], [-7.156, -1.972, -16.745], [-10.461, -0.263, -16.626], [-11.246, -0.047, -12.305], [-15.961, 0.814, -12.103], [-17.363, -3.495, -10.656], [-15.586, -0.928, -7.571], [-16.44, -3.737, -4.225], [-16.472, -0.181, -1.077], [-15.638, -3.382, 2.317], [-16.864, -0.507, 5.887], [-17.148, -4.645, 8.685], [-15.802, -5.764, 11.436], [-12.747, -7.189, 8.815], [-15.045, -4.971, 5.688], [-14.627, -8.155, 2.491], [-15.862, -4.925, -0.694], [-16.72, -8.526, -3.736], [-17.604, -5.4, -7.224], [-18.383, -9.265, -9.953], [-17.93, -8.033, -14.267], [-15.345, -8.577, -13.499], [-12.441, -11.902, -15.409], [-9.352, -11.668, -13.836], [-6.32, -12.97, -12.748], [-5.482, -10.097, -11.272], [-8.202, -10.239, -9.186], [-7.993, -13.366, -8.078], [-4.776, -12.872, -6.917], [-5.52, -10.263, -5.018], [-8.189, -11.956, -3.329], [-6.257, -14.545, -2.341], [-3.883, -12.411, -0.84], [-6.071, -10.762, 1.042], [-7.393, -13.461, 2.478], [-4.512, -14.709, 3.621], [-3.538, -11.859, 5.278], [-6.787, -11.672, 7.055], [-7.035, -14.941, 8.167], [-3.788, -14.689, 9.822], [-6.022, -11.231, 11.533], [-10.311, -11.879, 10.981], [-10.746, -12.329, 7.484], [-13.222, -9.474, 5.623], [-14.005, -12.897, 2.196], [-16.231, -9.692, -0.528], [-16.5, -13.357, -3.266], [-17.731, -10.283, -6.76], [-18.639, -13.904, -9.761], [-21.628, -12.801, -12.712]]}, "name": "6MRR", "num_of_chains": 1, "seq": "GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE"}
+ {"seq_chain_A": "HMPEEEKAARLFIEALEKGDPELMRKVISPDTRMEDNGREFTGDEVVEYVKEIQKRGEQWHLRRYTKEGNSWRFEVQVDNNGQTEQWEVQIEVRNGRIKRVTITHV", "coords_chain_A": {"N_chain_A": [[37.0, 18.222, 51.819], [35.18, 19.045, 54.805], [33.142, 21.39, 56.357], [32.697, 22.256, 59.882], [30.075, 22.366, 60.868], [28.465, 21.048, 58.967], [29.669, 18.568, 59.079], [29.059, 17.634, 61.702], [26.271, 17.24, 61.58], [26.225, 15.306, 59.622], [27.541, 13.181, 60.918], [25.603, 12.501, 62.842], [23.621, 11.465, 61.194], [25.073, 9.367, 60.115], [25.367, 7.722, 62.376], [22.785, 6.789, 62.655], [22.499, 5.42, 60.214], [24.449, 3.414, 60.569], [23.344, 2.25, 62.7], [24.374, 2.554, 65.225], [24.763, 2.964, 68.494], [26.944, 3.77, 70.035], [28.442, 5.552, 68.362], [26.446, 7.553, 68.106], [26.246, 8.499, 70.748], [28.563, 9.948, 71.018], [28.108, 12.096, 69.352], [25.861, 13.648, 70.164], [24.504, 16.068, 72.578], [23.777, 16.707, 76.082], [21.518, 18.351, 75.963], [20.455, 18.057, 73.397], [17.593, 17.855, 71.366], [15.114, 15.757, 69.978], [12.531, 15.031, 67.686], [10.309, 12.633, 66.571], [7.796, 11.95, 64.523], [5.869, 12.982, 66.497], [6.728, 11.686, 68.914], [8.927, 12.6, 71.49], [12.386, 13.091, 72.157], [15.152, 13.567, 74.34], [18.607, 14.262, 74.992], [20.249, 11.415, 76.286], [18.136, 9.735, 75.635], [18.012, 9.917, 72.876], [20.144, 8.337, 72.204], [19.126, 5.939, 73.133], [17.028, 5.681, 71.32], [18.585, 4.982, 69.05], [19.592, 2.533, 69.856], [17.261, 1.04, 69.728], [16.741, 1.045, 67.012], [18.819, -0.538, 66.081], [18.18, -2.888, 67.459], [15.617, -3.575, 66.242], [16.392, -4.089, 63.726], [16.097, -2.152, 61.8], [16.701, -0.407, 58.831], [18.613, 2.321, 57.804], [19.364, 4.606, 55.165], [21.224, 7.164, 53.799], [20.201, 9.35, 51.263], [20.133, 11.886, 49.855], [20.945, 15.377, 50.442], [20.462, 18.834, 51.494], [21.245, 22.207, 52.499], [18.911, 23.991, 54.2], [17.94, 27.07, 54.88], [18.965, 27.602, 58.12], [19.227, 24.935, 58.758], [19.98, 21.458, 58.206], [18.725, 19.353, 55.737], 
[17.945, 16.136, 55.375], [17.279, 13.513, 53.17], [16.307, 10.372, 53.735], [16.304, 6.85, 53.277], [14.36, 4.368, 54.732], [14.218, 1.096, 55.78], [12.396, -1.822, 56.732], [11.509, -5.193, 57.372], [10.625, -5.836, 54.804], [9.326, -3.567, 53.956], [9.713, -0.36, 52.528], [10.495, 2.99, 53.244], [10.35, 6.575, 52.736], [11.862, 9.008, 54.514], [12.232, 12.164, 56.203], [14.836, 14.014, 57.563], [15.572, 16.566, 59.658], [18.051, 18.728, 60.656], [19.208, 20.958, 63.088], [22.156, 22.563, 63.663], [23.645, 24.166, 66.175], [26.134, 26.306, 67.255], [28.264, 24.429, 66.79], [27.157, 22.089, 67.871], [24.652, 19.433, 68.054], [21.475, 20.587, 67.962], [18.615, 20.547, 67.532], [16.573, 18.007, 66.237], [14.476, 16.857, 63.707], [12.331, 14.421, 62.706], [10.453, 13.297, 60.044], [7.846, 11.327, 58.803], [5.177, 10.579, 57.057]], "CA_chain_A": [[36.936, 18.773, 53.168], [33.829, 19.307, 55.268], [33.003, 22.335, 57.475], [32.383, 21.616, 61.147], [28.63, 22.278, 61.041], [27.969, 19.998, 58.095], [30.255, 17.336, 59.605], [28.319, 17.193, 62.883], [24.978, 16.74, 61.124], [26.544, 14.088, 58.891], [27.832, 12.133, 61.893], [24.312, 12.112, 63.413], [23.007, 10.631, 60.175], [25.91, 8.164, 60.045], [25.045, 6.895, 63.536], [21.519, 6.158, 62.308], [22.821, 4.501, 59.135], [25.19, 2.299, 61.114], [22.592, 1.729, 63.824], [25.424, 2.426, 66.209], [24.548, 3.825, 69.667], [28.216, 4.325, 70.466], [28.703, 6.763, 67.587], [25.452, 8.572, 68.459], [26.576, 9.042, 72.062], [29.65, 10.906, 70.821], [27.638, 13.226, 68.553], [24.834, 14.469, 70.801], [24.761, 16.77, 73.826], [22.683, 16.47, 77.021], [20.351, 19.16, 75.595], [19.926, 17.722, 72.07], [16.203, 17.421, 71.347], [14.694, 15.122, 68.734], [11.136, 14.692, 67.52], [10.007, 11.689, 65.501], [6.346, 11.777, 64.428], [5.194, 13.191, 67.767], [7.53, 11.223, 70.04], [10.037, 13.505, 71.773], [13.571, 12.523, 72.816], [16.186, 14.512, 74.768], [19.828, 13.667, 75.507], [20.521, 9.981, 76.157], [17.012, 9.105, 74.937], 
[18.376, 9.832, 71.466], [20.98, 7.138, 72.128], [18.19, 4.839, 73.298], [16.414, 5.594, 69.984], [19.465, 4.246, 68.135], [19.602, 1.154, 70.325], [16.045, 0.407, 69.247], [16.922, 0.85, 65.574], [19.706, -1.684, 66.003], [17.363, -4.009, 67.906], [14.558, -3.966, 65.32], [17.02, -4.306, 62.438], [15.647, -1.091, 60.918], [17.816, 0.015, 57.986], [18.627, 3.687, 57.284], [20.338, 4.921, 54.124], [21.169, 8.544, 53.342], [20.076, 9.473, 49.814], [20.579, 13.205, 49.457], [20.616, 16.466, 51.352], [20.718, 20.236, 51.207], [21.091, 23.102, 53.628], [17.708, 24.807, 54.042], [18.051, 28.065, 55.926], [20.011, 27.214, 59.057], [19.161, 23.525, 59.1], [20.28, 20.496, 57.163], [17.581, 18.51, 55.443], [18.39, 14.859, 54.825], [16.355, 12.518, 52.643], [16.831, 9.069, 54.073], [15.447, 5.708, 53.01], [14.308, 3.533, 55.936], [13.632, -0.203, 55.453], [12.292, -2.933, 57.665], [10.532, -6.258, 57.193], [10.11, -5.81, 53.444], [8.72, -2.286, 53.633], [10.651, 0.732, 52.332], [9.888, 4.302, 53.432], [11.125, 7.801, 52.577], [11.761, 9.852, 55.689], [12.975, 13.407, 56.146], [15.508, 14.281, 58.831], [15.997, 17.954, 59.618], [18.953, 18.92, 61.785], [19.776, 22.273, 63.355], [23.418, 22.556, 64.378], [23.916, 25.488, 66.721], [27.402, 26.364, 67.981], [28.974, 23.169, 66.639], [26.507, 20.955, 68.531], [23.321, 19.08, 67.562], [20.524, 21.296, 68.816], [17.262, 20.046, 67.347], [16.415, 17.077, 65.126], [13.085, 16.599, 63.409], [12.153, 13.313, 61.776], [9.088, 13.272, 59.537], [7.438, 10.301, 57.858], [3.735, 10.392, 57.071]], "C_chain_A": [[35.516, 19.181, 53.529], [33.835, 20.246, 56.472], [32.424, 21.668, 58.723], [30.875, 21.424, 61.352], [28.045, 21.144, 60.221], [28.428, 18.625, 58.602], [29.459, 16.766, 60.777], [26.977, 16.58, 62.491], [25.152, 15.408, 60.402], [26.83, 12.921, 59.83], [26.562, 11.609, 62.585], [23.641, 11.12, 62.474], [23.742, 9.311, 60.031], [25.597, 7.187, 61.182], [23.732, 6.145, 63.321], [21.733, 5.074, 61.25], [23.514, 3.243, 59.644], 
[24.422, 1.619, 62.24], [23.493, 1.584, 65.033], [25.271, 3.432, 67.346], [25.829, 4.473, 70.157], [28.568, 5.591, 69.685], [27.729, 7.873, 67.973], [25.765, 9.271, 69.778], [27.63, 10.133, 71.946], [29.225, 12.164, 70.07], [26.725, 14.189, 69.313], [25.32, 15.147, 72.075], [23.586, 16.509, 74.765], [21.4, 17.227, 76.67], [19.883, 19.001, 74.14], [18.479, 17.252, 72.148], [15.814, 16.879, 69.984], [13.215, 14.791, 68.796], [11.019, 13.732, 66.357], [8.511, 11.383, 65.495], [5.617, 11.906, 65.766], [5.97, 12.779, 69.006], [8.693, 12.188, 70.248], [11.14, 12.773, 72.507], [14.571, 13.629, 73.143], [17.412, 13.767, 75.307], [19.981, 12.179, 75.234], [19.346, 9.184, 75.593], [17.272, 8.974, 73.437], [19.264, 8.61, 71.249], [20.105, 5.892, 72.234], [17.42, 4.6, 71.996], [17.271, 4.767, 69.027], [19.569, 2.782, 68.557], [18.408, 0.378, 69.783], [16.076, 0.161, 67.749], [17.732, -0.411, 65.332], [18.974, -2.968, 66.397], [16.379, -4.492, 66.836], [15.096, -4.299, 63.928], [16.594, -3.31, 61.369], [16.837, -0.53, 60.144], [17.671, 1.442, 57.459], [19.7, 3.886, 56.222], [20.172, 6.366, 53.668], [21.132, 8.588, 51.81], [20.646, 10.787, 49.314], [20.267, 14.245, 50.522], [21.046, 17.824, 50.859], [20.686, 21.006, 52.51], [19.918, 24.019, 53.329], [17.639, 25.811, 55.176], [19.214, 27.689, 56.819], [19.807, 25.792, 59.584], [19.526, 22.664, 57.913], [19.11, 19.552, 56.986], [18.028, 17.274, 54.701], [17.2, 14.029, 54.397], [16.961, 11.15, 52.89], [15.812, 8.012, 53.681], [15.513, 4.779, 54.22], [13.552, 2.243, 55.625], [13.475, -1.054, 56.689], [11.196, -3.906, 57.263], [9.861, -6.201, 55.824], [9.414, -4.523, 53.038], [9.757, -1.167, 53.588], [9.951, 2.083, 52.441], [10.884, 5.43, 53.152], [10.835, 8.756, 53.723], [12.571, 11.114, 55.473], [13.554, 13.659, 57.513], [16.084, 15.679, 58.809], [16.836, 18.239, 60.853], [19.393, 20.364, 61.913], [21.031, 22.116, 64.2], [23.671, 23.966, 64.867], [25.19, 25.419, 67.548], [28.163, 25.044, 67.963], [28.334, 21.936, 67.267], [25.122, 
20.677, 67.946], [22.274, 19.664, 68.485], [19.119, 20.714, 68.743], [17.234, 19.151, 66.116], [14.943, 16.776, 64.944], [12.96, 15.527, 62.351], [10.703, 13.266, 61.349], [8.891, 12.117, 58.572], [5.949, 10.026, 57.982], [3.267, 9.765, 55.77]], "O_chain_A": [[34.75, 19.627, 52.679], [34.466, 19.951, 57.486], [31.745, 20.64, 58.632], [30.444, 20.43, 61.936], [27.223, 20.37, 60.71], [27.666, 17.644, 58.56], [29.228, 15.556, 60.851], [26.587, 15.54, 63.01], [24.344, 14.49, 60.563], [26.43, 11.793, 59.561], [26.466, 10.42, 62.876], [23.133, 10.086, 62.899], [23.11, 8.253, 59.861], [25.565, 5.978, 60.976], [23.587, 5.004, 63.732], [21.208, 3.961, 61.365], [23.214, 2.143, 59.19], [24.797, 0.534, 62.681], [23.414, 0.602, 65.768], [25.628, 4.597, 67.202], [25.794, 5.601, 70.653], [28.946, 6.599, 70.283], [28.147, 9.011, 68.154], [25.567, 10.478, 69.921], [27.585, 11.13, 72.657], [29.904, 13.182, 70.145], [26.795, 15.401, 69.13], [26.4, 14.846, 72.582], [22.509, 16.12, 74.303], [20.32, 16.76, 77.023], [19.016, 19.744, 73.693], [18.171, 16.339, 72.914], [16.109, 17.485, 68.964], [12.702, 14.379, 69.847], [11.564, 13.984, 65.281], [8.025, 10.649, 66.357], [4.861, 11.02, 66.147], [5.876, 13.441, 70.042], [9.374, 12.559, 69.286], [10.86, 11.923, 73.362], [14.807, 14.512, 72.329], [17.271, 12.751, 75.992], [19.846, 11.725, 74.084], [19.535, 8.068, 75.133], [16.802, 8.045, 72.793], [19.141, 7.904, 70.238], [20.301, 4.917, 71.505], [17.186, 3.453, 71.614], [16.744, 3.928, 68.26], [19.568, 1.88, 67.714], [18.527, -0.796, 69.411], [15.514, -0.823, 67.273], [17.375, -1.244, 64.501], [19.117, -3.994, 65.744], [16.312, -5.687, 66.557], [14.345, -4.733, 63.062], [16.717, -3.592, 60.189], [17.874, -0.25, 60.724], [16.729, 1.724, 56.722], [20.833, 3.459, 56.399], [19.101, 6.761, 53.213], [21.922, 7.924, 51.14], [21.54, 10.807, 48.463], [19.441, 14.03, 51.403], [21.88, 17.95, 49.948], [20.154, 20.521, 53.506], [19.91, 24.707, 52.301], [17.308, 25.452, 56.305], [20.323, 27.467, 56.337], 
[20.162, 25.477, 60.718], [19.425, 23.084, 56.754], [18.544, 19.031, 57.959], [18.427, 17.355, 53.535], [16.231, 13.89, 55.157], [18.022, 10.811, 52.366], [14.608, 8.257, 53.699], [16.601, 4.461, 54.699], [12.391, 2.3, 55.229], [14.338, -1.043, 57.557], [10.101, -3.503, 56.86], [8.668, -6.468, 55.697], [8.959, -4.394, 51.905], [10.569, -1.042, 54.499], [8.934, 2.298, 51.817], [12.103, 5.252, 53.305], [9.722, 9.241, 53.893], [13.499, 11.135, 54.676], [12.845, 13.511, 58.513], [16.983, 15.964, 58.026], [16.401, 17.994, 61.976], [19.882, 20.939, 60.945], [20.986, 21.581, 65.294], [23.86, 24.873, 64.065], [25.32, 24.562, 68.418], [28.641, 24.59, 68.998], [28.902, 20.843, 67.213], [24.492, 21.583, 67.413], [22.176, 19.279, 69.665], [18.494, 20.423, 69.772], [17.819, 19.495, 65.079], [14.236, 16.503, 65.911], [13.432, 15.7, 61.239], [9.814, 13.225, 62.192], [9.69, 11.926, 57.664], [5.503, 9.336, 58.905], [3.96, 9.844, 54.754]]}, "name": "5L33", "num_of_chains": 1, "seq": "HMPEEEKAARLFIEALEKGDPELMRKVISPDTRMEDNGREFTGDEVVEYVKEIQKRGEQWHLRRYTKEGNSWRFEVQVDNNGQTEQWEVQIEVRNGRIKRVTITHV"}
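Each line of `parsed_pdbs.jsonl` above is one JSON record with per-chain keys (`seq_chain_A`, `coords_chain_A` holding `N_chain_A`, `CA_chain_A`, `C_chain_A`, `O_chain_A`) plus `name`, `num_of_chains`, and `seq`. The following is a minimal sketch of a sanity check on that layout, assuming one coordinate triple per residue for every backbone atom that is present. The helper names `check_record` and `check_jsonl` are hypothetical and not part of ProteinMPNN.

```python
import json

# Backbone atom names as they appear in the coords_chain_* dictionaries above.
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def check_record(rec):
    """Return a list of human-readable problems found in one parsed record."""
    problems = []
    # Chain IDs are recovered from the "seq_chain_<ID>" keys.
    chains = sorted(k[len("seq_chain_"):] for k in rec if k.startswith("seq_chain_"))
    if rec.get("num_of_chains") != len(chains):
        problems.append("num_of_chains does not match the number of seq_chain_* keys")
    for chain in chains:
        seq = rec["seq_chain_" + chain]
        coords = rec.get("coords_chain_" + chain, {})
        for atom in BACKBONE_ATOMS:
            xyz = coords.get(f"{atom}_chain_{chain}")
            if xyz is None:
                # Assumption: a missing atom type is tolerated (e.g. CA-only input).
                continue
            if len(xyz) != len(seq):
                problems.append(
                    f"chain {chain}: {len(xyz)} {atom} coordinates for {len(seq)} residues"
                )
            elif any(len(point) != 3 for point in xyz):
                problems.append(f"chain {chain}: {atom} has a coordinate that is not x, y, z")
    return problems


def check_jsonl(text):
    """Check every non-empty line of a JSONL string; map record name to problems."""
    records = (json.loads(line) for line in text.splitlines() if line.strip())
    return {rec["name"]: check_record(rec) for rec in records}
```

Running `check_jsonl(open(path).read())` on a parsed file should give an empty problem list per structure when the layout matches the assumption.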
ProteinMPNN/outputs/example_1_outputs/seqs/5L33.fa ADDED
@@ -0,0 +1,6 @@
+ >5L33, score=1.5874, global_score=1.5874, fixed_chains=[], designed_chains=['A'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+ HMPEEEKAARLFIEALEKGDPELMRKVISPDTRMEDNGREFTGDEVVEYVKEIQKRGEQWHLRRYTKEGNSWRFEVQVDNNGQTEQWEVQIEVRNGRIKRVTITHV
+ >T=0.1, sample=1, score=0.8221, global_score=0.8221, seq_recovery=0.5094
+ MINEEEKKALDFIEALEKADPELMKKVIEPDTKMEVNGKKYEGEEIVEFVKKLKEEGVKYKLLSYKKEGNKYVFEVEKSKNGVTKKITIEIEVENGKVKKIVITEK
+ >T=0.1, sample=2, score=0.8356, global_score=0.8356, seq_recovery=0.4434
+ SINEEEQKALDYIKALEKADPELMKKVITPDTKMTVNGKEYEGEEIVEYVKELKERGIKYKLLSYKKEGDKYVFTVERSENGKTYTITIEVKVKDGKVEEIVIKEE
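The FASTA headers above carry comma-separated `key=value` fields, some of which are bracketed lists (for example `designed_chains=['A']`). Below is a minimal sketch for reading those fields and for recomputing a recovery value, assuming `seq_recovery` is the fraction of positions where the sampled residue equals the native one. The function names are hypothetical, and skipping `/` separators and `X` residues in the native sequence is an assumption made here, not something the output files state.

```python
import re

# key=value pairs; a value is either a bracketed list or a run without commas/spaces.
FIELD = re.compile(r"(\w+)=(\[[^\]]*\]|[^,\s]+)")


def parse_header(line):
    """Parse a FASTA header line into a dict of string values.

    Tokens without '=' (such as the leading structure name) are ignored.
    """
    return dict(FIELD.findall(line))


def seq_recovery(native, designed):
    """Fraction of compared positions where designed matches native.

    Assumption: chain separators ('/') and unknown residues ('X') in the
    native sequence are excluded from the comparison.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    pairs = [(a, b) for a, b in zip(native, designed) if a not in "/X"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a == b for a, b in pairs) / len(pairs)
```

Applied to the native sequence and the `sample=1` sequence in this file, the result can be compared against the reported `seq_recovery=0.5094`.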
ProteinMPNN/outputs/example_1_outputs/seqs/6MRR.fa ADDED
@@ -0,0 +1,6 @@
+ >6MRR, score=1.4683, global_score=1.4683, fixed_chains=[], designed_chains=['A'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+ GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
+ >T=0.1, sample=1, score=0.9617, global_score=0.9617, seq_recovery=0.5000
+ GMDEELEKYVKELKAFLKEKGINNVEIKIENGTLTIKMNGASKETREFLEKLKKELEEKGYKVNIEIS
+ >T=0.1, sample=2, score=0.9513, global_score=0.9513, seq_recovery=0.4853
+ GKDEELEKYVKELKKFLKEKGINNVKIEVKDGTLTIEMKGCSKETKDFLKKLKKELEKKGYKVNIKIY
ProteinMPNN/outputs/example_2_outputs/assigned_pdbs.jsonl ADDED
@@ -0,0 +1 @@
+ {"3HTN": [["A", "B"], ["C"]], "4YOW": [["A", "B"], ["C", "D", "E", "F"]]}
ProteinMPNN/outputs/example_2_outputs/parsed_pdbs.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
ProteinMPNN/outputs/example_2_outputs/seqs/3HTN.fa ADDED
@@ -0,0 +1,6 @@
+ >3HTN, score=1.4405, global_score=1.4946, fixed_chains=['C'], designed_chains=['A', 'B'], CA_model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+ NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER/NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKXXXXDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER
+ >T=0.1, sample=1, score=0.8450, global_score=1.0949, seq_recovery=0.5071
+ KLYSYKEIGNKYIVSINVGTDLVEALKKFCEEKNIKSGTINGIGEVSKLTLKFYDFETKETELKTFEGNFTISNLTGLIYTYNGKIFLHLHVTFGDEDFSALAGHLVSATVLQEALLKVENYNENITAKFDEKLGLYLLDFNS/MSYKYKKIGNKYLVSINIGKDLVESLKEFVKEKNIKSGTINGIGGVSEVTLRFFDPEXXXXKERTFKGLFDISNLTGFISTKDGEPFLHLHATFGDEDFSALAGHLVSAKVSTGAELLVENYNVELTRKYDEKLGVYLLDFNA
+ >T=0.1, sample=2, score=0.8471, global_score=1.0996, seq_recovery=0.5000
+ MLYDYKKIGNKYFVKVNVDQDLVEALKEFCEELGIKSGTINGIGEVSEVTLRFFDFETKESVDKTFKEPFTISNLTGLISTYNGKIHLHLHITFSDKEFSALAGHLVSAKVLQEALLIVEDYGENITRKYDKETGLLLLDFNS/MLYKYKKIGNKYLIEINIGKDLVEALKEFVEEKNIKAGTINGIGMVEEVTLEYYDPKXXXXEKKTFEGLFEISNLTGFIYTKDGKPVLHLHVTFGDEDFSALAGHLVSAKVLGEAELLVEDYNVELTVKYDEERGEDLLDFNS
ProteinMPNN/outputs/example_2_outputs/seqs/4YOW.fa ADDED
@@ -0,0 +1,6 @@
+ >4YOW, score=1.3574, global_score=1.3913, fixed_chains=['C', 'D', 'E', 'F'], designed_chains=['A', 'B'], CA_model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+ MRIVAADTGGAVLDESFQPVGLIATVAVLVEKPYKTSKRFLVKYADPYNYDLSGRQAIRDEIELAIELAREVSPDVIHLNSTLGGIEVRKLDESTIDALQISDRGKEIWKELSKDLQPLAKKFWEETGIEIIAIGKSSVPVRIAEIYAGIFSVKWALDNVKEKGGLLVGLPRYMEVEIKKDKIIGKSLDPREGGLYGEVKTEVPQGIKWELYPNPLVRRFMVFEITS/XXXX
+ >T=0.1, sample=1, score=0.8241, global_score=1.2059, seq_recovery=0.5154
+ MKIVASDAGGYLLDEELKPIGRIAVVAVLVEKPFTSAKEYKVEYLDPEKYNLEGNDDLIKEFELAVELAKKYKPDVILLDLNLGGVELSELNPEVIEKLQISEETKEFLIKLSEILSPKAKEFKKETGIPILLAGGNSTAVKIAELLASAAAVKWALENVKEKGKLLIGLERAVEIEIEEDKIRARDLDPRYGGLYAEIDIKIPEGLKYEQYPNPFKPGEMVFEIEK/XXXX
+ >T=0.1, sample=2, score=0.8195, global_score=1.2174, seq_recovery=0.5419
+ MKIVAADAGGYLVDEDLKPIGRIAVVAVLVEKPFTSSKVYKVKYIDPEKADLNGNEDLRLELELAIELAKEYKPDIILLDLNLGGVELSELNEETIKKLQISEEAKKKLIELSKELSPLAKKFKEETGIPILLAGDNSVPVHIAEILASAEAVKWALENVKEKGEVKVLLHESVSIEIEEDKIKARSLDPRLGGLEAEIEIKIPEGIEYEQEPNPFRPHHMVFTAKV/XXXX
ProteinMPNN/outputs/example_3_outputs/seqs/3HTN.fa ADDED
@@ -0,0 +1,6 @@
+ >3HTN, score=1.1550, global_score=1.1955, fixed_chains=['C'], designed_chains=['A', 'B'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+ NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER/NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKXXXXDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER
+ >T=0.1, sample=1, score=0.7339, global_score=0.9189, seq_recovery=0.5390
+ KLYDYEKIGNKYIVSIYNNTDIVKALKKFCEEKNIKSGTVNGIGQVKEVTLKFYNFETKESEEKTFKKNFTISNLTGFISEHDGKIFLDLHITFGDENFSALAGHLVSAIVNGECKLVIEDYKEKVSTKYDEELGLWLLDFNK/ETYKYKKIGNKYLVSINNGKDLVDSIKKFCKDKKIKSGTVNGIGSISKLTLEFFDPDXXXXKTKTLEKNLEISNLTGFISTKDGEVFLDLHITIGDENFSALAGHLISAIVNGIAELKIEDYNKEINVKYDEKLGLYLLDFNK
+ >T=0.1, sample=2, score=0.7064, global_score=0.9034, seq_recovery=0.5993
+ HMYEYKKIGNKYIVSVKNNTELVEALKAFCEEKKIKSGTVNGIGQVKSVTLRFYDFKTKTSKDTTFNQNLEISNLTGFISEYNNKVFLDLHITFGDSNFSALAGHLLSAVVGGEAIFVVEDYKEKISRKYDEKLGLYLLDFNK/NMYKYKKIGNKYIVSINNGKNLVKALKKFCEDKNIKSGTINGIGMISKVTLYFFDPEXXXXTTKTFNELLEISNLTGFISEKNGKVFLHLHITIGDSNFSALAGHLIDAVVNGIAEVIVEDFNEKINVKYNEETGLWLLDFNK