File size: 5,297 Bytes
c07afb1 d4cd645 c630634 6a5a99e d4cd645 01e1181 d4cd645 6a5a99e f3d4b52 6a5a99e f3d4b52 a253f7d d4cd645 f3d4b52 c5b5a7f f3d4b52 42d3d55 6a5a99e c2b236f 6a5a99e 42d3d55 a253f7d 6a5a99e a253f7d d4cd645 f3d4b52 6a5a99e d4cd645 6a5a99e f3d4b52 42d3d55 d4cd645 42d3d55 5452b6b d4cd645 a866f26 42d3d55 d4cd645 74a86c6 5452b6b 42d3d55 5b0b809 42d3d55 9bae2da f3d4b52 74a86c6 5c5788c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 |
---
title: PROTAC-Degradation-Predictor
emoji: π§¬
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 4.37.2
app_file: app.py
pinned: false
license: mit
---
![Maturity level-0](https://img.shields.io/badge/Maturity%20Level-ML--0-red)
<a href="https://colab.research.google.com/github/ribesstefano/PROTAC-Degradation-Predictor/blob/main/notebooks/protac_degradation_predictor_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/ailab-bio/PROTAC-Degradation-Predictor)
# PROTAC-Degradation-Predictor
A machine learning-based tool for predicting PROTAC protein degradation activity.
## π Table of Contents
- [Data Curation](#-data-curation)
- [Installation](#-installation)
- [Documentation and Usage](#-documentation-and-usage)
- [Training](#-training)
- [Citation](#-citation)
- [License](#-license)
## π Data Curation
The code for data curation can be found in the Jupyter notebook [`data_curation.ipynb`](notebooks/data_curation.ipynb).
The folder [data/studies](data/studies/) contains the training and test data used in each study reported in our paper. The label column that is used for predictions is named _"Active (Dmax 0.6, pDC50 6.0)"_ and contains binary values.
## π Installation
To install the package, open your terminal and run the following commands:
```bash
pip install git+https://github.com/ribesstefano/PROTAC-Degradation-Predictor.git
```
The package has been developed on a Linux machine with Python 3.10.8. It is recommended to use a virtual environment to avoid conflicts with other packages.
## π― Documentation and Usage
The package documentation can be found [here](https://ribesstefano.github.io/PROTAC-Degradation-Predictor/).
For a walkthrough on how to use the package, please refer to the tutorial notebook [`protac_degradation_predictor_tutorial.ipynb`](notebooks/protac_degradation_predictor_tutorial.ipynb).
After installing the package, you can use it as follows:
```python
import protac_degradation_predictor as pdp
protac_smiles = 'Cc1ncsc1-c1ccc(CNC(=O)[C@@H]2C[C@@H](O)CN2C(=O)[C@@H](NC(=O)COCCCCCCCCCOCC(=O)Nc2ccc(C(=O)Nc3ccc(F)cc3N)cc2)C(C)(C)C)cc1'
e3_ligase = 'VHL'
target_uniprot = 'P04637'
cell_line = 'HeLa'
active_protac = pdp.is_protac_active(
protac_smiles,
e3_ligase,
target_uniprot,
cell_line,
)
print(f'The given PROTAC is: {"active" if active_protac else "inactive"}')
```
This example demonstrates how to predict the activity of a PROTAC molecule. The `is_protac_active` function takes the SMILES string of the PROTAC, the E3 ligase, the UniProt ID of the target protein, and the cell line as inputs. It returns whether the PROTAC is active or not.
The function supports batch computation by passing lists of SMILES strings, E3 ligases, UniProt IDs, and cell lines. In this case, it returns a list of booleans indicating the activity of each PROTAC.
## π Training
Before running the experiments reported in our work or train on your custom dataset, here are some required steps to follow (assuming one is in the repository directory already):
1. Download the data from the [Cellosaurus database](https://web.expasy.org/cellosaurus/) and save it in the `data` directory:
```bash
wget https://ftp.expasy.org/databases/cellosaurus/cellosaurus.txt data/
```
2. Make a copy of the Uniprot embeddings to be placed in the `data` directory:
```bash
cp protac_degradation_predictor/data/uniprot2embedding.h5 data/
```
3. Create a virtual environment and install the required packages by running the following commands:
```bash
conda env create -f environment.yaml
conda activate protac-degradation-predictor
```
4. The code for training the PyTorch models can be found in the file [`run_experiments_pytorch.py`](src/run_experiments_pytorch.py).
(Don't forget to adjust the `PYTHONPATH` environment variable to include the repository directory: `export PYTHONPATH=$PYTHONPATH:/path/to/PROTAC-Degradation-Predictor`)
### Training on Custom Dataset
For training a model on a user-provided dataset, please refer to the guide reported in [this README](src/README.md).
## π Citation
If you use this tool in your research, please cite the following paper:
```
@article{Ribes_2024,
title={Modeling PROTAC degradation activity with machine learning},
volume={6},
ISSN={2667-3185},
url={http://dx.doi.org/10.1016/j.ailsci.2024.100104},
DOI={10.1016/j.ailsci.2024.100104},
journal={Artificial Intelligence in the Life Sciences},
publisher={Elsevier BV},
author={Ribes, Stefano and Nittinger, Eva and Tyrchan, Christian and Mercado, RocΓo},
year={2024},
month=dec, pages={100104}
}
```
The directories [logs](logs/) and [reports](reports/) contain the logs and reports generated during the experiments reported in the paper. Additionally, in [reports](reports/), one can find the pickled Optuna studies for the reported experiments.
The directory [models](models/) contains the trained models for the experiments reported in the paper.
## π License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|