ailab-bio/PROTAC-Degradation-Predictor

ribesstefano committed on Jun 7

Commit f3d4b52 • 1 Parent(s): a8d1800

Updated README

Files changed (18)

README.md +30 -13
notebooks/best_fingerprint_search.ipynb +0 -0
notebooks/cell_type_embedding.ipynb +49 -1
notebooks/plot_experimental_results.ipynb +0 -0
notebooks/predict_unknown_protacs.ipynb +0 -0
protac_degradation_predictor/models/{best_model_n0_random-epoch=6-val_acc=0.74-val_roc_auc=0.796.ckpt → best_model_n0_random-epoch=13-val_acc=0.83-val_roc_auc=0.841-v1.ckpt} +2 -2
protac_degradation_predictor/models/best_model_n1_random-epoch=8-test_acc=0.78-test_roc_auc=0.851.ckpt +3 -0
protac_degradation_predictor/models/best_model_n2_random-epoch=9-val_acc=0.80-val_roc_auc=0.841-v1.ckpt +3 -0
protac_degradation_predictor/models/cv_model_random_fold0-epoch=12-val_acc=0.86-val_roc_auc=0.905.ckpt +3 -0
protac_degradation_predictor/models/cv_model_random_fold1-epoch=16-val_acc=0.86-val_roc_auc=0.933.ckpt +3 -0
protac_degradation_predictor/models/cv_model_random_fold2-epoch=16-val_acc=0.86-val_roc_auc=0.908.ckpt +3 -0
protac_degradation_predictor/models/cv_model_random_fold3-epoch=15-val_acc=0.89-val_roc_auc=0.930.ckpt +3 -0
protac_degradation_predictor/models/cv_model_random_fold4-epoch=15-val_acc=0.88-val_roc_auc=0.928.ckpt +3 -0
protac_degradation_predictor/optuna_utils.py +1 -1
protac_degradation_predictor/protac_degradation_predictor.py +55 -27
protac_degradation_predictor/pytorch_models.py +25 -24
src/plot_experiment_results.py +0 -1
src/run_experiments.py +0 -1

README.md CHANGED

@@ -1,24 +1,38 @@
-![Maturity level-0](https://img.shields.io/badge/Maturity%20Level-ML--0-red)
-# PROTAC-Degradation-Predictor
-Predicting PROTAC protein degradation activity via machine learning.
-## Data Curation
-For data curation code, please refer to the code in the Jupyter notebooks [`data_curation.ipynb`](notebooks/data_curation.ipynb).
-## Installing the Package
-To install the package, run the following command:
 ```bash
 pip install .
 ```
-## Running the Package
-To run the package after installation, here is an example snippet:
 ```python
 import protac_degradation_predictor as pdp
@@ -33,16 +47,19 @@ active_protac = pdp.is_protac_active(
     e3_ligase,
     target_uniprot,
     cell_line,
-    device='gpu', # Default to 'cpu'
     proba_threshold=0.5, # Default value
 )
 print(f'The given PROTAC is: {"active" if active_protac else "inactive"}')
 ```
-> If you're coming from my [thesis repo](https://github.com/ribesstefano/Machine-Learning-for-Predicting-Targeted-Protein-Degradation), I just wanted to create a separate and "less generic" repo for fast prototyping new ideas.
-> Stefano.
-> Why haven't you trained on more (i.e., the whole) data? We did, and we might just need _way_ more data to get better results...

+<!-- ![Maturity level-0](https://img.shields.io/badge/Maturity%20Level-ML--0-red)
+# PROTAC-Degradation-Predictor -->
+<p align="center">
+  <img src="https://img.shields.io/badge/Maturity%20Level-ML--0-red" alt="Maturity level-0">
+</p>
+<h1 align="center">PROTAC-Degradation-Predictor</h1>
+<p align="center">
+  A machine learning-based tool for predicting PROTAC protein degradation activity.
+</p>
+## 📚 Table of Contents
+- [Data Curation](#-data-curation)
+- [Installation](#-installation)
+- [Usage](#-usage)
+## 📝 Data Curation
+The code for data curation can be found in the Jupyter notebook [`data_curation.ipynb`](notebooks/data_curation.ipynb).
+## 🚀 Installation
+To install the package, open your terminal and run the following command:
 ```bash
 pip install .
 ```
+## 🎯 Usage
+After installing the package, you can use it as follows:
 ```python
 import protac_degradation_predictor as pdp
     e3_ligase,
     target_uniprot,
     cell_line,
+    device='cuda', # Default to 'cpu'
     proba_threshold=0.5, # Default value
 )
 print(f'The given PROTAC is: {"active" if active_protac else "inactive"}')
 ```
+This example demonstrates how to predict the activity of a PROTAC molecule. The `is_protac_active` function takes the SMILES string of the PROTAC, the E3 ligase, the UniProt ID of the target protein, and the cell line as inputs. It returns whether the PROTAC is active or not.
+## 📈 Training
+The code for training the model can be found in the file [`run_experiments.py`](src/run_experiments.py).
+## 📜 License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

notebooks/best_fingerprint_search.ipynb CHANGED

The diff for this file is too large to render. See raw diff

notebooks/cell_type_embedding.ipynb CHANGED

@@ -869,6 +869,13 @@
     "unique_columns_ranking"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 12,
@@ -989,6 +996,47 @@
     "    pickle.dump(cell2description, f)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1005,7 +1053,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 48,
    "metadata": {},
    "outputs": [],
    "source": [

     "unique_columns_ranking"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "genome ancestry, karyotypic information, senescence, biotechnology, virology, caution, donor information, sequence variation, characteristics, transfected with, monoclonal antibody target, HLA typing, knockout cell, microsatellite instability, hierarchy (HI), breed/subspecies, derived from site, population, group, monoclonal antibody isotype, cell type, transformant, selected for resistance to, category (CA)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 12,
     "    pickle.dump(cell2description, f)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\\begin{figure*}[t!]\n",
+    "    \\centering\n",
+    "    \\begin{subfigure}{0.5\\textwidth}\n",
+    "        \\centering\n",
+    "        \\includegraphics[width=0.99\\columnwidth]{plots/pytorch_performance_Accuracy.pdf}\n",
+    "        \\caption{}\n",
+    "        \\label{fig:pytorch_accuracy}\n",
+    "    \\end{subfigure}%\n",
+    "    \\begin{subfigure}{0.5\\textwidth}\n",
+    "        \\centering\n",
+    "        \\includegraphics[width=0.99\\columnwidth]{plots/pytorch_performance_ROC AUC.pdf}\n",
+    "        \\caption{}\n",
+    "        \\label{fig:pytorch_roc_auc}\n",
+    "    \\end{subfigure}\\\\%\n",
+    "    \\begin{subfigure}{0.5\\textwidth}\n",
+    "        \\centering\n",
+    "        \\includegraphics[width=0.99\\columnwidth]{plots/pytorch_performance_F1 Score.pdf}\n",
+    "        \\caption{}\n",
+    "        \\label{fig:pytorch_f1_score}\n",
+    "    \\end{subfigure}%\n",
+    "    \\begin{subfigure}{0.5\\textwidth}\n",
+    "        \\centering\n",
+    "        \\includegraphics[width=0.99\\columnwidth]{plots/pytorch_performance_Precision.pdf}\n",
+    "        \\caption{}\n",
+    "        \\label{fig:pytorch_precision}\n",
+    "    \\end{subfigure}\\\\%\n",
+    "    \\begin{subfigure}{0.5\\textwidth}\n",
+    "        \\centering\n",
+    "        \\includegraphics[width=0.99\\columnwidth]{plots/pytorch_performance_Recall.pdf}\n",
+    "        \\caption{}\n",
+    "        \\label{fig:pytorch_recall}\n",
+    "    \\end{subfigure}%\n",
+    "    \\caption{Performance metrics of the proposed deep learning models. (a) ROC-AUC. (b) F1 score. (c) Precision. (d) Recall.}\n",
+    "    \\label{fig:pytorch_performance}\n",
+    "\\end{figure*}"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
   },
   {
    "cell_type": "code",
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [

notebooks/plot_experimental_results.ipynb CHANGED

The diff for this file is too large to render. See raw diff

notebooks/predict_unknown_protacs.ipynb ADDED

The diff for this file is too large to render. See raw diff

protac_degradation_predictor/models/{best_model_n0_random-epoch=6-val_acc=0.74-val_roc_auc=0.796.ckpt → best_model_n0_random-epoch=13-val_acc=0.83-val_roc_auc=0.841-v1.ckpt} RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:52e12060ef0d21f4eb4d84570c708e8c8502a1fbe6ebcbae1d86044e23b77708
-size 101362856

 version https://git-lfs.github.com/spec/v1
+oid sha256:497045f4d5f3bcf859db7339f971d9a7c6c2881121fe8841f3c16d3d17f8c3fa
+size 5127967

protac_degradation_predictor/models/best_model_n1_random-epoch=8-test_acc=0.78-test_roc_auc=0.851.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd7c412c10651dda9c528d57057f630c167a724f86f3dbd933122991406e0563
+size 2565407

protac_degradation_predictor/models/best_model_n2_random-epoch=9-val_acc=0.80-val_roc_auc=0.841-v1.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a32c2eaa95d7f887912074426e092976a384c303b2e07cb05db2e8e5f1a48870
+size 5127967

protac_degradation_predictor/models/cv_model_random_fold0-epoch=12-val_acc=0.86-val_roc_auc=0.905.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aaec6c36b21650fdca607be7978f38b5ee07e71a9d747c14402806de540148b7
+size 2565087

protac_degradation_predictor/models/cv_model_random_fold1-epoch=16-val_acc=0.86-val_roc_auc=0.933.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:59334fe5d4013c9575b827ab820bb2dcb9ab64b597cd6571557e18f3cc9707fe
+size 2565407

protac_degradation_predictor/models/cv_model_random_fold2-epoch=16-val_acc=0.86-val_roc_auc=0.908.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fe1ce77c1bf664a8ec6ee5a7a83c685b18fefc42ad0c3069dde2e88e77d0a07e
+size 5128095

protac_degradation_predictor/models/cv_model_random_fold3-epoch=15-val_acc=0.89-val_roc_auc=0.930.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3606040e5077e0b3d9e00fd3ab8c94decb20cb84fc59a9aad0f55f3620f36b8a
+size 2565087

protac_degradation_predictor/models/cv_model_random_fold4-epoch=15-val_acc=0.88-val_roc_auc=0.928.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e634ff4ebb89afeff3e31ff47d8215c8906e1133530630c0abcc938408dffc91
+size 2565215
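
Reviewer note: the checkpoint names above encode the model family, the epoch, and the logged metrics, and the updated loader in this commit selects models by filename prefix (`best_model` or `cv_model`). The sketch below shows one way to filter by prefix and read the metrics back out of a name. `parse_checkpoint_name`, `select_checkpoints`, and the regular expression are illustrative assumptions, not code from the repository.

```python
import re

# Matches "key=value" pairs such as "epoch=13" or "val_roc_auc=0.841".
# The value pattern stops before a trailing ".ckpt" or "-v1" suffix.
_PAIR = re.compile(r"([a-z_]+)=(\d+(?:\.\d+)?)")


def parse_checkpoint_name(filename):
    """Return the metrics embedded in a checkpoint filename as floats."""
    return {key: float(value) for key, value in _PAIR.findall(filename)}


def select_checkpoints(filenames, prefix):
    """Keep only the filenames that start with the given prefix."""
    return [name for name in filenames if name.startswith(prefix)]


# Filenames copied from this commit's file list.
names = [
    "best_model_n0_random-epoch=13-val_acc=0.83-val_roc_auc=0.841-v1.ckpt",
    "cv_model_random_fold0-epoch=12-val_acc=0.86-val_roc_auc=0.905.ckpt",
    "cv_model_random_fold1-epoch=16-val_acc=0.86-val_roc_auc=0.933.ckpt",
]
cv_names = select_checkpoints(names, "cv_model")
# Pick the cross-validation checkpoint with the highest validation ROC AUC.
best_cv = max(cv_names, key=lambda n: parse_checkpoint_name(n)["val_roc_auc"])
```

Note that one checkpoint in this commit (`best_model_n1_...`) logs `test_acc` and `test_roc_auc` rather than the `val_` keys, so a lookup by `val_roc_auc` would need a fallback for that file.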

protac_degradation_predictor/optuna_utils.py CHANGED

@@ -104,7 +104,7 @@ def get_majority_vote_metrics(
         'test_roc_auc': AUROC(task='binary')(test_preds, y).item(),
         'test_precision': Precision(task='binary')(test_preds, y).item(),
         'test_recall': Recall(task='binary')(test_preds, y).item(),
-        'test_f1': F1Score(task='binary')(test_preds, y).item(),
     }
     return majority_vote_metrics

         'test_roc_auc': AUROC(task='binary')(test_preds, y).item(),
         'test_precision': Precision(task='binary')(test_preds, y).item(),
         'test_recall': Recall(task='binary')(test_preds, y).item(),
+        'test_f1_score': F1Score(task='binary')(test_preds, y).item(),
     }
     return majority_vote_metrics

protac_degradation_predictor/protac_degradation_predictor.py CHANGED

@@ -1,6 +1,6 @@
 import pkg_resources
 import logging
-from typing import List
 from .pytorch_models import PROTAC_Model, load_model
 from .data_utils import (
@@ -20,12 +20,21 @@ def get_protac_active_proba(
         e3_ligase: str | List[str],
         target_uniprot: str | List[str],
         cell_line: str | List[str],
-        device: str = 'cpu',
-) -> bool:
-    model_filename = 'best_model_n0_random-epoch=6-val_acc=0.74-val_roc_auc=0.796.ckpt'
-    ckpt_path = pkg_resources.resource_stream(__name__, f'models/{model_filename}')
-    model = load_model(ckpt_path).to(device)
     protein2embedding = load_protein2embedding()
     cell2embedding = load_cell2embedding('data/cell2embedding.pkl')
@@ -60,32 +69,47 @@ def get_protac_active_proba(
         smiles_emb = [get_fingerprint(protac_smiles)]
     # Convert to torch tensors
-    poi_emb = torch.tensor(poi_emb).to(device)
-    e3_emb = torch.tensor(e3_emb).to(device)
-    cell_emb = torch.tensor(cell_emb).to(device)
-    smiles_emb = torch.tensor(smiles_emb).to(device)
-    pred = model(
-        poi_emb,
-        e3_emb,
-        cell_emb,
-        smiles_emb,
-        prescaled_embeddings=False, # Trigger automatic scaling
-    )
-    if isinstance(protac_smiles, list):
-        return sigmoid(pred).detach().numpy().flatten()
-    else:
-        return sigmoid(pred).item()
 def is_protac_active(
-        protac_smiles: str,
-        e3_ligase: str,
-        target_uniprot: str,
-        cell_line: str,
         device: str = 'cpu',
         proba_threshold: float = 0.5,
 ) -> bool:
     """ Predict whether a PROTAC is active or not.
@@ -106,5 +130,9 @@ def is_protac_active(
         target_uniprot,
         cell_line,
         device,
     )
-    return pred > proba_threshold

 import pkg_resources
 import logging
+from typing import List, Literal, Dict
 from .pytorch_models import PROTAC_Model, load_model
 from .data_utils import (
         e3_ligase: str | List[str],
         target_uniprot: str | List[str],
         cell_line: str | List[str],
+        device: Literal['cpu', 'cuda'] = 'cpu',
+        use_models_from_cv: bool = False,
+) -> Dict[str, np.ndarray]:
+    """ Predict the probability of a PROTAC being active.
+    Args:
+        protac_smiles (str | List[str]): The SMILES of the PROTAC.
+        e3_ligase (str | List[str]): The Uniprot ID of the E3 ligase.
+        target_uniprot (str | List[str]): The Uniprot ID of the target protein.
+        cell_line (str | List[str]): The cell line identifier.
+        device (str): The device to run the model on.
+    Returns:
+        Dict[str, np.ndarray]: The predictions of the model.
+    """
     protein2embedding = load_protein2embedding()
     cell2embedding = load_cell2embedding('data/cell2embedding.pkl')
         smiles_emb = [get_fingerprint(protac_smiles)]
     # Convert to torch tensors
+    poi_emb = torch.tensor(np.array(poi_emb)).to(device)
+    e3_emb = torch.tensor(np.array(e3_emb)).to(device)
+    cell_emb = torch.tensor(np.array(cell_emb)).to(device)
+    smiles_emb = torch.tensor(np.array(smiles_emb)).float().to(device)
+    models = {}
+    model_to_load = 'best_model' if not use_models_from_cv else 'cv_model'
+    # Load all models in pkg_resources that start with 'model_to_load'
+    for model_filename in pkg_resources.resource_listdir(__name__, 'models'):
+        if model_filename.startswith(model_to_load):
+            ckpt_path = pkg_resources.resource_stream(__name__, f'models/{model_filename}')
+            models[ckpt_path] = load_model(ckpt_path).to(device)
+    # Average the predictions of all models
+    preds = {}
+    for ckpt_path, model in models.items():
+        pred = model(
+            poi_emb,
+            e3_emb,
+            cell_emb,
+            smiles_emb,
+            prescaled_embeddings=False, # Normalization performed by the model
+        )
+        preds[ckpt_path] = sigmoid(pred).detach().numpy().flatten()
+    axis = 1 if isinstance(protac_smiles, list) else None
+    return {
+        'preds': np.array(list(preds.values())),
+        'mean': np.mean(list(preds.values()), axis=axis),
+        'majority_vote': np.mean(list(preds.values()), axis=axis) > 0.5,
+    }
 def is_protac_active(
+        protac_smiles: str | List[str],
+        e3_ligase: str | List[str],
+        target_uniprot: str | List[str],
+        cell_line: str | List[str],
         device: str = 'cpu',
         proba_threshold: float = 0.5,
+        use_majority_vote: bool = False,
+        use_models_from_cv: bool = False,
 ) -> bool:
     """ Predict whether a PROTAC is active or not.
         target_uniprot,
         cell_line,
         device,
+        use_models_from_cv,
     )
+    if use_majority_vote:
+        return pred['majority_vote']
+    else:
+        return pred['mean'] > proba_threshold
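
Reviewer note: the new `get_protac_active_proba` collects one probability vector per checkpoint and then reduces across them. The sketch below illustrates that kind of reduction in plain Python, assuming the predictions are laid out with one row per model and one column per PROTAC. The function `combine_predictions` and the sample numbers are hypothetical and not part of the commit. It contrasts thresholding the mean probability (which is what the `majority_vote` entry in the diff computes, with a fixed 0.5 cutoff) with counting per-model votes.

```python
from statistics import mean


def combine_predictions(preds_per_model, threshold=0.5):
    """Reduce per-model probabilities to one decision per PROTAC.

    preds_per_model: list of equal-length lists, one inner list per model
    (hypothetical layout: rows are models, columns are PROTACs).
    """
    if not preds_per_model:
        raise ValueError("need predictions from at least one model")
    n_models = len(preds_per_model)
    # zip(*rows) iterates column-wise, i.e. over PROTACs.
    columns = list(zip(*preds_per_model))
    mean_proba = [mean(col) for col in columns]
    # Soft voting: threshold the averaged probability.
    mean_active = [p > threshold for p in mean_proba]
    # Hard voting: each model votes, then a strict majority decides.
    votes = [sum(p > threshold for p in col) for col in columns]
    vote_active = [2 * v > n_models for v in votes]
    return {
        "mean": mean_proba,
        "mean_active": mean_active,
        "vote_active": vote_active,
    }


# Three hypothetical models scoring two PROTACs.
example = combine_predictions([
    [0.9, 0.4],
    [0.45, 0.6],
    [0.4, 0.3],
])
```

For the first PROTAC the two rules disagree: one confident model pulls the mean above the threshold, while only one of three models votes "active". Which rule is preferable depends on how well calibrated the individual models are.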

protac_degradation_predictor/pytorch_models.py CHANGED

@@ -53,14 +53,6 @@ class PROTAC_Predictor(nn.Module):
             disabled_embeddings (list): List of disabled embeddings. Can be 'poi', 'e3', 'cell', 'smiles'
         """
         super().__init__()
-        self.poi_emb_dim = poi_emb_dim
-        self.e3_emb_dim = e3_emb_dim
-        self.cell_emb_dim = cell_emb_dim
-        self.smiles_emb_dim = smiles_emb_dim
-        self.hidden_dim = hidden_dim
-        self.join_embeddings = join_embeddings
-        self.use_batch_norm = use_batch_norm
-        self.disabled_embeddings = disabled_embeddings
         # Set our init args as class attributes
         self.__dict__.update(locals())
@@ -126,12 +118,24 @@ class PROTAC_Predictor(nn.Module):
         else:
             if 'poi' not in self.disabled_embeddings:
                 embeddings.append(self.poi_fc(poi_emb))
             if 'e3' not in self.disabled_embeddings:
                 embeddings.append(self.e3_fc(e3_emb))
             if 'cell' not in self.disabled_embeddings:
                 embeddings.append(self.cell_fc(cell_emb))
             if 'smiles' not in self.disabled_embeddings:
                 embeddings.append(self.smiles_emb(smiles_emb))
             if self.join_embeddings == 'concat':
                 x = torch.cat(embeddings, dim=1)
             elif self.join_embeddings == 'sum':
@@ -140,6 +144,8 @@ class PROTAC_Predictor(nn.Module):
                     x = torch.sum(embeddings, dim=1)
                 else:
                     x = embeddings[0]
         x = F.relu(self.fc1(x))
         x = self.bnorm(x) if self.use_batch_norm else self.self.dropout(x)
         x = self.fc3(x)
@@ -185,19 +191,6 @@ class PROTAC_Model(pl.LightningModule):
             apply_scaling (bool): Whether to apply scaling to the embeddings
         """
         super().__init__()
-        self.poi_emb_dim = poi_emb_dim
-        self.e3_emb_dim = e3_emb_dim
-        self.cell_emb_dim = cell_emb_dim
-        self.smiles_emb_dim = smiles_emb_dim
-        self.hidden_dim = hidden_dim
-        self.batch_size = batch_size
-        self.learning_rate = learning_rate
-        self.join_embeddings = join_embeddings
-        self.train_dataset = train_dataset
-        self.val_dataset = val_dataset
-        self.test_dataset = test_dataset
-        self.disabled_embeddings = disabled_embeddings
-        self.apply_scaling = apply_scaling
         # Set our init args as class attributes
         self.__dict__.update(locals())  # Add arguments as attributes
         # Save the arguments passed to init
@@ -265,6 +258,7 @@ class PROTAC_Model(pl.LightningModule):
             self,
             tensor: torch.Tensor,
             scaler: StandardScaler,
     ) -> torch.Tensor:
         """Scale a tensor using a scaler. This is done to avoid using numpy
         arrays (and stay on the same device).
@@ -280,7 +274,7 @@ class PROTAC_Model(pl.LightningModule):
         if scaler.with_mean:
             tensor -= torch.tensor(scaler.mean_, dtype=tensor.dtype, device=tensor.device)
         if scaler.with_std:
-            tensor /= torch.tensor(scaler.scale_, dtype=tensor.dtype, device=tensor.device)
         return tensor
     def forward(self, poi_emb, e3_emb, cell_emb, smiles_emb, prescaled_embeddings=True):
@@ -300,6 +294,14 @@ class PROTAC_Model(pl.LightningModule):
                     e3_emb = self.scale_tensor(e3_emb, self.scalers['E3 Ligase Uniprot'])
                     cell_emb = self.scale_tensor(cell_emb, self.scalers['Cell Line Identifier'])
                     smiles_emb = self.scale_tensor(smiles_emb, self.scalers['Smiles'])
         return self.model(poi_emb, e3_emb, cell_emb, smiles_emb)
     def step(self, batch, batch_idx, stage):
@@ -624,5 +626,4 @@ def load_model(
     # with other datasets...
     # if model.apply_scaling:
     #     model.apply_scalers()
-    model.eval()
-    return model

             disabled_embeddings (list): List of disabled embeddings. Can be 'poi', 'e3', 'cell', 'smiles'
         """
         super().__init__()
         # Set our init args as class attributes
         self.__dict__.update(locals())
         else:
             if 'poi' not in self.disabled_embeddings:
                 embeddings.append(self.poi_fc(poi_emb))
+                if torch.isnan(embeddings[-1]).any():
+                    raise ValueError("NaN values found in POI embeddings.")
             if 'e3' not in self.disabled_embeddings:
                 embeddings.append(self.e3_fc(e3_emb))
+                if torch.isnan(embeddings[-1]).any():
+                    raise ValueError("NaN values found in E3 embeddings.")
             if 'cell' not in self.disabled_embeddings:
                 embeddings.append(self.cell_fc(cell_emb))
+                if torch.isnan(embeddings[-1]).any():
+                    raise ValueError("NaN values found in cell embeddings.")
             if 'smiles' not in self.disabled_embeddings:
                 embeddings.append(self.smiles_emb(smiles_emb))
+                if torch.isnan(embeddings[-1]).any():
+                    raise ValueError("NaN values found in SMILES embeddings.")
             if self.join_embeddings == 'concat':
                 x = torch.cat(embeddings, dim=1)
             elif self.join_embeddings == 'sum':
                     x = torch.sum(embeddings, dim=1)
                 else:
                     x = embeddings[0]
+        if torch.isnan(x).any():
+            raise ValueError("NaN values found in sum of softmax-ed embeddings.")
         x = F.relu(self.fc1(x))
         x = self.bnorm(x) if self.use_batch_norm else self.self.dropout(x)
         x = self.fc3(x)
             apply_scaling (bool): Whether to apply scaling to the embeddings
         """
         super().__init__()
         # Set our init args as class attributes
         self.__dict__.update(locals())  # Add arguments as attributes
         # Save the arguments passed to init
             self,
             tensor: torch.Tensor,
             scaler: StandardScaler,
+            alpha: float = 1e-10,
     ) -> torch.Tensor:
         """Scale a tensor using a scaler. This is done to avoid using numpy
         arrays (and stay on the same device).
         if scaler.with_mean:
             tensor -= torch.tensor(scaler.mean_, dtype=tensor.dtype, device=tensor.device)
         if scaler.with_std:
+            tensor /= torch.tensor(scaler.scale_, dtype=tensor.dtype, device=tensor.device) + alpha
         return tensor
     def forward(self, poi_emb, e3_emb, cell_emb, smiles_emb, prescaled_embeddings=True):
                     e3_emb = self.scale_tensor(e3_emb, self.scalers['E3 Ligase Uniprot'])
                     cell_emb = self.scale_tensor(cell_emb, self.scalers['Cell Line Identifier'])
                     smiles_emb = self.scale_tensor(smiles_emb, self.scalers['Smiles'])
+        if torch.isnan(poi_emb).any():
+            raise ValueError("NaN values found in POI embeddings.")
+        if torch.isnan(e3_emb).any():
+            raise ValueError("NaN values found in E3 embeddings.")
+        if torch.isnan(cell_emb).any():
+            raise ValueError("NaN values found in cell embeddings.")
+        if torch.isnan(smiles_emb).any():
+            raise ValueError("NaN values found in SMILES embeddings.")
         return self.model(poi_emb, e3_emb, cell_emb, smiles_emb)
     def step(self, batch, batch_idx, stage):
     # with other datasets...
     # if model.apply_scaling:
     #     model.apply_scalers()
+    return model.eval()
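
Reviewer note: the `alpha` argument added to `scale_tensor` keeps the division finite when a feature has zero scale, and the new `torch.isnan` checks fail fast if NaNs still reach the model. A minimal plain-Python sketch of the same idea follows. `standardize` and its inputs are hypothetical stand-ins for the tensor code, not the commit's implementation.

```python
import math


def standardize(values, means, scales, alpha=1e-10):
    """Return (x - mean) / (scale + alpha) element-wise, rejecting NaNs."""
    if not (len(values) == len(means) == len(scales)):
        raise ValueError("values, means and scales must have equal length")
    # alpha keeps the denominator non-zero for constant features.
    scaled = [(x - m) / (s + alpha) for x, m, s in zip(values, means, scales)]
    if any(math.isnan(v) for v in scaled):
        raise ValueError("NaN values found after scaling.")
    return scaled


# The second feature is constant (scale 0.0); without alpha the division
# would raise ZeroDivisionError here, and yield NaN or inf on tensors.
scaled = standardize([2.0, 5.0], means=[1.0, 5.0], scales=[2.0, 0.0])
```

A small `alpha` leaves well-scaled features essentially unchanged, but a non-zero deviation on a zero-scale feature is amplified by roughly `1 / alpha`, so the NaN guard alone does not catch every degenerate input.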

src/plot_experiment_results.py CHANGED

@@ -331,7 +331,6 @@ def main():
         ]),
     }
     for split_type in ['random', 'tanimoto', 'uniprot']:
         split_metrics = []
         for i in range(n_models_for_test):

         ]),
     }
     for split_type in ['random', 'tanimoto', 'uniprot']:
         split_metrics = []
         for i in range(n_models_for_test):

src/run_experiments.py CHANGED

@@ -8,7 +8,6 @@ from typing import Literal
 sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
 import protac_degradation_predictor as pdp
-from protac_degradation_predictor.optuna_utils import get_dataframe_stats
 import pytorch_lightning as pl
 from rdkit import Chem

 sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
 import protac_degradation_predictor as pdp
 import pytorch_lightning as pl
 from rdkit import Chem