---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
- therapeutic science
- drug design
- drug development
- therapeutics
library_name: tdc
license: bsd-2-clause
---

## Dataset description

An integrated Ether-a-go-go-related gene (hERG) dataset of molecular structures, given as SMILES strings and labeled as hERG blockers (<10 µM) or non-blockers (>=10 µM). The data were collected from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature.

## Task description

Binary classification. Given a drug SMILES string, predict whether the drug blocks hERG (1, <10 µM) or does not block it (0, >=10 µM).
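
The labeling rule above can be sketched as a small function. This is an illustration only: the name `herg_label` and the assumption that each compound comes with a single measured potency value in µM are hypothetical and not part of TDC, where the dataset already ships with binary labels.

```python
def herg_label(potency_um: float, threshold_um: float = 10.0) -> int:
    """Return 1 (blocker) if the potency is below the threshold, else 0 (non-blocker)."""
    if potency_um < 0:
        raise ValueError("potency must be non-negative")
    # Strictly below 10 µM counts as a blocker; exactly 10 µM does not.
    return 1 if potency_um < threshold_um else 0


# Hypothetical potency values in µM, chosen to show both sides of the cutoff.
examples = [0.5, 9.99, 10.0, 35.0]
labels = [herg_label(value) for value in examples]
print(labels)  # [1, 1, 0, 0]
```

Note that the boundary value itself (10 µM) falls in the non-blocker class, matching the `>=10 µM` definition.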

## Dataset statistics

Total: 13445; Train_val: 12620; Test: 825

## Pre-requisites

Install the following packages:

```
pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision
```

You can also refer to the Colab notebook [here](https://colab.research.google.com/drive/1CL92SOCBS-eYDL99w8tjSNIG_ySXzMrG?usp=sharing).

## Dataset split

Random split into 70% training, 10% validation, and 20% test sets.

To load the dataset in TDC, type

```python
from tdc.single_pred import Tox

data = Tox(name='herg_karim')
```
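
To make the 70/10/20 random split concrete, here is a minimal standard-library sketch of the idea. It is not TDC's implementation: `random_split`, its parameters, and the seed are hypothetical names chosen for illustration. With TDC itself, the split is normally obtained from the loaded `data` object via `data.get_split()`.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test by the given fractions."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # the remainder goes to the test set
    }


# Stand-in for a list of molecules: 1000 integer IDs.
split = random_split(range(1000))
print({name: len(part) for name, part in split.items()})
```

Assigning the remainder to the test set means every item lands in exactly one partition, even when the fractions do not divide the dataset size evenly.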

## Model description

The model encodes each molecule as a Morgan chemical fingerprint and feeds it to an MLP decoder. It was tuned over 100 runs using the Ax platform.

To load the pre-trained model, type

```python
from tdc import tdc_hf_interface

tdc_hf = tdc_hf_interface("hERG_Karim-Morgan")
# Load the DeepPurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
# Predict for a list of SMILES strings
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])
```
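
The metadata of this card lists accuracy, precision, and recall among the evaluation metrics. The sketch below shows one way these could be computed once predictions have been converted to 0/1 labels. The helper name `binary_metrics` is hypothetical, and the toy labels are made up for illustration; they are not results from this model.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # blockers found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-blockers found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed blockers
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy data only: 1 = hERG blocker, 0 = non-blocker.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

AUC ROC is left out here because it needs continuous scores rather than hard 0/1 labels.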

## References

* Dataset entry in Therapeutics Data Commons, https://tdcommons.ai/single_pred_tasks/tox
* Karim, A., et al. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 13, 60 (2021). https://doi.org/10.1186/s13321-021-00541-z