---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
- therapeutic science
- drug design
- drug development
- therapeutics
library_name: tdc
license: bsd-2-clause
---

## Dataset description

An integrated Ether-a-go-go-related gene (hERG) dataset of molecular structures, given as SMILES strings and labeled as hERG blockers (<10 uM) or non-hERG blockers (>=10 uM). The data were collected from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature.

## Task description

Binary classification. Given a drug SMILES string, predict whether it blocks hERG (1, <10 uM) or does not (0, >=10 uM).

## Dataset statistics

Total: 13445; Train_val: 12620; Test: 825

## Pre-requisites

Install the following packages:

```
pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision
```

You can also refer to the Colab notebook [here](https://colab.research.google.com/drive/1CL92SOCBS-eYDL99w8tjSNIG_ySXzMrG?usp=sharing).

## Dataset split

Random split with 70% training, 10% validation, and 20% testing.

To load the dataset in TDC, type

```python
from tdc.single_pred import Tox
data = Tox(name = 'herg_karim')
```

## Model description

The model applies a Convolutional Neural Network (CNN) to the SMILES string fingerprint. It was tuned over 100 runs using the Ax platform.

To load the pre-trained model, type

```python
from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("hERG_Karim-CNN")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])
```

## References

* Dataset entry in Therapeutics Data Commons, https://tdcommons.ai/single_pred_tasks/tox
* Karim, A., et al. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 13, 60 (2021). https://doi.org/10.1186/s13321-021-00541-z
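
## Example: label convention

The task description defines the positive class with a strict 10 uM cutoff. The snippet below is a minimal sketch of that convention only, not the code used to build the dataset. The function name `label_herg_blocker`, the `threshold_um` parameter, and the sample values are hypothetical, and it assumes the measured quantity is a potency value expressed in micromolar.

```python
from typing import List


def label_herg_blocker(potency_um: float, threshold_um: float = 10.0) -> int:
    """Return 1 (blocker) if potency is below the threshold, else 0 (non-blocker).

    Hypothetical helper: mirrors the "<10 uM" / ">=10 uM" rule stated in the
    task description. A value exactly at the threshold counts as a non-blocker.
    """
    return 1 if potency_um < threshold_um else 0


# Made-up potency values in uM, for illustration only.
sample_potencies_um: List[float] = [0.5, 10.0, 42.0]
labels = [label_herg_blocker(value) for value in sample_potencies_um]
print(labels)
```

Because the comparison is strict, a compound measured at exactly 10 uM falls into the non-blocker class, which matches the `>=10uM` wording used for label 0.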