---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
- therapeutic science
- drug design
- drug development
- therapeutics
library_name: tdc
license: mit
---

## Dataset description

An integrated human Ether-a-go-go-related gene (hERG) dataset was obtained from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature. It consists of molecular structures, given as SMILES strings, labeled as hERG blockers (< 10 µM) or non-blockers (≥ 10 µM).

## Task description
Binary classification. Given a drug SMILES string, predict whether the compound blocks hERG (label 1, < 10 µM) or does not (label 0, ≥ 10 µM).
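
The labeling rule reduces to one threshold on measured potency. The following is a minimal sketch. It assumes each compound's activity is available as a concentration in µM. The helper `herg_label` is hypothetical and not part of TDC, which already ships the binary labels.

```python
HERG_THRESHOLD_UM = 10.0  # blocker cutoff from the task description


def herg_label(activity_um: float) -> int:
    """Return 1 (blocker) if activity < 10 uM, else 0 (non-blocker).

    Hypothetical helper for illustration only.
    """
    if activity_um < 0:
        raise ValueError("activity must be a non-negative concentration in uM")
    return 1 if activity_um < HERG_THRESHOLD_UM else 0


print([herg_label(x) for x in (0.5, 9.99, 10.0, 42.0)])  # [1, 1, 0, 0]
```

Note that the boundary value of exactly 10 µM is labeled as a non-blocker, matching the `>=10uM` rule.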

## Dataset statistics
Total: 13445; Train_val: 12620; Test: 825

## Pre-requisites
Install the following packages:
```
pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision
```
You can also refer to the [Colab notebook](https://colab.research.google.com/drive/1CL92SOCBS-eYDL99w8tjSNIG_ySXzMrG?usp=sharing).


## Dataset split
Random split: 70% training, 10% validation, and 20% testing.
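
A random 70/10/20 split can be sketched with the standard library alone. This is an illustrative stand-in that assumes a simple seeded shuffle of row indices. It is not necessarily how TDC implements its own random split, and the function name `random_split` is hypothetical.

```python
import random


def random_split(n_items, fracs=(0.7, 0.1, 0.2), seed=42):
    """Shuffle indices 0..n_items-1 and cut them into train/valid/test."""
    if abs(sum(fracs) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(fracs[0] * n_items)
    n_valid = int(fracs[1] * n_items)
    return {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # remainder goes to test
    }


splits = random_split(13445)
print({name: len(idx) for name, idx in splits.items()})
# {'train': 9411, 'valid': 1344, 'test': 2690}
```

Assigning the rounding remainder to the test set ensures that every compound lands in exactly one partition.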

To load the dataset in TDC, type

```python
from tdc.single_pred import Tox
# downloads the hERG (Karim et al.) dataset on first use
data = Tox(name='herg_karim')
```

## Model description
AttentiveFP is a molecular representation learning method based on graph attention networks. The model's hyperparameters were tuned over 100 runs using the Ax platform.

To load the pre-trained model, type

```python
from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("hERG_Karim-AttentiveFP")
# download the DeepPurpose model from this repo into ./data and load it
dp_model = tdc_hf.load_deeppurpose('./data')
# predict for one SMILES string (acetaminophen)
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])
```
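
To score predictions with the metrics listed in this card's metadata (accuracy, AUC ROC, precision, recall), you can use the dependency-free sketch below. It assumes the model produces one blocker probability per molecule and uses a 0.5 decision threshold. Both are assumptions. The labels and scores shown are made-up toy values, not results from this model. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall and ROC AUC for 0/1 labels and scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # ROC AUC = chance a random positive outscores a random negative (ties = 1/2)
    pairs = [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc_roc": sum(pairs) / len(pairs),
    }


# Toy labels and scores, for illustration only
print(binary_metrics([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5, 'auc_roc': 0.75}
```

The pairwise AUC computation scales quadratically, which is fine for a test set of this size. For larger sets, a rank-based or library implementation is preferable.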

## References
* Dataset entry in Therapeutics Data Commons, https://tdcommons.ai/single_pred_tasks/tox. 
* Karim, A., et al. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 13, 60 (2021). https://doi.org/10.1186/s13321-021-00541-z