---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
library_name: tdc
license: bsd-2-clause
---

## Dataset description

An integrated human Ether-a-go-go-Related Gene (hERG) dataset, compiled by Karim et al. [1], of molecules given as SMILES strings and labelled as hERG blockers (< 10 µM) or non-blockers (≥ 10 µM). The data were drawn from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature sources.

## Task description
Binary classification. Given a drug SMILES string, predict whether the molecule is a hERG blocker (label 1, < 10 µM) or a non-blocker (label 0, ≥ 10 µM).
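
The labelling rule amounts to a single threshold on the measured activity. The sketch below uses a hypothetical helper, `herg_label`, which is not part of TDC; it assumes the activity value is already expressed in µM. The dataset itself ships with precomputed labels, so this only illustrates the rule.

```python
# Hypothetical helper (not part of TDC): binarize a hERG activity value
# given in uM using the 10 uM cutoff from the task description.
BLOCKER_THRESHOLD_UM = 10.0


def herg_label(activity_um: float) -> int:
    """Return 1 (blocker) if activity < 10 uM, else 0 (non-blocker)."""
    if activity_um < 0:
        raise ValueError("activity must be non-negative")
    return 1 if activity_um < BLOCKER_THRESHOLD_UM else 0


if __name__ == "__main__":
    for value in (0.5, 9.99, 10.0, 42.0):
        print(f"{value:>6} uM -> label {herg_label(value)}")
```

Note that the boundary value of exactly 10 µM falls in the non-blocker class.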

## Dataset statistics
Total: 13,445 molecules (train/validation: 12,620; test: 825).

## Dataset split
Random split into 70% training, 10% validation, and 20% test sets.
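
In TDC a split like this is normally produced by the dataset object's `get_split` method (for example `data.get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])`). To show what a 70/10/20 random split does, here is a minimal standalone sketch; `random_split` is a hypothetical helper, not TDC's implementation, and the seed value is an arbitrary assumption.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle `items` and cut them into train/valid/test lists.

    Hypothetical helper for illustration; it is not TDC's implementation.
    """
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n = len(shuffled)
    n_train = int(frac[0] * n)
    n_valid = int(frac[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # rounding remainder goes to test
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = random_split(range(13445))
    print(len(train), len(valid), len(test))
```

Because the cut points are truncated to integers, any rounding remainder ends up in the test set.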

To load the dataset with TDC, run:

```python
from tdc.single_pred import Tox

# Download (on first use) and load the hERG_Karim dataset
data = Tox(name='herg_karim')
df = data.get_data()  # pandas DataFrame: Drug_ID, Drug (SMILES), Y (label)
```

## Model description
The model encodes each molecule as a Morgan chemical fingerprint and passes it to an MLP decoder; it is packaged as a DeepPurpose model. Hyperparameters were tuned over 100 trial runs with the Ax platform.
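
For orientation, the forward pass of a fingerprint-plus-MLP classifier can be sketched in NumPy. This is a minimal, untrained illustration and not the released model: the fingerprint length (`N_BITS = 1024`), the hidden widths, the ReLU activations, and the sigmoid output are all assumptions. The inputs are random bit vectors standing in for real Morgan fingerprints (which would typically be computed with a toolkit such as RDKit), and neither training nor the Ax hyperparameter search is shown.

```python
import numpy as np

# Hypothetical sizes: the card does not state the fingerprint length or
# hidden-layer widths used by the released model.
N_BITS = 1024
HIDDEN = (256, 64)


def init_mlp(sizes, seed=0):
    """Create random (weight, bias) pairs for a fully connected network."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params


def mlp_predict(params, fingerprints):
    """Map binary fingerprints of shape (batch, N_BITS) to blocker probabilities."""
    h = fingerprints.astype(np.float64)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    logits = (h @ w + b).ravel()
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> P(label = 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Stand-in for Morgan bit vectors; random here, not real molecules.
    fps = rng.integers(0, 2, size=(3, N_BITS))
    params = init_mlp((N_BITS, *HIDDEN, 1))
    print(mlp_predict(params, fps))
```

In the released model the weights are learned and the hyperparameters come from the Ax search, so use the TDC loading snippet for real predictions.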

To load the pre-trained model, run:

```python
from tdc import tdc_hf_interface

tdc_hf = tdc_hf_interface("hERG_Karim-Morgan")
# Load the DeepPurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
# Predict for a single SMILES string (acetaminophen)
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])
```

## References
[1] Karim, A., et al. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 13, 60 (2021). https://doi.org/10.1186/s13321-021-00541-z