Upload 4 files

Files changed (4)

MolE-XGBoost-08.03.2024_14.20.pkl ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:e66874f9019beab0eb02378893c064d63d34df3482f8f6f0495d144597e972d0
+size 10210090

README.md CHANGED

@@ -1,3 +1,22 @@
----
-license: mit
----

+# MolE - Antimicrobial Prediction
+This repository uses MolE's pre-trained molecular representation as input features for XGBoost models that predict the antimicrobial activity of compounds from their molecular structure.
+## Files:
+- `model.pth` - the pre-trained representation model's weights
+- `config.yaml` - model configuration
+- `MolE-XGBoost-08.03.2024_14.20.pkl` - the pre-trained XGBoost model
+## Usage
+Not ready yet.
+## Publication
+For more information about MolE and how we use it to predict antimicrobial activity, see our paper in Nature Communications:
+[**Pre-trained molecular representations enable antimicrobial discovery**](https://www.nature.com/articles/s41467-025-58804-4)
+## GitHub
+The code is available here:
+[**Link to GitHub repo**](https://github.com/rolayoalarcon/mole_antimicrobial_potential)

config.yaml ADDED

+batch_size: 1000                         # batch size
+warm_up: 10                             # warm-up epochs
+epochs: 1000                             # total number of epochs
+load_model: None                        # resume training
+eval_every_n_epochs: 1                  # validation frequency
+save_every_n_epochs: 5                  # automatic model saving frequency
+fp16_precision: False                   # use 16-bit float precision (True/False)
+init_lr: 0.0005                         # initial learning rate for Adam
+weight_decay: 1e-5                      # weight decay for Adam
+gpu: cuda:0                             # training GPU
+model_type: gin_concat                         # GNN backbone (e.g., gin/gcn)
+model:
+  num_layer: 5                          # number of graph conv layers
+  emb_dim: 200                          # embedding dimension in graph conv layers
+  feat_dim: 8000                          # output feature dimension
+  drop_ratio: 0.0                         # dropout ratio
+  pool: add                            # readout pooling (one of mean/max/add)
+dataset:
+  num_workers: 50                       # dataloader number of workers
+  valid_size: 0.1                      # ratio of validation data
+  data_path: data/pubchem_data/pubchem_100k_random.txt # path of pre-training data
+loss:
+  l: 0.0001 # Lambda parameter
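
Two values in this config are easy to misread once loaded. A YAML 1.1 loader such as PyYAML's `safe_load` returns `load_model: None` as the string `"None"` (YAML null is spelled `null` or `~`) and `weight_decay: 1e-5` as the string `"1e-5"` (its float pattern requires a decimal point). The sketch below shows one way to normalize such values after loading. The helper name `normalize_config` and the sample dictionary are illustrative assumptions; how the repository's own code handles these fields is not shown in this commit.

```python
from typing import Any, Dict


def normalize_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg with string placeholders turned into real values.

    Hypothetical helper: it is not part of the MolE code base.
    """
    out = dict(cfg)

    # "None" written in YAML is a plain string, not a null value.
    if out.get("load_model") in ("None", "none", ""):
        out["load_model"] = None

    # Exponent notation without a decimal point may arrive as a string.
    for key in ("init_lr", "weight_decay"):
        if key in out:
            out[key] = float(out[key])

    return out


# Values as a YAML 1.1 loader would typically return them for this file.
raw = {
    "load_model": "None",
    "init_lr": 0.0005,
    "weight_decay": "1e-5",
    "fp16_precision": False,
}
cfg = normalize_config(raw)
print(cfg["load_model"], cfg["weight_decay"])
```

Writing `load_model: null` and `weight_decay: 1.0e-5` in the file itself would avoid the conversion step, at the cost of diverging from the committed config.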

model.pth ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d324644c5f43e7be6734a9cd7a7966f975bfcc113610c13be897d11674defd8
+size 803807667
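
The `.pkl` and `.pth` entries above are Git LFS pointer files: each records a spec version, a content hash (`oid`), and a byte size in place of the binary itself. As a minimal sketch, a downloaded file can be checked against its pointer as follows. The function names `parse_lfs_pointer` and `matches_pointer`, and the assumption that `model.pth` sits in the working directory, are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()  # tolerate diff-style "+" prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and hash against a parsed pointer."""
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with p.open("rb") as f:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:2d324644c5f43e7be6734a9cd7a7966f975bfcc113610c13be897d11674defd8
size 803807667
"""
)

# Assumes model.pth was downloaded into the current directory.
if Path("model.pth").exists():
    print(matches_pointer("model.pth", pointer))
```

The size comparison runs first because it is cheap and rejects truncated downloads without hashing roughly 800 MB of data.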