Model Details

We introduce a suite of tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these language models in low-data settings using semi-supervised learning.

Enumeration-aware Molecular Transformers

These models use contrastive learning, alongside multi-task regression and masked language modeling, as pre-training objectives to inject enumeration knowledge (the fact that one molecule can be written as many different SMILES strings) into pre-trained language models.
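
As a rough illustration of the contrastive objective, the sketch below computes an InfoNCE-style loss over a batch in which `positives[i]` is the embedding of a second SMILES enumeration of the molecule behind `anchors[i]`, and the other rows act as in-batch negatives. This is an assumption based on the "simcse" in the model name, not the repository's actual training code; the toy 2-D embeddings, the function names, and the temperature of 0.05 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives.

    positives[i] is the positive for anchors[i]; every other row of
    `positives` is treated as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denom = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Negative log-probability of picking the true positive.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[0.9, 0.1], [0.1, 0.9]]    # each anchor matches its own row
shuffled_positives = [[0.1, 0.9], [0.9, 0.1]]   # positives swapped between rows

loss_aligned = info_nce_loss(anchors, aligned_positives)
loss_shuffled = info_nce_loss(anchors, shuffled_positives)
```

The loss is small when each anchor is closest to its own enumeration and large when the pairs are mismatched, which is the signal that pushes different SMILES strings of the same molecule toward the same representation.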

a. Molecular Domain Adaptation (Contrastive Encoder-based)

i. Architecture

[Figure: smole-bert architecture diagram]

ii. Contrastive Learning

b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)

Pretraining steps for this model:

  • Pre-train a BERT model with multi-task regression on physicochemical properties, using the GuacaMol dataset
  • Domain adaptation on the MUV dataset with contrastive learning and masked language modeling
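
The masked language modeling part of the second step can be sketched as follows. This is a minimal illustration, not the repository's pipeline: the regex tokenizer is a simplified assumption rather than the model's own tokenizer, the 15% masking rate is the conventional BERT default rather than a documented setting, and every selected token is replaced by `[MASK]` (BERT's random-replacement and keep-as-is cases are omitted).

```python
import random
import re

# Simplified SMILES token pattern (an assumption, not the model's tokenizer):
# bracket atoms, two-letter halogens, single-letter atoms, ring digits, bonds.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()+\-/\\@.%]")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with [MASK] with probability `mask_prob`.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere, so the loss only covers masks.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Aspirin as an example input.
tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")
masked, labels = mask_tokens(tokens)
```

During training, the model would be asked to predict the entries of `labels` that are not `None` from the surrounding unmasked tokens.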

For more details, please see our GitHub repository.

Datasets used to train UdS-LSV/simcse-smole-bert-muv-mlm