---
license: apache-2.0
datasets:
- jxie/guacamol
- AdrianM0/MUV
library_name: transformers
---

## Model Details

We introduce a suite of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these models in low-data settings using semi-supervised learning.

### Enumeration-aware Molecular Transformers

Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.

#### a. Molecular Domain Adaptation (Contrastive Encoder-based)

##### i. Architecture

![smole bert drawio](https://user-images.githubusercontent.com/6007894/233776921-41667331-1ab7-413c-92f7-4e6fad512f5c.svg)

##### ii. Contrastive Learning

*(Figure: contrastive learning overview)*

#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)

*(Figure: canonicalization encoder-decoder overview)*

### Pretraining steps for this model:

- Pretrain a BERT model with multi-task regression on physicochemical properties using the Guacamol dataset (a minimal sketch of this objective follows below)
- Domain adaptation on the MUV dataset with contrastive learning and masked language modeling (see the contrastive-objective sketch below)

For more details, please see our [github repository](https://github.com/uds-lsv/enumeration-aware-molecule-transformers).
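As an illustration of the first pretraining step, the sketch below computes a few physicochemical descriptors with RDKit and attaches a small multi-task regression head to a pooled encoder representation. The specific property set (molecular weight, logP, TPSA) and the single-layer head are assumptions for illustration; the repository defines the actual targets and heads.

```python
# Minimal sketch of multi-task regression pre-training on physicochemical properties.
# The descriptor choice and the linear head are illustrative assumptions.
import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import Descriptors


def property_targets(smiles: str) -> torch.Tensor:
    """Compute a few physicochemical descriptors used as regression targets."""
    mol = Chem.MolFromSmiles(smiles)
    return torch.tensor([
        Descriptors.MolWt(mol),     # molecular weight
        Descriptors.MolLogP(mol),   # octanol-water partition coefficient
        Descriptors.TPSA(mol),      # topological polar surface area
    ])


class MultiTaskRegressionHead(nn.Module):
    """Predicts several properties from the pooled encoder representation."""

    def __init__(self, hidden_size: int, n_tasks: int = 3):
        super().__init__()
        self.head = nn.Linear(hidden_size, n_tasks)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.head(pooled)
```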
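The contrastive domain-adaptation step pairs the canonical SMILES of a molecule with a randomly enumerated SMILES of the same molecule as a positive pair, with the rest of the batch acting as negatives. Below is a simplified, one-directional InfoNCE-style sketch of that idea, assuming RDKit for SMILES enumeration; it is not the repository's exact training code.

```python
# Simplified sketch of an enumeration-aware contrastive objective:
# canonical SMILES and an enumerated SMILES of the same molecule are a positive pair.
import torch
import torch.nn.functional as F
from rdkit import Chem


def enumerate_smiles(smiles: str) -> str:
    """Return a randomized (non-canonical) SMILES of the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)


def contrastive_loss(z_canon: torch.Tensor, z_enum: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """One-directional InfoNCE loss between canonical and enumerated embeddings."""
    z_canon = F.normalize(z_canon, dim=-1)
    z_enum = F.normalize(z_enum, dim=-1)
    logits = z_canon @ z_enum.t() / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(z_canon.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```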
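Since the encoder is a standard BERT-style model, it can be loaded with the `transformers` library. The sketch below shows how one might embed SMILES strings with it; the model identifier is a placeholder, so substitute the Hub id of this model card's repository.

```python
# Minimal usage sketch: embed SMILES strings with the released encoder.
# "uds-lsv/smole-bert" is a placeholder id; replace it with this repository's Hub id.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "uds-lsv/smole-bert"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
batch = tokenizer(smiles, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool token embeddings (ignoring padding) to get one vector per molecule.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (3, hidden_size)
```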