CMSSP: A Contrastive Mass Spectra-Structure Pre-Training Model for Metabolite Identification
A pivotal challenge in metabolite research is the structural annotation of metabolites from tandem mass spectrometry (MS/MS) data. Artificial intelligence (AI) methods have substantially improved the interpretation of MS data and help identify metabolites that are otherwise difficult to annotate. Recent approaches mainly transform MS/MS spectra or molecular structures into a single shared modality so that they can be compared and interpreted by similarity. Here we present CMSSP, a Contrastive Mass Spectra-Structure Pre-training framework for metabolite annotation. Its primary objective is to learn a representation space in which MS/MS spectra and molecular structures can be compared directly, despite being distinct modalities.
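To make the idea of a shared representation space concrete, the following is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) objective over paired spectrum and structure embeddings. It is an illustration only: the function names, the temperature value, and the use of plain Python lists are assumptions, and the actual loss implemented in train.py may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(spec_emb, mol_emb, temperature=0.07):
    """Illustrative loss: spec_emb[i] and mol_emb[i] are assumed to be a matching pair.

    The temperature of 0.07 is a placeholder value, not taken from CMSSP.
    """
    spec = [l2_normalize(v) for v in spec_emb]
    mol = [l2_normalize(v) for v in mol_emb]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(s, m)) / temperature for m in mol]
        for s in spec
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the spectrum-to-structure and structure-to-spectrum directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch of two pairs: matching pairs point in the same direction.
aligned_loss = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[1.0, 0.0], [0.0, 1.0]])
# Same spectra, but the structure embeddings are swapped (mismatched pairs).
mismatched_loss = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                             [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is small when each spectrum embedding lies close to its paired structure embedding and large when the pairs are swapped, which is the behavior a contrastive pre-training objective relies on.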
Requirements
- Python >= 3.7.11
- PyTorch >= 1.10.0
- RDKit >= 2022.09.5
- scikit-learn >= 1.3.0
Usage
We provide two major scripts:
- train.py trains the model.
- predict.py runs prediction on an input file containing MS/MS spectra and candidate structures.
(i) Train the model using the command:
python train.py cfg.json
(ii) Run prediction using the command:
python predict.py data_to_be_predicted.json model.pth
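As a rough illustration of what similarity-based prediction over candidate structures can look like, the sketch below ranks candidates by cosine similarity to a query spectrum embedding. The embeddings, candidate names, and helper functions are hypothetical placeholders; the sketch does not reproduce the internals of predict.py or the format of its input file.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(spectrum_embedding, candidate_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(spectrum_embedding, emb))
        for name, emb in candidate_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; in practice these would come from the trained encoders.
query = [0.9, 0.1, 0.0]
candidates = {
    "candidate_A": [1.0, 0.0, 0.0],
    "candidate_B": [0.0, 1.0, 0.0],
    "candidate_C": [0.0, 0.0, 1.0],
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

The top-ranked entry is the candidate whose embedding lies closest to the spectrum embedding in the shared space.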
Publication
This work has been published in the journal Analytical Chemistry as "CMSSP: A Contrastive Mass Spectra-Structure Pretraining Model for Metabolite Identification".