CMSSP: A Contrastive Mass Spectra-Structure Pre-Training Model for Metabolite Identification

A central challenge in metabolite research is the structural annotation of metabolites from tandem mass spectrometry (MS/MS) data. Artificial intelligence (AI) methods have substantially improved the interpretation of MS data and help identify metabolites that are otherwise difficult to annotate. Recent methods focus mainly on transforming MS/MS spectra or molecular structures into a single shared modality so that they can be compared by similarity. Here we present CMSSP, a Contrastive Mass Spectra-Structure Pre-training framework for metabolite annotation. Its primary objective is to learn a representation space in which MS/MS spectra and molecular structures can be compared directly, despite being distinct modalities.
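To make the idea of a shared representation space concrete, the following is a minimal sketch of a symmetric contrastive objective of the kind commonly used for cross-modal pre-training. It is an illustration under stated assumptions, not the loss implemented in train.py: the function names, the temperature value of 0.07, and the toy embeddings are all hypothetical, and the sketch uses only the Python standard library rather than PyTorch. Row i of `spectra` and row i of `structures` are assumed to be embeddings of a matched spectrum-structure pair.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_matrix(spectra, structures):
    """Cosine similarity between every spectrum and every structure embedding."""
    s = [l2_normalize(v) for v in spectra]
    m = [l2_normalize(v) for v in structures]
    return [[sum(a * b for a, b in zip(si, mj)) for mj in m] for si in s]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(spectra, structures, temperature=0.07):
    """Average of the spectrum-to-structure and structure-to-spectrum losses.

    The temperature is an assumed hyperparameter, not a value from the source.
    """
    similarities = cosine_matrix(spectra, structures)
    logits = [[x / temperature for x in row] for row in similarities]
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy embeddings (made up for illustration): matched pairs share a direction.
spectra = [[1.0, 0.0], [0.0, 1.0]]
matched_structures = [[1.0, 0.0], [0.0, 1.0]]
swapped_structures = [[0.0, 1.0], [1.0, 0.0]]

print(symmetric_contrastive_loss(spectra, matched_structures))
print(symmetric_contrastive_loss(spectra, swapped_structures))
```

With these toy inputs the loss is close to zero when each spectrum is aligned with its own structure and much larger when the pairs are swapped, which is the behavior a contrastive objective rewards.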

Requirements

  • Python >= 3.7.11
  • PyTorch >= 1.10.0
  • RDKit >= 2022.09.5
  • scikit-learn >= 1.3.0

Usage

We provide two major scripts:

  • train.py trains the model.
  • predict.py runs prediction on a given file containing MS/MS spectra and candidate structures.

(i) Train the model with the command:

python train.py cfg.json

(ii) Run prediction with the command:

python predict.py data_to_be_predicted.json model.pth
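How predict.py scores candidates internally is not described here. Given a representation space shared by spectra and structures, one plausible approach is to rank the candidate structures by cosine similarity to the query spectrum embedding. The sketch below illustrates that idea only: `rank_candidates`, the SMILES keys, and the embedding values are hypothetical and do not reflect the script's actual interface or input format.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(spectrum_embedding, candidate_embeddings):
    """Return (candidate_id, score) pairs sorted from most to least similar.

    candidate_embeddings maps a candidate identifier (here a SMILES string)
    to its structure embedding in the shared space.
    """
    scored = [
        (candidate_id, cosine(spectrum_embedding, embedding))
        for candidate_id, embedding in candidate_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy values, made up for illustration; real embeddings would come from the model.
query = [1.0, 0.0]
candidates = {
    "CCO": [0.9, 0.1],
    "CC(=O)O": [0.0, 1.0],
    "c1ccccc1": [-1.0, 0.0],
}

for candidate_id, score in rank_candidates(query, candidates):
    print(candidate_id, round(score, 3))
```

The top-ranked entry would then be reported as the most likely annotation for the spectrum, with the remaining candidates available for manual review.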

Publication

This work has been published in the journal Analytical Chemistry: CMSSP: A Contrastive Mass Spectra-Structure Pretraining Model for Metabolite Identification
