Source code, training scripts, and inference utilities for this model: github.com/NVIDIA-BioNeMo/KERMT (v2.0 branch / v2.0.0 release tag)

Model Overview

Description:

Contrastive KERMT (Kinetic GROVER Multi-Task) is a graph-transformer foundation model pretrained to learn chemically meaningful molecular representations for downstream ADMET (absorption, distribution, metabolism, excretion, toxicity) property prediction in drug discovery. The model encodes a 2D molecular graph into a latent representation under a single joint probabilistic objective that combines SMILES reconstruction, in-batch contrastive discrimination, and chemistry-specific self-supervision (atom-context, bond-context, and functional group prediction), all formulated as unit-weighted log-probability factors. The released checkpoint was pretrained for 100 epochs on a corpus combining an 11M-molecule ZINC15+ChEMBL base pool (following the pretraining-data protocol of Rong et al. 2020) with Biogen ADMET, ExpansionRX, and ChEMBL-MT (~125K additional molecules), and is intended as a starting point for downstream multi-task ADMET fine-tuning. Contrastive KERMT was developed by NVIDIA as part of the KERMT v2.0 release. This model is ready for commercial or non-commercial use.
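
As a reading aid, the "unit-weighted log-probability factors" mentioned above can be written as a plain sum. This is only a sketch: the factor names (recon, contrast, atom, bond, fg) are labels chosen here for the five objectives listed in the description, and the source does not state the exact parameterization or conditioning of each factor.

```latex
\[
\log p_{\text{joint}}
  = 1\cdot\log p_{\text{recon}}
  + 1\cdot\log p_{\text{contrast}}
  + 1\cdot\log p_{\text{atom}}
  + 1\cdot\log p_{\text{bond}}
  + 1\cdot\log p_{\text{fg}},
\qquad
\mathcal{L} = -\log p_{\text{joint}}
\]
```

Under this reading, no per-term loss weights need to be tuned, because every factor enters the objective with coefficient 1.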

License/Terms of Use:

Copyright © 2026, NVIDIA Corporation. All rights reserved.
The source code is made available under Apache License, Version 2.0. See LICENSE in the source repository at https://github.com/NVIDIA-BioNeMo/KERMT.
The model weights are made available under the NVIDIA Open Model License.

Deployment Geography:

Global

Use Case:

Computational chemistry and machine-learning researchers in drug discovery — particularly those working on ADMET / Drug Metabolism and Pharmacokinetics (DMPK) prediction — who need a pretrained molecular graph encoder that can be fine-tuned on multi-endpoint ADMET datasets, used as a feature extractor for property-prediction pipelines, or studied as a baseline in molecular-representation-learning research. The released checkpoint is a pretrained backbone; users are expected to fine-tune it on their own labeled datasets for specific ADMET endpoints before using predictions in downstream workflows.

Release Date:

NGC 06/10/2026 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/resources/kermt-contrastive
Hugging Face 06/10/2026 via https://huggingface.co/nvidia/NV-KERMT-70M-v2

Reference(s):

Adrian, M., Chung, Y., Boyd, K., Paliwal, S., Veccham, S.P., Cheng, A.C. Multitask finetuning and acceleration of chemical pretrained models for small molecule drug property prediction. arXiv:2510.12719 (2025). https://arxiv.org/abs/2510.12719 — KERMT (the v1 baseline this work extends).
Rong, Y. et al. Self-Supervised Graph Transformer on Large-Scale Molecular Data. NeurIPS 33, 12559–12571 (2020). https://papers.nips.cc/paper/2020/hash/3fe230348e9a12c13120749e3f9fa4cd-Abstract.html — GROVER, the underlying graph-transformer architecture.
Sterling, T., Irwin, J. J. ZINC 15 – Ligand Discovery for Everyone. J. Chem. Inf. Model. 55(11), 2324–2337 (2015). DOI: 10.1021/acs.jcim.5b00559 — ZINC15 base corpus.
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Research 47(D1), D930–D940 (2019). — ChEMBL base corpus.
Fang, C., Wang, Y., Grater, R. et al. Prospective Validation of Machine Learning Algorithms for ADMET Prediction. J. Chem. Inf. Model. 63(11), 3263–3274 (2023). — Biogen ADMET dataset (in-domain augmentation + finetune benchmark).
Contrastive KERMT manuscript (in preparation; arXiv URL to be added on publication).

Model Architecture:

Architecture Type: Transformer (graph-transformer with local message passing + global self-attention)

Network Architecture: KERMT graph-transformer encoder (extension of GROVER) with a probabilistic latent head, an in-batch contrastive auxiliary variable, a SMILES-reconstruction transformer decoder, and chemistry-specific vocabulary prediction heads. Encoder: hidden size 800, 6 message-passing-plus-attention layers, 4 attention heads per layer, 1 multi-task (MT) block, PReLU activation, dropout 0.1. Decoder: 3 transformer layers, 8 attention heads, 512 hidden / latent dimension, FFN hidden 2048, rotary positional encoding (RoPE). Latent dimension 512.
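
The hyperparameters listed above can be collected into a small configuration sketch. The class and field names below are hypothetical (they are not taken from the KERMT codebase); only the numeric values come from this card. The per-head dimension check assumes the conventional multi-head split of the hidden size across heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values from the model card; field names are illustrative assumptions.
    hidden_size: int = 800
    num_layers: int = 6           # message-passing-plus-attention layers
    num_attention_heads: int = 4
    num_mt_blocks: int = 1
    activation: str = "PReLU"
    dropout: float = 0.1


@dataclass(frozen=True)
class DecoderConfig:
    # Values from the model card; field names are illustrative assumptions.
    num_layers: int = 3
    num_attention_heads: int = 8
    hidden_size: int = 512        # also the latent dimension
    ffn_hidden_size: int = 2048
    positional_encoding: str = "rope"


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head width, assuming the hidden size is split evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return hidden_size // num_heads


encoder = EncoderConfig()
decoder = DecoderConfig()
encoder_head_dim = head_dim(encoder.hidden_size, encoder.num_attention_heads)
decoder_head_dim = head_dim(decoder.hidden_size, decoder.num_attention_heads)
```

With the stated values, both hidden sizes divide evenly by their head counts, and the decoder's per-head width is even, which rotary positional encoding requires.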

This model was developed based on KERMT (Adrian et al. 2025, arXiv:2510.12719), in turn based on GROVER (Rong et al. 2020).

Number of model parameters: 7.06 × 10^7

Input(s):

Input Type(s): Text (SMILES string representing a 2D molecular structure)

Input Format(s): UTF-8 SMILES (Simplified Molecular Input Line Entry System)

Input Parameters: One-Dimensional (1D) text

Other Properties Related to Input: The input is a canonical SMILES string parseable by RDKit (an open-source cheminformatics toolkit); molecules are internally featurized into 2D atom-and-bond graphs prior to encoding. Recommended maximum sequence length for the SMILES decoder is 512 tokens (the value used at pretraining time); molecules whose canonical SMILES exceed this length should be truncated or omitted. Inputs are not text in the natural-language sense and are not subject to natural-language preprocessing (no tokenization in the human-language sense; characters are mapped via a chemistry-specific tokenizer matching the bundled SMILES vocabulary).
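
The length recommendation above can be checked before inference with a short filter. This is a minimal sketch that uses a widely used regular-expression SMILES tokenizer as a stand-in; the bundled SMILES vocabulary may split strings differently, and any special tokens (such as start or end markers) are not counted here. The function names are hypothetical.

```python
import re

# Common regex for atom-level SMILES tokens; a stand-in for the bundled
# tokenizer, which may differ (assumption).
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
MAX_DECODER_TOKENS = 512  # value stated in the model card


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into tokens; reject strings with unknown characters."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def within_decoder_limit(smiles: str, max_tokens: int = MAX_DECODER_TOKENS) -> bool:
    """True if the tokenized SMILES fits the recommended decoder length."""
    return len(tokenize_smiles(smiles)) <= max_tokens


candidates = ["CC(=O)Oc1ccccc1C(=O)O", "C[C@@H](N)C(=O)O", "C" * 600]
kept = [s for s in candidates if within_decoder_limit(s)]
```

Omitting over-length molecules, as done here, keeps every retained string chemically valid.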

Output(s)

Output Type(s): Numerical tensors (molecular embeddings) and, when downstream task-specific heads are present, scalar ADMET property predictions. Optionally, generated SMILES strings via the pretraining-time SMILES decoder.

Output Format(s):

Molecular embeddings: float tensors of shape (batch_size, hidden_size=800) for atom-level and bond-level readouts; (batch_size, latent_dim=512) for the cMIM projected latent.
Property predictions (after finetune): float tensors of shape (batch_size, num_endpoints) — values are continuous regression outputs per ADMET endpoint.
Generated SMILES (pretrain-time decoder only): UTF-8 SMILES string.

Output Parameters: One-Dimensional (1D) embedding / prediction vectors.

Other Properties Related to Output: Embeddings are intended as inputs to downstream property-prediction heads, similarity computations, or visualization (PCA / t-SNE / UMAP). Predictions are statistical estimates derived from training data and should not be used as substitutes for experimental measurement in safety-critical drug development decisions.
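
One of the uses named above, similarity computation on embeddings, can be sketched in a few lines. The vectors below are random placeholders with the 512-dimensional latent shape from this card, not real model outputs, and cosine similarity is one reasonable choice of measure rather than something the card prescribes.

```python
import math
import random

LATENT_DIM = 512  # latent dimension stated in the model card


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Placeholder embeddings standing in for a (batch_size=3, latent_dim=512) tensor.
rng = random.Random(0)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(3)]

# Pairwise similarity matrix of shape (3, 3).
similarity = [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]
```

In practice the same computation would be applied to embeddings exported from the encoder, typically with a tensor library rather than Python lists.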

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):

PyTorch 2.x (the released checkpoint is loadable via the KERMT codebase at https://github.com/NVIDIA-BioNeMo/KERMT)

Supported Hardware Microarchitecture Compatibility:
NVIDIA GPU with compute capability 7.0 (Volta) or newer is recommended; at least 32 GB GPU vRAM is recommended for pretraining and fine-tuning workloads (inference uses less). The following microarchitecture families are supported:

NVIDIA Ampere (e.g., A100, A40, A10)
NVIDIA Blackwell
NVIDIA Hopper (e.g., H100)
NVIDIA Lovelace (e.g., L4, L40)
NVIDIA Turing (e.g., T4)
NVIDIA Volta (e.g., V100)
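
The hardware guidance above can be expressed as a small pre-flight check. This is a sketch with hypothetical names; with PyTorch, the inputs could be obtained from `torch.cuda.get_device_capability()` and from `torch.cuda.get_device_properties(0).total_memory` (bytes, converted to GB), but the function itself takes plain values so that it runs without a GPU.

```python
MIN_COMPUTE_CAPABILITY = (7, 0)   # Volta, per the model card
RECOMMENDED_TRAIN_VRAM_GB = 32    # recommendation for pretraining and fine-tuning


def check_gpu(capability, vram_gb):
    """Compare a GPU's (major, minor) capability and memory to the card's guidance."""
    return {
        "supported": tuple(capability) >= MIN_COMPUTE_CAPABILITY,
        "meets_training_vram": vram_gb >= RECOMMENDED_TRAIN_VRAM_GB,
    }


# Illustrative inputs: an 80 GB compute-capability-8.0 device and a
# 16 GB compute-capability-7.5 device.
large_gpu = check_gpu((8, 0), 80)
small_gpu = check_gpu((7, 5), 16)
```

A device can be supported for inference while still falling short of the memory recommended for pretraining and fine-tuning, which is why the two checks are reported separately.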

Supported Operating System(s):

Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

kermt-contrastive v2.0 — Pretrained Contrastive KERMT checkpoint trained for 100 epochs on the pooled corpus described under "Training Dataset" below (11M ZINC15+ChEMBL base pool + Biogen ADMET + ExpansionRX + ChEMBL-MT). Inference-only release (training-time optimizer state stripped).

The full KERMT v2.0 release includes (a) source code in the public repository at https://github.com/NVIDIA-BioNeMo/KERMT (Apache 2.0), (b) the pretrained checkpoint described here, and (c) the bundled pretraining vocabulary files (atom vocab JSON, bond vocab JSON, SMILES vocab pickle) required to reproduce the model's tokenization.

Training, Testing, and Evaluation Datasets:

Public Datasets

ZINC15 — Free database of commercially-available compounds for virtual screening, assembled by the Irwin & Shoichet lab at UCSF. URL: https://zinc15.docking.org/
ChEMBL — Manually-curated database of bioactive molecules with drug-like properties from EBI. URL: https://www.ebi.ac.uk/chembl/
Biogen ADMET — Public ADMET dataset released alongside Fang et al. 2023 (J. Chem. Inf. Model.) — 3,521 unique molecules across 6 endpoints (HLM-CLint, RLM-CLint, MDR1-MDCK efflux ratio, solubility@pH 6.8 — the 4 endpoints we use — plus PPB-human and PPB-rat). URL: https://github.com/molecularinformatics/Computational-ADME ; mirror at https://polarishub.io/datasets/biogen/adme-fang-v1
ExpansionRX — Public ADMET dataset released by ExpansionRX in January 2026 — 7.6K molecules with 9 endpoints (LogD, kinetic solubility, HLM CLint, mouse LM CLint, mouse PPB, mouse brain PB, mouse Gastrocnemius Muscle Binding, Caco-2 efflux ratio, Caco-2 Papp A>B). URL: https://huggingface.co/datasets/openadmet/openadmet-expansionrx-challenge-data
ChEMBL-MT — Multi-task subset of ChEMBL curated by Adrian et al. 2025 — 114K molecules with 25 ADMET endpoints (24 ADME + 1 toxicity endpoint, hERG inhibition). Data URL: https://figshare.com/articles/dataset/Datasets_for_Multitask_finetuning_and_acceleration_of_chemical_pretrained_models_for_small_molecule_drug_property_prediction_/30350548 ; paper: https://arxiv.org/abs/2510.12719

Private Datasets

Not Applicable

Training Dataset:

Data Modality:

Other: Molecular SMILES strings (1D text representation of 2D molecular structures)

Training Data Size:

Less than 100 Million Datapoints. Pretraining corpus contains approximately 11.1 million unique canonical SMILES strings after pooling and RDKit-based deduplication: an 11M ZINC15+ChEMBL base pool (per Rong et al. 2020) plus Biogen ADMET (~3.5K), ExpansionRX (~7.6K), and ChEMBL-MT (~114K).
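
The pooling and deduplication step can be sketched as follows. The canonicalizer here is a toy lookup so that the example runs without dependencies; in practice a real canonicalizer such as RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(s))` would be passed in. Function and variable names are hypothetical, and the nominal sizes are the rounded figures quoted in this card.

```python
from typing import Callable, Iterable, List, Set


def pool_unique(sources: Iterable[Iterable[str]],
                canonicalize: Callable[[str], str]) -> List[str]:
    """Pool several SMILES sources, keeping the first copy of each canonical form."""
    seen: Set[str] = set()
    unique: List[str] = []
    for source in sources:
        for smiles in source:
            key = canonicalize(smiles)
            if key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


# Toy stand-in for a real canonicalizer (assumption): maps two alternative
# spellings of ethanol to one canonical string.
TOY_CANONICAL = {"OCC": "CCO", "C(C)O": "CCO"}


def toy_canonicalize(smiles: str) -> str:
    return TOY_CANONICAL.get(smiles, smiles)


base_pool = ["CCO", "c1ccccc1"]
admet_pool = ["OCC", "CC(=O)O"]
pooled = pool_unique([base_pool, admet_pool], toy_canonicalize)

# Upper bound on the pooled size before deduplication, from the card's figures.
NOMINAL_SIZES = {
    "zinc15_chembl_base": 11_000_000,
    "biogen_admet": 3_521,
    "expansionrx": 7_600,
    "chembl_mt": 114_000,
}
upper_bound = sum(NOMINAL_SIZES.values())
```

The nominal sizes sum to 11,125,121 molecules, so the deduplicated total of roughly 11.1 million is consistent with the quoted component sizes.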

Non-Audio, Image, Text Training Data Size: ~11.1 × 10^6 molecules (SMILES strings); total on-disk text size is roughly a few hundred MB, depending on serialization.

Data Collection Method by dataset: Hybrid: Automated, Manually-Collected

ZINC15: Automated (database curation from public vendor catalogs by Irwin & Shoichet lab at UCSF)
ChEMBL: Manually-Collected (manually curated by EBI from primary literature and direct depositions)
Biogen ADMET: Manually-Collected (experimental ADMET assay measurements from Biogen's drug-discovery program)
ExpansionRX: Manually-Collected (experimental ADMET assay measurements from ExpansionRX's drug-discovery program)
ChEMBL-MT: Hybrid (Manually-Collected, Automated) — manually-collected ChEMBL assay data, automatically aggregated into multi-task splits by Adrian et al.

Labeling Method by dataset: Not Applicable. Pretraining is self-supervised and uses no human-provided labels. Targets (atom-context, bond-context, functional groups) are computed on the fly from canonical SMILES using deterministic RDKit-based rules; SMILES-reconstruction targets are the input SMILES themselves.

Properties: ~11M molecular SMILES strings; the data are text representations of 2D chemical structures and contain no personal data, copyrighted natural-language content, or human-language linguistic content. All molecules are deduplicated by canonical SMILES. The training pool is scaffold-balanced via Bemis-Murcko scaffolds for the train/val split.
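
One common way to use Bemis-Murcko scaffolds for a train/val split is to keep all molecules that share a scaffold on the same side. The sketch below shows that grouped split; it is an assumption about what "scaffold-balanced" means here, since the card does not spell out the procedure. Scaffold strings are supplied precomputed (RDKit's `MurckoScaffold.MurckoScaffoldSmiles` is one way to obtain them), and the function name is hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffolds, train_frac=0.9):
    """Split molecule indices so that no scaffold appears in both train and val.

    scaffolds[i] is the scaffold string of molecule i.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    train, val = [], []
    cutoff = train_frac * len(scaffolds)
    for group in ordered:
        if len(train) + len(group) <= cutoff:
            train.extend(group)
        else:
            val.extend(group)
    return train, val


# Illustrative precomputed scaffolds; "" marks an acyclic molecule.
scaffolds = ["c1ccccc1", "c1ccccc1", "c1ccccc1", "c1ccncc1",
             "c1ccncc1", "C1CCCCC1", "", "c1ccccc1"]
train_idx, val_idx = scaffold_split(scaffolds, train_frac=0.75)
```

Because whole scaffold groups are assigned together, the realized train fraction can differ slightly from `train_frac`.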

Testing Dataset:

Pretraining is fully self-supervised and uses no dedicated testing dataset (no held-out test split at the pretraining stage). Model quality is assessed downstream via the fine-tuning Evaluation Datasets below.

Data Collection Method: Not Applicable
Labeling Method: Not Applicable
Properties: Not Applicable.

Evaluation Dataset:

Benchmark Score:

Downstream evaluation is performed by fine-tuning the released pretrained checkpoint on three independent ADMET benchmarks and reporting Mean Absolute Error (MAE), Pearson correlation coefficient (r), and Spearman correlation coefficient (ρ) per endpoint, averaged over multiple random seeds:

Biogen — 3.5K molecules, 4 endpoints, Bemis-Murcko scaffold split, 5 seeds.
ExpansionRX — 7.6K molecules, 9 endpoints, temporal split, 5 seeds.
ChEMBL-MT — 114K molecules, 25 endpoints, Taylor-Butina cluster splits, 2 folds × 2 seeds.
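
The three metrics reported for these benchmarks can be computed per endpoint as sketched below, using only the standard library. The values are made-up illustrations, not benchmark results, and the function names are hypothetical; libraries such as SciPy provide equivalent routines.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Illustrative values for one endpoint and one seed.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 5.0]
endpoint_mae = mae(y_true, y_pred)
endpoint_r = pearson_r(y_true, y_pred)
endpoint_rho = spearman_rho(y_true, y_pred)
```

Averaging over seeds then amounts to taking the mean of each per-seed metric for a given endpoint.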

Data Collection Method by dataset: Hybrid: Manually-Collected, Automated

Biogen: Manually-Collected (experimental ADMET measurements)
ExpansionRX: Manually-Collected (experimental ADMET measurements)
ChEMBL-MT: Hybrid (Manually-Collected, Automated)

Labeling Method by dataset:

Biogen: Automatic/Sensors (experimental assay measurements)
ExpansionRX: Automatic/Sensors (experimental assay measurements)
ChEMBL-MT: Automatic/Sensors (experimental assay measurements aggregated from primary sources)

Properties: Continuous-valued ADMET endpoint measurements (intrinsic clearance, permeability, solubility, plasma protein binding, hERG inhibition, etc.) from in vitro and in vivo assays. Data are scalar regression targets per molecule; no images, video, or natural-language content. ChEMBL-MT contributes the toxicity endpoint (hERG inhibition); the ADME endpoints come from all three benchmarked datasets.

Inference:

Acceleration Engine: PyTorch (the released checkpoint is loadable via the KERMT codebase).
Test Hardware:

Minimum hardware requirement: NVIDIA GPU with compute capability 7.0 (Volta) or newer; at least 32 GB GPU vRAM recommended for pretraining and fine-tuning workloads.
Validated during development on NVIDIA Ampere A100 (pretraining + downstream) and NVIDIA Lovelace L4 (downstream).

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.

Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
