dtSFM-v3 — Drug–Target Specificity Foundation Model (production encoder + generative decoder)

Paper: A drug–target specificity foundation model for off-target prediction, repurposing, and generative design · doi: 10.64898/2026.06.08.730844
Code: github.com/Reddy-BIIE-ETHZ/dtSFM
All SFM models: huggingface.co/SFM-BIIE-ETHZ

This is the production dtSFM (v3): a full-scale cross-attention encoder paired with a cross-attentive autoregressive decoder. The smaller encoder-only prototype from the Vibe Coding SFMs paper lives separately at SFM-BIIE-ETHZ/dtSFM_VC-SFM.


What it does

dtSFM maps a (drug SMILES, protein sequence) pair to a binding-compatibility score, and generates novel target-conditioned drug-like molecules — all from sequence, without constructing a 3-D structure. It is built on the SFM principle that transformer softmax attention is mathematically isomorphic to the Boltzmann distribution of molecular binding.
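The correspondence between softmax attention and a Boltzmann distribution can be checked numerically. The sketch below is only the standard textbook identity, not the paper's derivation: it assumes each attention score plays the role of a negative energy and that the softmax scale plays the role of kT. The function names and the numbers are illustrative.

```python
import math

def softmax(scores, scale=1.0):
    """Attention weights: softmax of scores / scale (scale is sqrt(d_k) in a transformer)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / scale) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def boltzmann(energies, kT=1.0):
    """Boltzmann occupancies p_i = exp(-E_i / kT) / Z."""
    m = min(energies)  # shift by the ground state; the shift cancels in the ratio
    weights = [math.exp(-(e - m) / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Illustrative scores; energy is assumed to be the negated score.
scores = [2.0, 0.5, -1.0]
energies = [-s for s in scores]

attn = softmax(scores, scale=8.0)
occ = boltzmann(energies, kT=8.0)

# The two distributions coincide term by term.
assert all(math.isclose(a, b) for a, b in zip(attn, occ))
```

Under this reading, a higher score corresponds to a lower binding energy, and the partition function Z is the softmax denominator.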

Three applications run on this single model:

  • Off-target safety screening (drug → proteome): documented off-targets at median rank 30 / 4,910 genes (top 0.6%) vs the Klaeger 2017 chemoproteomic panel.
  • Library repurposing (target → drug): 46 novel candidates clear the AlphaFold-3 binder gate across NLRP3 / CD73 / STING1.
  • Generative design (target → novel drug): 850 / 1,200 (71%) of generated molecules match the AlphaFold-3 structural confidence of the approved drug.
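As a rough illustration of the drug → proteome screen, the sketch below ranks a toy set of protein embeddings against one drug embedding by cosine similarity and converts a rank into a fraction of the panel. The gene names, the two-dimensional vectors, and the helper names are hypothetical; real embeddings would come from the encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_targets(drug_vec, protein_vecs):
    """Return gene names ordered from highest to lowest similarity to the drug."""
    scored = {gene: cosine(drug_vec, vec) for gene, vec in protein_vecs.items()}
    return sorted(scored, key=scored.get, reverse=True)

def top_fraction(rank, n_candidates):
    """Fraction of the panel at or above a given 1-based rank."""
    return rank / n_candidates

# Hypothetical toy embeddings (placeholders, not model output).
toy_proteome = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.7, 0.7],
    "GENE_C": [-1.0, 0.2],
}
order = rank_targets([1.0, 0.1], toy_proteome)

# The figure quoted above: rank 30 out of 4,910 genes.
reported = top_fraction(30, 4910)
```

Here `reported` is about 0.0061, which is the "top 0.6%" quoted in the first bullet.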

| Component | Model |
|---|---|
| Drug encoder | MoLFormer-XL (frozen, SMILES → 768-d) |
| Protein encoder | ESM-2-650M (frozen, → 1,280-d per residue) |
| Cross-attention encoder | trainable, 2 layers · 4 heads · d=512 · 14.4 M params |
| Decoder | cross-attentive autoregressive SMILES generator (~28 M params) |
| Training data | PDBbind v2020 + SAIR (714,747 pairs · 522,776 drugs · 22,964 proteins) |
| Split | whole MMseqs2 protein clusters held out at 80% identity; zero pair/protein/cluster leakage |
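The cluster-level hold-out can be sketched as follows. This is a minimal illustration, not the repository's split script: it assumes a protein → cluster mapping is already available (for example, parsed from a clustering tool's representative/member table), and the `holdout_fraction` value, the identifiers, and the function name are hypothetical.

```python
import random

def split_by_cluster(pairs, protein_to_cluster, holdout_fraction=0.1, seed=0):
    """Hold out whole protein clusters so no cluster appears on both sides."""
    clusters = sorted(set(protein_to_cluster.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_holdout = max(1, round(len(clusters) * holdout_fraction))
    held_out = set(clusters[:n_holdout])

    train, test = [], []
    for drug, protein in pairs:
        # A pair follows its protein's cluster, never the drug.
        if protein_to_cluster[protein] in held_out:
            test.append((drug, protein))
        else:
            train.append((drug, protein))
    return train, test, held_out

# Hypothetical toy data: P1 and P2 share a cluster.
protein_to_cluster = {"P1": "c1", "P2": "c1", "P3": "c2", "P4": "c3"}
pairs = [("d1", "P1"), ("d2", "P2"), ("d1", "P3"), ("d3", "P4")]

train, test, held_out = split_by_cluster(pairs, protein_to_cluster, holdout_fraction=0.34)

# Leakage check: cluster sets on the two sides must be disjoint.
train_clusters = {protein_to_cluster[p] for _, p in train}
test_clusters = {protein_to_cluster[p] for _, p in test}
assert train_clusters.isdisjoint(test_clusters)
```

Splitting on clusters rather than on individual pairs is what rules out protein-level and cluster-level leakage; a random pair split would let near-identical sequences land on both sides.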

Files in this repo

| File | Description |
|---|---|
| encoder_b3_epoch010.pt | locked production encoder (B-3, 4 heads) |
| decoder_v02_step50000.pt | cross-attentive generative decoder (checkpoint @ 50k steps) |

Quick start

```python
from huggingface_hub import hf_hub_download
import torch, torch.nn.functional as F
from calm.encoder.model_v3 import CALMEncoderV3
from calm.decoder.model_dtsfm_v3 import CALMDecoderV3

# --- retrieval / scoring (drug ↔ target) ---
# Download the encoder checkpoint from the Hub and switch to inference mode.
enc = CALMEncoderV3.from_pretrained(
    hf_hub_download("SFM-BIIE-ETHZ/dtSFM-v3", "encoder_b3_epoch010.pt")).eval()
drug   = enc.encode_drug("CC(=O)Oc1ccccc1C(=O)O")               # aspirin
target = enc.encode_protein("MTEYKLVVVGAGGVGKSALTIQLIQ...")
score  = F.cosine_similarity(drug, target, dim=-1)              # binding-compatibility score

# --- generation (target → novel molecules) ---
# Download the decoder checkpoint and sample 100 SMILES conditioned on the target.
dec = CALMDecoderV3.from_pretrained(
    hf_hub_download("SFM-BIIE-ETHZ/dtSFM-v3", "decoder_v02_step50000.pt")).eval()
smiles = dec.generate(target_sequence="MTEYKLVVVGAGG...", n=100, temperature=0.8)
```

Install the codebase from github.com/Reddy-BIIE-ETHZ/dtSFM (conda env create -f environment.yml).
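The `temperature=0.8` argument in the quick start controls how sharply the decoder samples. The sketch below shows the usual temperature-scaled sampling step, assuming `dec.generate` follows that convention (the model card does not confirm this); the logits and helper names are illustrative.

```python
import math
import random

def temperature_softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Illustrative next-token logits over a three-symbol vocabulary.
logits = [2.0, 1.0, 0.0]
p_default = temperature_softmax(logits, temperature=1.0)
p_sharp = temperature_softmax(logits, temperature=0.8)

# Lowering the temperature moves probability mass toward the top token.
assert p_sharp[0] > p_default[0]

token = sample_token(logits, temperature=0.8, rng=random.Random(0))
```

Temperatures below 1 trade diversity for likelihood: samples cluster around the highest-probability tokens, which usually means fewer invalid strings but less novelty.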


Orthogonal verification

Every structural claim is checked by AlphaFold-3 as an orthogonal referee — it shares no architecture, training data, or representation with dtSFM (dtSFM-cosine ↔ AF3-confidence correlation ≈ 0), so structural agreement is genuine corroboration, not circular confirmation. Binder gate: iPTM ≥ 0.7 AND interface-PAE ≤ 5 Å. Anchor-grade gate: iPTM ≥ 0.9 AND PAE ≤ 1.67 Å.
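The two gates can be written as a small classifier. This is a sketch of the thresholds stated above, not the project's evaluation code; the function name is hypothetical, and it assumes the anchor-grade PAE threshold refers to the same interface PAE as the binder gate.

```python
def af3_gate(iptm, interface_pae):
    """Classify an AlphaFold-3 prediction against the two gates.

    iptm: interface predicted TM-score (0 to 1).
    interface_pae: interface predicted aligned error in Å.
    """
    # Anchor-grade is the stricter gate, so test it first.
    if iptm >= 0.9 and interface_pae <= 1.67:
        return "anchor"
    if iptm >= 0.7 and interface_pae <= 5.0:
        return "binder"
    return "fail"

# Illustrative values, not results from the paper.
examples = [(0.95, 1.2), (0.95, 3.0), (0.75, 4.0), (0.60, 1.0)]
labels = [af3_gate(iptm, pae) for iptm, pae in examples]
```

Both conditions must hold for a gate to pass, so a high iPTM with a loose interface (the second example) drops from anchor-grade to binder.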


Citation

@article{reddy2026dtsfm,
  title   = {A drug–target specificity foundation model for off-target prediction, repurposing, and generative design},
  author  = {Reddy, Sai T.},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.64898/2026.06.08.730844}
}

License

Released under the SFM Research Preview License v1.0-preview (see LICENSE.md). Free for research use — academic, non-profit, government, and industry research. The specific molecules disclosed in the accompanying preprints are dedicated to the public. Commercial-use and patent-licensing terms are deferred and being arranged with ETH Zürich / BIIE; the SFM architectures and training methods are the subject of pending patent applications. For commercial enquiries: sai.reddy@ethz.ch
