AIS-RoChem3D Paper Checkpoints

This repository provides the AIS-RoChem3D model checkpoints used in the paper experiments, together with the model definition, vocabulary files, and minimal examples for embedding extraction and downstream prediction.

The full pretraining datasets and generated H5 caches are not included because of their size.

Checkpoints

File                       Meaning
checkpoints/step76104.pt   paper checkpoint at global step 76104
checkpoints/final.pt       final paper checkpoint after resumed training

Python Interface

The model can be imported from the aisrochem3d package (src/aisrochem3d/), which provides:

  • AISRoChem3DConfig
  • AISRoChem3DPretrainModel
  • paper_config()
  • load_checkpoint()
  • load_model_from_checkpoint()
  • featurize_smiles()
  • embed_batch()

Main Model Configuration

  • pair_dim = 512
  • max_token_length = 384
  • ProbMix distance branch with MAT-style distance kernel
  • tau = 2.0
  • static mix raw lambda (1.0, 0.5, 0.5), normalized to content/edge/dist = 0.5/0.25/0.25
  • edge embedding dimension 128
  • edge probability branch scale initialized at 1.0
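
As a worked check of the static mix values listed above, dividing each raw lambda by the sum of the three reproduces the stated 0.5/0.25/0.25 split. This is only a sketch of the arithmetic: the function name is hypothetical, and the model code may apply the normalization differently.

```python
# Raw static mix weights as listed above, in content/edge/dist order.
raw_lambda = {"content": 1.0, "edge": 0.5, "dist": 0.5}


def normalize_mix(raw):
    """Divide each raw weight by the total so the weights sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw mix weights must have a positive sum")
    return {name: value / total for name, value in raw.items()}


mix = normalize_mix(raw_lambda)
print(mix)  # {'content': 0.5, 'edge': 0.25, 'dist': 0.25}
```

Note that a softmax over the same raw values would not give 0.5/0.25/0.25, so sum normalization is the reading consistent with the numbers in the list.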

Minimal Checks

pip install -r requirements.txt
python examples/smoke_forward.py
python examples/load_checkpoint_keys.py --checkpoint checkpoints/step76104.pt
python examples/run_embedding_demo.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --checkpoint checkpoints/step76104.pt
python examples/run_downstream_demo.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --checkpoint checkpoints/step76104.pt

Downstream Evaluation Note

The downstream benchmark results reported in the paper were obtained by fine-tuning task-specific heads from the released checkpoints. This repository focuses on the pretrained AIS-RoChem3D checkpoints, model definition, vocabularies, and minimal embedding/property-head examples. It does not include the full downstream hyperparameter search records or task-specific fine-tuned heads.

Vocabulary Files

  • vocab/ais_vocab_qcmerged.txt
  • vocab/bond_triplet_vocab_qcmerged.tsv

model_config.json records the vocabulary sizes and edge unknown-bucket ids.
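
The role of an unknown bucket can be illustrated with a small lookup sketch. The vocabulary entries, the id values, the triplet layout, and the single-bucket design below are all hypothetical; the actual TSV columns and the keys in model_config.json are not described here and may differ.

```python
# Hypothetical bond-triplet vocabulary: (atom token, bond type, atom token) -> edge id.
# The real entries live in vocab/bond_triplet_vocab_qcmerged.tsv.
edge_vocab = {
    ("C", "SINGLE", "C"): 1,
    ("C", "DOUBLE", "O"): 2,
    ("C", "SINGLE", "O"): 3,
}
UNKNOWN_EDGE_ID = 4  # assumed id; the real values are recorded in model_config.json


def edge_id(triplet, vocab=edge_vocab, unknown_id=UNKNOWN_EDGE_ID):
    """Return the edge id for a bond triplet, falling back to the unknown bucket."""
    return vocab.get(triplet, unknown_id)


print(edge_id(("C", "DOUBLE", "O")))  # known triplet
print(edge_id(("C", "TRIPLE", "N")))  # unseen triplet, mapped to the unknown bucket
```

A fallback of this kind keeps every bond triplet inside the model's edge-id range, even when the triplet never appeared in the vocabulary.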

SMILES-to-Embedding Example

The examples include a minimal SMILES-to-embedding path:

  1. canonicalize to no-H SMILES;
  2. AIS tokenize with token-to-atom alignment;
  3. generate a heavy-atom RDKit conformer;
  4. map bond triplets into the model edge-id space;
  5. run the encoder and pool atom embeddings.
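
The final pooling step can be sketched in plain Python. Mean pooling over unmasked atoms is an assumption made for illustration; the steps above only say that atom embeddings are pooled, and the released examples may use a different reduction. The function name and the toy vectors are hypothetical.

```python
def mean_pool(atom_embeddings, atom_mask):
    """Average per-atom vectors over the positions where atom_mask is True."""
    kept = [vec for vec, keep in zip(atom_embeddings, atom_mask) if keep]
    if not kept:
        raise ValueError("no atoms selected for pooling")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three atom positions plus one padding position, embedding size 2.
atom_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
atom_mask = [True, True, True, False]

molecule_embedding = mean_pool(atom_embeddings, atom_mask)
print(molecule_embedding)  # [3.0, 4.0]
```

Masking matters here because padded positions would otherwise pull the molecule-level vector toward zero.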

examples/run_downstream_demo.py shows how a property head consumes the embeddings. Without --head, it falls back to a deterministic demo head that only illustrates the interface; pass a fine-tuned head via --head for real supervised prediction.
