Molexar-10M Omni
Molexar-10M Omni is the universal multi-condition model for Molexar, a unified multimodal molecular foundation model for drug design. It starts from fairydance/molexar-10m-base and is supervised fine-tuned to generate Fragment-SELFIES molecules under scalar molecular-property, pharmacophore-fingerprint, protein-sequence, and protein-pocket conditions.
This model corresponds to the Universal Multi-Condition Model described in the Molexar paper.
Project resources:
- Molexar code: https://github.com/fairydance/Molexar
- Fragment-SELFIES code: https://github.com/fairydance/Fragment-SELFIES
- Official website: https://molexar.com
Model Details
| Field | Value |
|---|---|
| Model family | Molexar molecular causal language model |
| Architecture | Gemma2-style decoder with value-token embedding replacement for conditions |
| Base model | fairydance/molexar-10m-base |
| LM component parameters | 10,534,912 |
| Total model parameters | 14,756,261 |
| Layers | 16 |
| Hidden size | 256 |
| Intermediate size | 640 |
| Attention heads | 4 query heads, 1 key-value head |
| Vocabulary size | 127 |
| Context length | 256 tokens |
| Sliding window | 128 tokens |
| Molecular language | Fragment-SELFIES |
| Model files | config.json, pytorch_model.bin, tokenizer.json, tokenizer_config.json, training_args.bin |
Parameter counts are unique nn.Parameter counts with tied token-embedding/LM-head weights counted once. The LM component includes the token embeddings, Gemma2-style decoder, final normalization, and tied output head; the total additionally includes condition encoders and the pocket GVP encoder.
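The LM component count can be checked by hand from the table above. The sketch below is a back-of-the-envelope calculation, not code from the Molexar repository. It assumes a head dimension of 64 (hidden size divided by query heads), bias-free projections, a gated MLP with three weight matrices, and four RMSNorm weight vectors per layer, as in the Hugging Face Gemma2 implementation.

```python
# Hyperparameters copied from the Model Details table.
VOCAB, HIDDEN, INTERMEDIATE, LAYERS = 127, 256, 640, 16
Q_HEADS, KV_HEADS = 4, 1
HEAD_DIM = HIDDEN // Q_HEADS  # assumption: 64, not stated in the table

# Token embeddings; the tied LM head reuses them, so they are counted once.
embed = VOCAB * HIDDEN

# Attention projections (assumed bias-free): q, k, v, o.
attn = (
    HIDDEN * Q_HEADS * HEAD_DIM
    + 2 * HIDDEN * KV_HEADS * HEAD_DIM
    + Q_HEADS * HEAD_DIM * HIDDEN
)

# Gated MLP: gate, up, and down projections.
mlp = 3 * HIDDEN * INTERMEDIATE

# Assumption: four RMSNorm weight vectors per decoder layer.
norms = 4 * HIDDEN

per_layer = attn + mlp + norms
lm_params = embed + LAYERS * per_layer + HIDDEN  # + final normalization

# Remainder attributed to the condition encoders and pocket GVP encoder.
encoder_params = 14_756_261 - lm_params

print(lm_params)       # 10534912
print(encoder_params)  # 4221349
```

Under these assumptions the arithmetic reproduces the reported 10,534,912 LM parameters, which leaves 4,221,349 parameters for the condition encoders and the pocket GVP encoder.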
Molexar uses a shared sequence template for pretraining, SFT, and inference:
<BOS><COND> conditions </COND><SEP><MOL> molecule </MOL><EOS>
The condition block contains ordered key-token/value-token pairs. During conditional generation, selected <VALUE> token embeddings are replaced in place by encoded condition vectors. This keeps all generation modes on the same autoregressive decoding path and remains compatible with key-value-cache generation.
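The in-place replacement can be illustrated with a toy example. This is a minimal sketch, not the Molexar implementation: the key-token spellings (`<mol_wt>`, `<mol_logp>`), the 4-dimensional embeddings, and the function name are hypothetical, and the sketch replaces every `<VALUE>` slot rather than a selected subset.

```python
VALUE_TOKEN = "<VALUE>"


def inject_condition_vectors(tokens, token_embeddings, condition_vectors):
    """Return a copy of the embeddings with each <VALUE> slot overwritten.

    Condition vectors are consumed in order, one per <VALUE> token, so the
    sequence length and all other positions stay unchanged.
    """
    if tokens.count(VALUE_TOKEN) != len(condition_vectors):
        raise ValueError("need exactly one condition vector per <VALUE> token")
    vectors = iter(condition_vectors)
    return [
        list(next(vectors)) if tok == VALUE_TOKEN else list(emb)
        for tok, emb in zip(tokens, token_embeddings)
    ]


# Hypothetical prompt following the shared template, up to <MOL>.
tokens = ["<BOS>", "<COND>", "<mol_wt>", "<VALUE>",
          "<mol_logp>", "<VALUE>", "</COND>", "<SEP>", "<MOL>"]
embeddings = [[0.0] * 4 for _ in tokens]   # toy lookup-table embeddings
encoded = [[1.0] * 4, [2.0] * 4]           # toy encoded condition vectors

mixed = inject_condition_vectors(tokens, embeddings, encoded)
print(mixed[3], mixed[5])
```

Because only the input embeddings at the prompt positions change, the decoder sees an ordinary sequence and can reuse its key-value cache during sampling.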
Supported Conditions
| Key | Meaning | Encoding / Range |
|---|---|---|
| mol_hac | Heavy atom count | one-hot, 2 to 50 |
| mol_hbdc | Hydrogen-bond donor count | one-hot, 0 to 10 |
| mol_hbac | Hydrogen-bond acceptor count | one-hot, 0 to 22 |
| mol_rotbc | Rotatable bond count | one-hot, 0 to 20 |
| mol_wt | Molecular weight, Da | RBF, 30 to 750, 128 steps |
| mol_logp | LogP | RBF, -6 to 12, 96 steps |
| mol_tpsa | Topological polar surface area | RBF, 0 to 200, 96 steps |
| mol_qed | QED | RBF, 0.3 to 1.0, 64 steps |
| mol_sas | Synthetic accessibility score | RBF, 1.0 to 5.0, 64 steps |
| mol_pharma_fp | 2D pharmacophore fingerprint | direct vector, 1032 dimensions |
| prot_seq_esm_emb | Protein sequence embedding | direct vector, 1152 dimensions |
| prot_poc_gvp_emb | Protein pocket geometry embedding | GVP/pocket vector, 256 dimensions |
Protein sequence conditioning uses mean-pooled ESMC-600M final embeddings in the paper. Pocket conditioning processes no-hydrogen pocket PDB structures with a 25 Angstrom radius, a maximum of 425 atoms, and a directed 8-nearest-neighbor atom graph.
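The one-hot and RBF encodings listed in the table can be sketched as follows. This is an illustration under stated assumptions, not the Molexar encoders: it assumes inclusive integer ranges for the one-hot keys, reads "steps" as the number of evenly spaced Gaussian centers including both endpoints, and sets the kernel width equal to the center spacing. The function names are hypothetical, and the learned projection to the model's hidden size is not shown.

```python
import math


def one_hot(value, low, high):
    """One-hot encode an integer count over the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    vec = [0.0] * (high - low + 1)
    vec[value - low] = 1.0
    return vec


def rbf_expand(value, low, high, steps, width=None):
    """Expand a scalar into Gaussian radial-basis features.

    Assumption: centers are evenly spaced on [low, high]; the default
    width equals the spacing between neighboring centers.
    """
    spacing = (high - low) / (steps - 1)
    width = spacing if width is None else width
    centers = [low + i * spacing for i in range(steps)]
    return [math.exp(-((value - c) / width) ** 2) for c in centers]


hac = one_hot(30, 2, 50)               # mol_hac: 49 slots for 2..50
wt = rbf_expand(450.0, 30.0, 750.0, 128)  # mol_wt: 128 features

print(len(hac), hac.index(1.0))
print(len(wt), max(range(len(wt)), key=wt.__getitem__))
```

The RBF expansion turns a continuous target such as a molecular weight of 450 Da into a smooth activation pattern that peaks at the nearest center, so nearby target values map to similar condition vectors.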
Installation
Install Molexar and Fragment-SELFIES before loading the model:
git clone https://github.com/fairydance/Molexar.git
git clone https://github.com/fairydance/Fragment-SELFIES.git
cd Molexar
conda create -n molexar python=3.13
conda activate molexar
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
pip install transformers accelerate datasets evaluate biopython loguru
conda install -c conda-forge rdkit scipy seaborn
python -m pip install -e ../Fragment-SELFIES
python -m pip install -e . --no-deps
python -c "import fragment_selfies; import molexar; print('Molexar environment ready')"
Install the runtime dependencies listed in the Molexar repository documentation. Fragment-SELFIES is required to convert generated Fragment-SELFIES strings to SMILES. Protein-sequence conditioning also requires the auxiliary ESM embedding environment described by the Molexar repository.
Download
hf download fairydance/molexar-10m-omni --local-dir molexar-10m-omni
Usage
Property-conditioned generation:
python scripts/run_inference.py --mode conditional \
--model_path /path/to/molexar-10m-omni \
--mol_wt 450 \
--mol_logp 3.5 \
--mol_hbdc 2 \
--num_samples 10 \
--convert_to_smiles \
--canonical \
--output_file property_samples.jsonl \
--output_format jsonl
Pharmacophore-fingerprint conditioning from a reference SMILES:
python scripts/run_inference.py --mode conditional \
--model_path /path/to/molexar-10m-omni \
--condition_key mol_pharma_fp \
--reference_smiles 'Cc1nnc(N2CCNCC2)s1' \
--num_samples 10 \
--convert_to_smiles \
--canonical
Protein-sequence conditioning:
python scripts/run_inference.py --mode conditional \
--model_path /path/to/molexar-10m-omni \
--protein_sequence 'MKTIIALSYIFCLVFAKDRTEG' \
--num_samples 10 \
--convert_to_smiles \
--canonical
Protein-pocket conditioning:
python scripts/run_inference.py --mode conditional \
--model_path /path/to/molexar-10m-omni \
--pocket_pdb /path/to/pocket.pdb \
--pocket_radius 25 \
--max_atoms 425 \
--num_samples 10 \
--convert_to_smiles \
--canonical
Omni also supports fragment-constrained generation with active conditions by combining condition flags with --generation_task and --start_smiles or --start_string. Supported generation tasks are de_novo, motif_extension, scaffold_decoration, linker_design, scaffold_morphing, and superstructure.
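When running many condition combinations, the commands above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the Molexar repository. It only uses flags that appear in the examples above and assumes every condition flag takes a single value.

```python
import shlex


def build_inference_command(model_path, conditions, num_samples=10,
                            script="scripts/run_inference.py"):
    """Build an argv list for conditional generation.

    `conditions` maps flag names without the leading dashes (for example
    "mol_wt" or "pocket_pdb") to their values.
    """
    argv = ["python", script, "--mode", "conditional",
            "--model_path", str(model_path)]
    for flag, value in conditions.items():
        argv += [f"--{flag}", str(value)]
    argv += ["--num_samples", str(num_samples),
             "--convert_to_smiles", "--canonical"]
    return argv


cmd = build_inference_command(
    "/path/to/molexar-10m-omni",
    {"mol_wt": 450, "mol_logp": 3.5, "mol_hbdc": 2},
)
# Print a shell-quoted command line; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```

Returning an argv list rather than a single string avoids shell-quoting problems with values such as SMILES strings, which often contain parentheses and other shell metacharacters.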
Training
Molexar-10M Omni was initialized from Molexar-10M Base and trained with universal multi-condition SFT. The SFT objective masks the prefix through <MOL> and applies loss to the molecular continuation and closing tokens conditioned on the prompt and injected values.
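The prefix masking can be sketched as a label-construction step. This is a minimal illustration, not the Molexar training code: it assumes the common PyTorch and Transformers convention of marking ignored positions with the label -100, and the token spellings and ids in the example are hypothetical.

```python
IGNORE_INDEX = -100  # assumption: default ignore_index of PyTorch cross-entropy


def build_sft_labels(tokens, token_ids, mol_token="<MOL>"):
    """Mask every position up to and including <MOL>.

    Only the molecule tokens and the closing tokens keep their ids, so
    they are the only positions that contribute to the loss.
    """
    mol_pos = tokens.index(mol_token)
    return [
        IGNORE_INDEX if i <= mol_pos else tid
        for i, tid in enumerate(token_ids)
    ]


# Hypothetical tokenized sample following the shared sequence template.
tokens = ["<BOS>", "<COND>", "<mol_wt>", "<VALUE>", "</COND>", "<SEP>",
          "<MOL>", "[C]", "[O]", "</MOL>", "<EOS>"]
token_ids = list(range(len(tokens)))  # toy ids for illustration

labels = build_sft_labels(tokens, token_ids)
print(labels)
```

The model still attends to the full prompt, including the injected condition values, but gradients come only from predicting the molecule, `</MOL>`, and `<EOS>`.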
Training script provenance:
examples/train/bjx_h800_sft_universal_multi_unleaky.sh
The SFT data combines molecule-context and target-context samples. Molecule-context samples use the UniChem-derived Fragment-SELFIES corpus with nine scalar properties and a 2D pharmacophore fingerprint. Target-context samples use protein-ligand pairs from SAIR and the PLINDER training set, with protein-sequence ESM embeddings and processed pocket structures. The Molexar paper reports removing target-context training pairs whose protein sequence had more than 30% identity to any CrossDocked2020 test protein; after filtering, the target-context pool contains 573,463 SAIR pair records and 21,770 PLINDER training-set pair records.
Main training settings from the release script and paper:
| Setting | Value |
|---|---|
| Objective | Universal multi-condition supervised fine-tuning |
| Sequence length | 256 |
| Epochs | 5 |
| Batch size | 1000 |
| Learning rate | 2e-4 |
| Warmup steps | 2000 |
| Molecule:target sample ratio | 4:1 |
| Molecule-side active conditions | 1, 2, or 3 conditions with probabilities 0.6, 0.3, 0.1 |
| Pharmacophore oversampling probability | 0.5 |
| Mixed precision | bfloat16 |
| Distributed training | Full-shard FSDP on 8 H800 GPUs |
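The molecule:target ratio and the active-condition probabilities in the table can be read as a per-sample sampling scheme. The sketch below is one possible reading, not the release script: it assumes the context type is drawn independently per sample, that active conditions are drawn uniformly without replacement from the ten molecule-side keys, and it does not model the pharmacophore oversampling probability.

```python
import random

# The nine scalar properties plus the pharmacophore fingerprint.
MOLECULE_KEYS = [
    "mol_hac", "mol_hbdc", "mol_hbac", "mol_rotbc", "mol_wt",
    "mol_logp", "mol_tpsa", "mol_qed", "mol_sas", "mol_pharma_fp",
]


def sample_context(rng):
    """Pick the sample type: a 4:1 ratio means molecule with probability 4/5."""
    return "molecule" if rng.random() < 4 / 5 else "target"


def sample_active_conditions(rng, keys=MOLECULE_KEYS):
    """Draw 1, 2, or 3 distinct keys with probabilities 0.6, 0.3, 0.1."""
    count = rng.choices([1, 2, 3], weights=[0.6, 0.3, 0.1], k=1)[0]
    return rng.sample(keys, count)


rng = random.Random(0)
for _ in range(5):
    context = sample_context(rng)
    if context == "molecule":
        print(context, sample_active_conditions(rng))
    else:
        print(context)
```

With these probabilities a molecule-context sample carries 1 × 0.6 + 2 × 0.3 + 3 × 0.1 = 1.5 active conditions on average.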
Evaluation Highlights
The Molexar paper reports that the SFT model follows single-, dual-, and triple-property instructions and supports pharmacophore, protein-sequence, and pocket-geometry conditioning.
CrossDocked2020 target-conditioned generation highlights:
| Conditioning mode | Validity | Uniqueness | Diversity | QED | SA | Lipinski | Vina (kcal/mol) | High-affinity ratio (%) |
|---|---|---|---|---|---|---|---|---|
| Sequence | 1.00 | 0.98 | 0.83 | 0.65 | 0.82 | 4.74 | -7.25 | 43.1 |
| | 1.00 | 0.97 | 0.84 | 0.65 | 0.83 | 4.82 | -7.42 | 53.0 |
| Pharmacophore | 1.00 | 0.91 | 0.76 | 0.59 | 0.71 | 4.69 | -6.79 | 38.4 |
On MolGenBench, the paper reports high chemical-filter pass rates, strong active-molecule and scaffold recovery in de novo generation across protein targets, and favorable hit-to-lead potency when conditioning jointly on pocket and reference-ligand pharmacophore.
Intended Use
This model is intended for research use in molecular generation workflows, including:
- Property-controlled molecule generation.
- Pharmacophore-guided molecule generation.
- Protein-sequence-conditioned target-aware generation.
- Protein-pocket-conditioned target-aware generation.
- Multi-condition molecular library ideation.
- Fragment-constrained generation with optional active conditions.
Generated molecules should be treated as computational hypotheses. They require independent chemical-safety filtering, synthetic feasibility assessment, intellectual-property and dual-use review where relevant, expert medicinal-chemistry assessment, and experimental validation before downstream use.
Limitations
- The model was trained on filtered drug-like chemistry; rare, contradictory, or out-of-distribution condition combinations may be followed less reliably.
- Docking, pharmacophore, property, or sequence/pocket scores are not evidence of biological activity, safety, or clinical utility.
- Protein-sequence and pocket conditioning depend on preprocessing quality, including ESM embeddings and pocket structure preparation.
- Fragment-SELFIES decoding improves validity but does not guarantee synthetic accessibility, biological activity, safety, or developability.
- The released tokenizer does not include the iodine token [I]; use bromine substitution in start constraints when necessary, as documented by the Molexar inference script.
- Stereochemical and explicit 3D output control are outside the scope of this model.
License
This model is released under the MIT License.
Citation
If you use this model, please cite Molexar and Fragment-SELFIES:
@misc{lin2026molexar,
title = {Molexar: A Unified Multimodal Molecular Foundation Model for Drug Design},
author = {Lin, Haoyu and Liao, Yiyan and Pan, Jinmei and Ling, Xinliao and Lai, Luhua and Pei, Jianfeng},
year = {2026},
url = {https://molexar.com}
}