LinkLlama cap-50 (merged weights)
Model summary
LinkLlama is a supervised fine-tuned (SFT) decoder-only language model for molecular linker design: given two terminal fragments and simple geometric descriptors (distance, angle), it generates linker SMILES and a short reasonability-style rationale in structured text.
This checkpoint is the cap-50 variant: training examples were built from ChEMBL with a cap-50 rule on linker frequency so overly frequent linkers do not dominate the corpus. The merged model is suitable for inference with Hugging Face transformers (e.g. AutoModelForCausalLM.from_pretrained).
- Base architecture: Meta Llama 3.2 1B Instruct (
meta-llama/Llama-3.2-1B-Instruct) - Fine-tuning: LoRA SFT (Axolotl), merged into full weights for inference
- Training data: instruction-style JSONL; see the companion dataset card (
data.mdin the dataset repository, orchembl36_balanced_cap50.jsonlon the Hub / your local export)
Intended use
Primary use: conditional linker generation for fragment-based design workflows, benchmarking against 2D/3D baselines, and follow-on research. Not intended for general open-ended chat unrelated to chemistry.
Quick start
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "THGLab/Llama-3.2-1B-Instruct-LinkLlama-Cap50" # replace YOUR_ORG after Hub upload
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
Use the LinkLlama GitHub repository for prompt format, YAML-driven inference, and evaluation scripts.
Limitations
- Outputs are not guaranteed drug candidates; always run medicinal-chemistry and safety filters appropriate to your program.
- Geometric fidelity is prompt-level (distance/angle text), not a full physics-based scoring pipeline.
- Domain shift relative to training (ChEMBL-like small molecules) may affect PROTAC-scale or highly unusual chemistries.
Citation
If you use this model, cite the LinkLlama preprint:
bioRxiv: https://www.biorxiv.org/content/10.64898/2026.04.15.718690v1
@article{sun_linkllama_2026,
title = {{LinkLlama}: {Enabling} {Large} {Language} {Model} for {Chemically} {Reasonable} {Linker} {Design}},
author = {Sun, Kunyang and Wang, Yingze Eric and Purnomo, Justin Clement and Cavanagh, Joseph M. and Alteri, Giovanni Battista and Head-Gordon, Teresa},
year = {2026},
doi = {10.64898/2026.04.15.718690},
url = {https://www.biorxiv.org/content/10.64898/2026.04.15.718690v1},
journal = {bioRxiv},
}
License and third-party terms
- Source code for the LinkLlama project: Regents of the University of California license (see the GitHub
LICENSEfile). - This checkpoint is a derivative of Meta Llama 3.2. Users must comply with Meta’s Llama license and Hugging Face access rules for the base model. Do not redistribute in violation of those terms.
- Downloads last month
- 18
Model tree for THGLab/Llama-3.2-1B-Instruct-LinkLlama-Cap50
Base model
meta-llama/Llama-3.2-1B-Instruct