LinkLlama cap-50 (merged weights)

Model summary

LinkLlama is a supervised fine-tuned (SFT) decoder-only language model for molecular linker design. Given two terminal fragments and simple geometric descriptors (distance, angle), it generates linker SMILES together with a short rationale on chemical reasonability, in structured text.
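The model sees fragment geometry only as two numbers, a distance and an angle. To illustrate where such numbers could come from, here is a minimal sketch that computes an anchor-to-anchor distance and the angle between the two fragment exit vectors from 3D coordinates. The convention (which atoms define the distance, how the angle is measured) and the function name exit_vector_geometry are assumptions modeled on common linker-design setups, not necessarily the definition used to build LinkLlama prompts; check the GitHub repository for the actual definition and text format.

```python
import math


def _sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return tuple(pi - qi for pi, qi in zip(p, q))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))


def exit_vector_geometry(anchor_a, exit_a, anchor_b, exit_b):
    """Return (distance, angle_degrees) for two fragment exit vectors.

    Hypothetical convention: anchor_* is the fragment atom that bonds to the
    linker, exit_* is the (dummy) atom the linker replaces. The distance is
    anchor-to-anchor; the angle is between the two exit vectors.
    """
    distance = math.dist(anchor_a, anchor_b)
    v1 = _sub(exit_a, anchor_a)
    v2 = _sub(exit_b, anchor_b)
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
    angle = math.degrees(math.acos(cos_theta))
    return distance, angle
```

For two anchors 5 units apart whose exit vectors point straight at each other, this convention gives a distance of 5.0 and an angle of 180 degrees; perpendicular exit vectors give 90 degrees.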

This checkpoint is the cap-50 variant: training examples were built from ChEMBL with a cap-50 rule on linker frequency, so that overly frequent linkers do not dominate the corpus. Because the LoRA weights are merged, the checkpoint loads directly for inference with Hugging Face transformers (e.g. AutoModelForCausalLM.from_pretrained).
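One plausible reading of the cap-50 rule is "keep at most 50 training examples per unique linker". The sketch below implements that reading by randomly subsampling over-represented linkers. The record layout (a dict with a "linker" field), the function name cap_by_linker, and random subsampling as the selection strategy are all assumptions for illustration; the companion dataset card describes how the released file was actually built.

```python
import random
from collections import defaultdict


def cap_by_linker(examples, cap=50, key="linker", seed=0):
    """Keep at most `cap` examples per unique linker (hypothetical helper).

    `examples` is a list of dicts; `key` names the (assumed) field holding a
    linker identifier, ideally a canonical SMILES so that equivalent linkers
    group together. Linkers with more than `cap` examples are randomly
    subsampled with a fixed seed, so the result is reproducible.
    """
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)

    rng = random.Random(seed)
    capped = []
    for group in groups.values():
        if len(group) > cap:
            group = rng.sample(group, cap)
        capped.extend(group)
    return capped
```

With 120 examples of one linker and 10 of another, the result keeps 50 of the first and all 10 of the second. Rare linkers are untouched, which is the point of a frequency cap: it flattens the head of the distribution without discarding the tail.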

Base architecture: Meta Llama 3.2 1B Instruct (meta-llama/Llama-3.2-1B-Instruct)
Fine-tuning: LoRA SFT (Axolotl), merged into full weights for inference
Training data: instruction-style JSONL (chembl36_balanced_cap50.jsonl, on the Hub or in your local export); see the companion dataset card (data.md in the dataset repository) for details
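The fine-tuning line says the LoRA adapters were merged into full weights. Under the standard LoRA formulation, a frozen weight W0 receives a low-rank update (alpha / r) * B @ A, and merging adds that update into W0 once, so inference needs no adapter code. This NumPy sketch checks that the merged matrix produces the same output as the base-plus-adapter path; the shapes, rank r, and alpha are illustrative values, not LinkLlama's training hyperparameters, and merge_lora is a hypothetical helper name.

```python
import numpy as np


def merge_lora(W0, A, B, alpha, r):
    """Fold a LoRA update into the base weight: W = W0 + (alpha / r) * B @ A."""
    return W0 + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 16  # illustrative sizes, not LinkLlama's

W0 = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection
B = rng.standard_normal((d_out, r))      # LoRA up-projection
x = rng.standard_normal(d_in)            # one input activation vector

# Unmerged: base path plus the scaled low-rank adapter path.
y_adapter = W0 @ x + (alpha / r) * (B @ (A @ x))

# Merged: a single dense matmul, as in a merged checkpoint.
W_merged = merge_lora(W0, A, B, alpha, r)
y_merged = W_merged @ x

# The update itself has rank at most r.
delta = (alpha / r) * (B @ A)
```

In practice, PEFT's merge_and_unload() performs this fold for every adapted layer, which is why the merged checkpoint behaves like an ordinary Llama model at load time.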

Intended use

Primary use: conditional linker generation for fragment-based design workflows, benchmarking against 2D/3D baselines, and follow-on research. The model is not intended for general open-ended chat unrelated to chemistry.

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THGLab/Llama-3.2-1B-Instruct-LinkLlama-Cap50"  # Hugging Face Hub repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

Use the LinkLlama GitHub repository for prompt format, YAML-driven inference, and evaluation scripts.

Limitations

Outputs are not guaranteed drug candidates; always run medicinal-chemistry and safety filters appropriate to your program.
Geometric fidelity is enforced only at the prompt level (distance and angle given as text); the model is not a full physics-based scoring pipeline.
Performance may degrade under domain shift away from the training distribution (ChEMBL-like small molecules), for example on PROTAC-scale or highly unusual chemistries.

Citation

If you use this model, cite the LinkLlama preprint:

bioRxiv: https://www.biorxiv.org/content/10.64898/2026.04.15.718690v1

@article{sun_linkllama_2026,
  title   = {{LinkLlama}: {Enabling} {Large} {Language} {Model} for {Chemically} {Reasonable} {Linker} {Design}},
  author  = {Sun, Kunyang and Wang, Yingze Eric and Purnomo, Justin Clement and Cavanagh, Joseph M. and Alteri, Giovanni Battista and Head-Gordon, Teresa},
  year    = {2026},
  doi     = {10.64898/2026.04.15.718690},
  url     = {https://www.biorxiv.org/content/10.64898/2026.04.15.718690v1},
  journal = {bioRxiv},
}

License and third-party terms

Source code for the LinkLlama project: Regents of the University of California license (see the GitHub LICENSE file).
This checkpoint is a derivative of Meta Llama 3.2. Users must comply with Meta’s Llama license and Hugging Face access rules for the base model. Do not redistribute in violation of those terms.