mol-grpo-optimizer — ABLS-Mol GRPO

LoRA adapter fine-tuned on DeepSeek-R1-Distill-Llama-8B via SFT → ABLS-Mol GRPO for goal-directed molecular optimization (GuacaMol benchmark).

Task: Given a seed SMILES, generate a variant maximizing QED (drug-likeness) while keeping Tanimoto similarity in [0.3, 0.7] to the seed molecule.
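The similarity constraint in the task can be made concrete with a small sketch. Tanimoto similarity between two fingerprints is the number of shared on-bits divided by the number of distinct on-bits. The bit sets below are hypothetical placeholders, not real molecules; in practice they would come from a cheminformatics toolkit, and the fingerprint type used for this model is not stated in this card. The helper names `tanimoto` and `in_band` are illustrative only.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def in_band(similarity: float, low: float = 0.3, high: float = 0.7) -> bool:
    """True when the similarity lies inside the closed band [low, high]."""
    return low <= similarity <= high


# Hypothetical on-bit indices for a seed and a candidate (not real molecules).
seed_bits = {1, 4, 7, 9, 12, 15}
candidate_bits = {1, 4, 7, 15, 20, 21}

sim = tanimoto(seed_bits, candidate_bits)  # 4 shared bits / 8 distinct bits
print(f"Tanimoto = {sim:.2f}, in band: {in_band(sim)}")
```

A candidate identical to the seed scores 1.0 and falls outside the band, which is why the task rewards edits that change the molecule without discarding it.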

Benchmark Results (960 completions, 60 eval seeds × 16)

Metric               Base    SFT only  Plain GRPO  ABLS-Mol GRPO
Validity             46.7%   71.8%     70.1%       75.8%
Mean ΔQED            −0.084  +0.018    +0.022      +0.006
In-Band Rate         3.3%    52.2%     50.8%       48.8%
Composite Success    0.4%    21.7%     20.0%       21.1%
Mean Tanimoto        0.401   0.597     0.554       0.659
Novel Scaffold Rate  51.6%   71.6%     77.7%       64.0%

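ABLS-Mol GRPO achieves the highest validity (75.8%) and the highest mean Tanimoto similarity to the seed (0.659, near the upper edge of the 0.3 to 0.7 band), which suggests seed-anchored edits rather than generic template generation. The trade-off shows in the other rows: among the three fine-tuned variants it has the lowest in-band rate (48.8%), the lowest novel scaffold rate (64.0%), and the smallest mean ΔQED gain (+0.006), while its composite success (21.1%) falls between Plain GRPO (20.0%) and SFT only (21.7%).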
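To show how rates like those in the table above could be aggregated from per-completion records, here is a minimal sketch. The metric definitions are assumptions, since this card does not spell them out: rates are taken over all completions, mean ΔQED is taken over valid completions only, and "composite success" is assumed to mean valid, in-band, and ΔQED > 0. The `Completion` class and `summarize` function are hypothetical names, and the demo values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Completion:
    valid: bool                        # did the output parse as a molecule?
    delta_qed: Optional[float] = None  # QED(candidate) - QED(seed), valid only
    tanimoto: Optional[float] = None   # similarity to the seed, valid only


def summarize(completions, low=0.3, high=0.7):
    """Aggregate per-completion records into the four rate-style metrics."""
    n = len(completions)
    valid = [c for c in completions if c.valid]
    banded = [c for c in valid if low <= c.tanimoto <= high]
    success = [c for c in banded if c.delta_qed > 0]  # assumed success criterion
    mean_dq = sum(c.delta_qed for c in valid) / len(valid) if valid else float("nan")
    return {
        "validity": len(valid) / n,
        "mean_delta_qed": mean_dq,
        "in_band_rate": len(banded) / n,
        "composite_success": len(success) / n,
    }


# Four made-up completions for one seed (illustrative values only).
demo = [
    Completion(valid=True, delta_qed=0.10, tanimoto=0.55),   # counts as success
    Completion(valid=True, delta_qed=-0.05, tanimoto=0.40),  # in band, QED dropped
    Completion(valid=True, delta_qed=0.20, tanimoto=0.90),   # too similar to seed
    Completion(valid=False),                                 # unparsable output
]
print(summarize(demo))
```

Under these assumed definitions, composite success can never exceed the in-band rate, which matches the ordering of the two rows in the table.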

Training Details

  • Base model: DeepSeek-R1-Distill-Llama-8B
  • Quantization: 4-bit NF4 double quantization (QLoRA)
  • LoRA: rank=16, α=32, dropout=0.05, all attention + FFN projections
  • Stage 1 — SFT: Synthetic CoT traces on 240 GuacaMol seeds, 2 epochs
  • Stage 2 — ABLS-Mol GRPO: G=8, asymmetric bounded log-variance acceptance, asymmetric similarity penalty (α_low=18, α_high=10), RTRL consistency reward, sim-gated RePO, within-group diversity bonus
  • Dataset: GuacaMol test split (ChEMBL 24), 300 seeds, stratified 80/20 by QED quartile (240 training seeds, 60 eval seeds)
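The Stage 2 settings above can be illustrated with a simplified sketch of two pieces: an asymmetric similarity penalty and GRPO's group-relative advantage. Only G=8, α_low=18, α_high=10, and the 0.3 to 0.7 band come from this card. The linear hinge form of the penalty and the reward "ΔQED minus penalty" are assumptions made for illustration, and the other ABLS-Mol terms (log-variance acceptance, RTRL consistency reward, sim-gated RePO, diversity bonus) are not modeled here.

```python
import statistics

ALPHA_LOW, ALPHA_HIGH = 18.0, 10.0  # penalty weights listed in Training Details
BAND_LOW, BAND_HIGH = 0.3, 0.7      # Tanimoto band from the task definition


def similarity_penalty(sim: float) -> float:
    """Assumed linear hinge: zero inside the band, steeper below it than above."""
    if sim < BAND_LOW:
        return ALPHA_LOW * (BAND_LOW - sim)
    if sim > BAND_HIGH:
        return ALPHA_HIGH * (sim - BAND_HIGH)
    return 0.0


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


# One hypothetical group of G=8 completions as (delta_qed, tanimoto) pairs.
group = [(0.05, 0.50), (0.10, 0.20), (0.02, 0.65), (0.08, 0.80),
         (-0.03, 0.45), (0.12, 0.10), (0.00, 0.70), (0.06, 0.35)]

# Assumed reward: QED gain minus the similarity penalty.
rewards = [dq - similarity_penalty(sim) for dq, sim in group]
advantages = group_advantages(rewards)

for (dq, sim), adv in zip(group, advantages):
    print(f"dQED={dq:+.2f}  sim={sim:.2f}  advantage={adv:+.2f}")
```

With α_low larger than α_high, a candidate that drifts 0.1 below the band is penalized more than one that sits 0.1 above it, so the group-relative advantage pushes the policy toward edits that stay close to the seed.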

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    base, "Sumeetkulk/mol-grpo-optimizer-abls"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "Sumeetkulk/mol-grpo-optimizer-abls"
)

system = (
    "You are a medicinal chemist. Reason step by step about which structural "
    "changes will increase QED while keeping Tanimoto similarity between 0.3 "
    "and 0.7. Then output only the optimized SMILES."
)
seed_smiles = "CC1=CC=CC=C1Nc1ncnc2ccccc12"

messages = [
    {"role": "system", "content": system},
    {"role": "user",   "content": f"Seed: {seed_smiles}\nOptimize:"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.85,
        do_sample=True,
        top_p=0.95,
        repetition_penalty=1.1,
    )
completion = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(completion)
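# Extract the SMILES from the <answer>...</answer> tags; match is None if the tags are missing.
import re
match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
optimized_smiles = match.group(1).strip() if match else None
print(optimized_smiles)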

Citation

@misc{mol-grpo-optimizer-abls,
  title  = {Goal-Directed Molecular Optimization via ABLS-Mol GRPO},
  year   = {2026},
  note   = {DeepSeek-R1-Distill-Llama-8B fine-tuned with SFT + ABLS-Mol GRPO
            on GuacaMol for QED-maximizing molecular editing},
}