mol-grpo-optimizer — ABLS-Mol GRPO

LoRA adapter fine-tuned on DeepSeek-R1-Distill-Llama-8B via SFT → ABLS-Mol GRPO for goal-directed molecular optimization (GuacaMol benchmark).

Task: Given a seed SMILES, generate a variant maximizing QED (drug-likeness) while keeping Tanimoto similarity in [0.3, 0.7] to the seed molecule.
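The objective above can be scored with RDKit. The sketch below is one plausible scorer, not the evaluation code behind this card: it assumes RDKit's built-in QED implementation and Morgan fingerprints (radius 2, 2048 bits) for Tanimoto similarity, neither of which the card specifies. The names `in_band` and `score_edit` are illustrative.

```python
from typing import Optional


def in_band(similarity: float, low: float = 0.3, high: float = 0.7) -> bool:
    """True when the similarity lies in the closed target band [low, high]."""
    return low <= similarity <= high


def score_edit(seed_smiles: str, candidate_smiles: str) -> Optional[dict]:
    """Return QED gain and Tanimoto similarity, or None for unparsable SMILES.

    Assumption: Morgan fingerprints (radius 2, 2048 bits); the card does not
    state which fingerprint its benchmark used.
    """
    # Imported lazily so in_band() stays usable without RDKit installed.
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem, QED

    seed = Chem.MolFromSmiles(seed_smiles)
    cand = Chem.MolFromSmiles(candidate_smiles)
    if seed is None or cand is None:
        return None  # RDKit could not parse one of the molecules

    fp_seed = AllChem.GetMorganFingerprintAsBitVect(seed, 2, nBits=2048)
    fp_cand = AllChem.GetMorganFingerprintAsBitVect(cand, 2, nBits=2048)
    sim = DataStructs.TanimotoSimilarity(fp_seed, fp_cand)
    return {
        "delta_qed": QED.qed(cand) - QED.qed(seed),
        "tanimoto": sim,
        "in_band": in_band(sim),
    }
```

A candidate counts as an improvement under this reading when `delta_qed` is positive and `in_band` is true.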

Benchmark Results (960 completions: 60 eval seeds × 16 completions each)

Metric	Base	SFT only	Plain GRPO	ABLS-Mol GRPO
Validity	46.7%	71.8%	70.1%	75.8%
Mean ΔQED	−0.084	+0.018	+0.022	+0.006
In-Band Rate	3.3%	52.2%	50.8%	48.8%
Composite Success	0.4%	21.7%	20.0%	21.1%
Mean Tanimoto	0.401	0.597	0.554	0.659
Novel Scaffold Rate	51.6%	71.6%	77.7%	64.0%

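ABLS-Mol GRPO achieves the highest validity (75.8%) and the highest mean Tanimoto similarity (0.659), indicating seed-anchored edits rather than generic template generation. The trade-off shows in the other rows: it has the smallest mean ΔQED gain (+0.006) and the lowest novel scaffold rate (64.0%) of the three fine-tuned variants, while SFT only leads on in-band rate (52.2%) and composite success (21.7%). A mean of 0.659 also sits near the upper edge of the [0.3, 0.7] band rather than at its center.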
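For readers reproducing the table, the sketch below shows one way to aggregate per-completion records into these metrics. The definitions are assumptions, since the card does not spell them out: validity and in-band rate are taken over all completions, mean ΔQED over valid completions only, and composite success is assumed to mean valid, in band, and ΔQED > 0. The `Completion` record is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Completion:
    """Hypothetical per-completion record; None marks an invalid SMILES."""
    delta_qed: Optional[float]
    tanimoto: Optional[float]


def summarize(completions: List[Completion], low: float = 0.3, high: float = 0.7) -> dict:
    """Aggregate completions into the table's metrics (assumed definitions)."""
    total = len(completions)
    valid = [c for c in completions if c.delta_qed is not None and c.tanimoto is not None]
    banded = [c for c in valid if low <= c.tanimoto <= high]
    success = [c for c in banded if c.delta_qed > 0]
    return {
        "validity": len(valid) / total,
        "mean_delta_qed": sum(c.delta_qed for c in valid) / len(valid) if valid else float("nan"),
        "in_band_rate": len(banded) / total,
        "composite_success": len(success) / total,
    }


# Toy data: four completions for a single seed (values are made up).
demo = [
    Completion(0.10, 0.55),   # valid, in band, improves QED
    Completion(-0.05, 0.60),  # valid, in band, QED drops
    Completion(0.20, 0.90),   # valid, too similar to the seed
    Completion(None, None),   # invalid SMILES
]
stats = summarize(demo)
```

Applied to the full evaluation, `completions` would hold 960 records (60 seeds × 16 completions).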

Training Details

Base model: DeepSeek-R1-Distill-Llama-8B
Quantization: 4-bit NF4 double quantization (QLoRA)
LoRA: rank=16, α=32, dropout=0.05, all attention + FFN projections
Stage 1 — SFT: Synthetic CoT traces on 240 GuacaMol seeds, 2 epochs
Stage 2 — ABLS-Mol GRPO: G=8, asymmetric bounded log-variance acceptance, asymmetric similarity penalty (α_low=18, α_high=10), RTRL consistency reward, sim-gated RePO, within-group diversity bonus
Dataset: GuacaMol test split (ChEMBL 24), 300 seeds, stratified 80/20 by QED quartile (240 training seeds, 60 evaluation seeds)
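
The LoRA settings listed above map onto a `peft` configuration roughly as follows. This is a sketch, not the author's training script: the `target_modules` names are the standard Llama projection layers, which is an assumed reading of "all attention + FFN projections", and `build_lora_config` is an illustrative helper.

```python
# Hyperparameters taken from the Training Details list.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Assumed expansion of "all attention + FFN projections" for Llama models.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # feed-forward
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the adapter update by lora_alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Create a peft LoraConfig; imported lazily so the dict is usable alone."""
    from peft import LoraConfig
    return LoraConfig(**LORA_KWARGS)
```

With rank 16 and α = 32, the adapter update is scaled by α / r = 2.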

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    base, "Sumeetkulk/mol-grpo-optimizer-abls"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "Sumeetkulk/mol-grpo-optimizer-abls"
)

system = (
    "You are a medicinal chemist. Reason step by step about which structural "
    "changes will increase QED while keeping Tanimoto similarity between 0.3 "
    "and 0.7. Then output only the optimized SMILES."
)
seed_smiles = "CC1=CC=CC=C1Nc1ncnc2ccccc12"

messages = [
    {"role": "system", "content": system},
    {"role": "user",   "content": f"Seed: {seed_smiles}\nOptimize:"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.85,
        do_sample=True,
        top_p=0.95,
        repetition_penalty=1.1,
    )
completion = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(completion)
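# Extract the SMILES; re.search returns None if no <answer> tag was generated.
import re
match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
optimized_smiles = match.group(1).strip() if match else None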
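
Because decoding is sampled (temperature 0.85), repeated calls give different candidates; the benchmark above draws 16 per seed. One hedged way to use that is to keep the best candidate that stays in band. The sketch assumes each candidate has already been parsed and scored into a (smiles, delta_qed, tanimoto) tuple, for example with RDKit; the tuple layout and `pick_best` are illustrative, not part of the released code.

```python
from typing import List, Optional, Tuple

# (smiles, delta_qed, tanimoto); hypothetical layout for scored candidates.
Scored = Tuple[str, float, float]


def pick_best(cands: List[Scored], low: float = 0.3, high: float = 0.7) -> Optional[Scored]:
    """Return the in-band candidate with the largest QED gain, or None."""
    banded = [c for c in cands if low <= c[2] <= high]
    return max(banded, key=lambda c: c[1]) if banded else None


# Placeholder names and made-up scores, for illustration only.
candidates = [
    ("CANDIDATE_A", 0.12, 0.58),
    ("CANDIDATE_B", 0.30, 0.85),  # largest QED gain, but too similar to the seed
    ("CANDIDATE_C", 0.05, 0.41),
]
best = pick_best(candidates)
```

Returning `None` when nothing lands in band lets the caller decide whether to resample or fall back to the seed.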

Citation

@misc{mol-grpo-optimizer-abls,
  title  = {Goal-Directed Molecular Optimization via ABLS-Mol GRPO},
  year   = {2026},
  note   = {DeepSeek-R1-Distill-Llama-8B fine-tuned with SFT + ABLS-Mol GRPO
            on GuacaMol for QED-maximizing molecular editing},
}
