mol-grpo-optimizer — ABLS-Mol GRPO
LoRA adapter fine-tuned on DeepSeek-R1-Distill-Llama-8B via SFT → ABLS-Mol GRPO for goal-directed molecular optimization (GuacaMol benchmark).
Task: Given a seed SMILES, generate a variant maximizing QED (drug-likeness) while keeping Tanimoto similarity in [0.3, 0.7] to the seed molecule.
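To make the objective concrete, here is a minimal Python sketch of the two checks the task implies: Tanimoto similarity between two fingerprints, and the [0.3, 0.7] band test. Fingerprints are represented as plain sets of on-bit indices so the example runs without a cheminformatics toolkit. The fingerprint type used for this model is not stated in this card, and the helper names `tanimoto` and `in_band` are illustrative only. In practice the bit sets and QED values would come from a toolkit such as RDKit.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def in_band(similarity: float, low: float = 0.3, high: float = 0.7) -> bool:
    """True when the similarity falls inside the closed band [low, high]."""
    return low <= similarity <= high


# Toy fingerprints (hypothetical bit indices, not real molecules).
seed_bits = {1, 4, 7, 9, 12, 15}
candidate_bits = {1, 4, 7, 15, 20, 21}

sim = tanimoto(seed_bits, candidate_bits)  # 4 shared bits out of 8 distinct bits
print(f"similarity={sim:.3f}, in band: {in_band(sim)}")
```

A candidate that copies the seed scores 1.0 and falls outside the band, so the constraint rules out both unrelated molecules and trivial copies.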
Benchmark Results (960 completions, 60 eval seeds × 16)
| Metric | Base | SFT only | Plain GRPO | ABLS-Mol GRPO |
|---|---|---|---|---|
| Validity | 46.7% | 71.8% | 70.1% | 75.8% |
| Mean ΔQED | −0.084 | +0.018 | +0.022 | +0.006 |
| In-Band Rate | 3.3% | 52.2% | 50.8% | 48.8% |
| Composite Success | 0.4% | 21.7% | 20.0% | 21.1% |
| Mean Tanimoto | 0.401 | 0.597 | 0.554 | 0.659 |
| Novel Scaffold Rate | 51.6% | 71.6% | 77.7% | 64.0% |
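ABLS-Mol achieves the highest validity (75.8%) and the highest mean Tanimoto similarity to the seed (0.659), which suggests seed-anchored edits rather than generic template molecules. That mean sits near the upper edge of the [0.3, 0.7] band, and the gains come with trade-offs: SFT only leads on in-band rate (52.2%) and composite success (21.7%), while Plain GRPO leads on mean ΔQED (+0.022) and novel scaffold rate (77.7%).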
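The table rows can be read as aggregates over per-completion records. The sketch below shows one way such aggregates could be computed. The metric definitions are assumptions, not taken from this card: rates use all completions as the denominator, mean ΔQED averages valid completions only, and composite success means valid, in-band, and ΔQED > 0. The `Completion` record and `summarize` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Completion:
    valid: bool                  # did the output parse as a molecule?
    delta_qed: Optional[float]   # QED(candidate) - QED(seed); None if invalid
    similarity: Optional[float]  # Tanimoto similarity to the seed; None if invalid


def summarize(completions, low=0.3, high=0.7):
    """Aggregate per-completion records into table-style metrics.

    Assumed definitions (not confirmed by the model card): see the lead-in.
    """
    n = len(completions)
    if n == 0:
        raise ValueError("need at least one completion")
    valid = [c for c in completions if c.valid]
    in_band = [c for c in valid if low <= c.similarity <= high]
    success = [c for c in in_band if c.delta_qed > 0]
    mean_delta = sum(c.delta_qed for c in valid) / len(valid) if valid else 0.0
    return {
        "validity": len(valid) / n,
        "mean_delta_qed": mean_delta,
        "in_band_rate": len(in_band) / n,
        "composite_success": len(success) / n,
    }


# Four made-up completions for one seed (illustrative numbers only).
demo = [
    Completion(True, 0.05, 0.55),   # valid, in band, improved QED
    Completion(True, -0.02, 0.62),  # valid, in band, lower QED
    Completion(True, 0.10, 0.85),   # valid, but too similar to the seed
    Completion(False, None, None),  # unparseable output
]
metrics = summarize(demo)
print(metrics)
```

Under these assumed definitions composite success can never exceed the in-band rate, which matches the ordering of those two rows in the table.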
Training Details
- Base model: DeepSeek-R1-Distill-Llama-8B
- Quantization: 4-bit NF4 double quantization (QLoRA)
- LoRA: rank=16, α=32, dropout=0.05, all attention + FFN projections
- Stage 1 — SFT: Synthetic CoT traces on 240 GuacaMol seeds, 2 epochs
- Stage 2 — ABLS-Mol GRPO: G=8, asymmetric bounded log-variance acceptance, asymmetric similarity penalty (α_low=18, α_high=10), RTRL consistency reward, sim-gated RePO, within-group diversity bonus
- Dataset: GuacaMol test split (ChEMBL 24), 300 seeds, split 80/20 into 240 training and 60 evaluation seeds, stratified by QED quartile
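The seed counts above fit together: an 80/20 split of 300 seeds gives the 240 seeds used for SFT and the 60 evaluation seeds in the benchmark. A stratified split of that kind can be sketched as follows. The exact procedure used for this model is not described, so the binning, shuffling, and the `stratified_split` helper are assumptions, and the seeds are placeholders rather than GuacaMol molecules.

```python
import random


def stratified_split(seeds, train_frac=0.8, n_bins=4, rng_seed=0):
    """Split (smiles, qed) pairs into train/eval, stratified by QED bin.

    Seeds are sorted by QED, cut into n_bins equal-sized bins (quartiles
    when n_bins=4), and each bin is shuffled and split with the same
    train fraction.
    """
    rng = random.Random(rng_seed)
    ordered = sorted(seeds, key=lambda pair: pair[1])
    train, evaluation = [], []
    for b in range(n_bins):
        start = b * len(ordered) // n_bins
        stop = (b + 1) * len(ordered) // n_bins
        bin_items = ordered[start:stop]
        rng.shuffle(bin_items)
        cut = round(train_frac * len(bin_items))
        train.extend(bin_items[:cut])
        evaluation.extend(bin_items[cut:])
    return train, evaluation


# 300 placeholder seeds with synthetic QED values (not real GuacaMol data).
placeholder = [(f"seed_{i}", i / 299) for i in range(300)]
train, evaluation = stratified_split(placeholder)
print(len(train), len(evaluation))  # 240 60
```

Splitting inside each quartile keeps the QED distribution of the evaluation seeds close to that of the training seeds, so low-QED and high-QED starting points are both represented at evaluation time.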
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import re
import torch

# 4-bit NF4 with double quantization, matching the QLoRA training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    base, "Sumeetkulk/mol-grpo-optimizer-abls"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "Sumeetkulk/mol-grpo-optimizer-abls"
)

system = (
    "You are a medicinal chemist. Reason step by step about which structural "
    "changes will increase QED while keeping Tanimoto similarity between 0.3 "
    "and 0.7. Then output only the optimized SMILES."
)
seed_smiles = "CC1=CC=CC=C1Nc1ncnc2ccccc12"
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": f"Seed: {seed_smiles}\nOptimize:"},
]

# Render the chat template to text, then tokenize
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.85,
        do_sample=True,
        top_p=0.95,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, not the prompt
completion = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(completion)

# Extract the SMILES if the completion wraps it in <answer> tags.
# re.search returns None when the tags are absent, so check before calling .group().
match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
if match:
    print(match.group(1).strip())
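The usage snippet above looks for `<answer>` tags, but sampled completions may also contain a reasoning trace or omit the tags entirely. The following is a more defensive extraction sketch. It assumes the reasoning, if present, ends with a `</think>` tag (as DeepSeek-R1 distilled models commonly emit) and that an untagged answer sits on the last non-empty line. Neither assumption is confirmed for this adapter, and `extract_smiles` is a hypothetical helper.

```python
import re
from typing import Optional


def extract_smiles(completion: str) -> Optional[str]:
    """Pull a candidate SMILES string out of a raw completion, or return None."""
    # Drop the reasoning trace: keep only what follows the last </think> tag.
    answer_part = completion.rsplit("</think>", 1)[-1]

    # Preferred case: the answer is wrapped in <answer>...</answer>.
    tagged = re.search(r"<answer>\s*(.*?)\s*</answer>", answer_part, re.DOTALL)
    if tagged:
        return tagged.group(1) or None

    # Fallback: last whitespace-separated token of the last non-empty line,
    # which also skips a leading label such as "SMILES:".
    lines = [line.strip() for line in answer_part.splitlines() if line.strip()]
    if not lines:
        return None
    return lines[-1].split()[-1]


print(extract_smiles("<think>swap the methyl...</think>\n<answer>CCO</answer>"))
print(extract_smiles("...so I add a hydroxyl.</think>\nOptimized SMILES: c1ccccc1O"))
```

The returned string is only a candidate: it still needs to be parsed with a cheminformatics toolkit to confirm that it is a valid molecule before QED or similarity is computed.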
Citation
@misc{mol-grpo-optimizer-abls,
  title = {Goal-Directed Molecular Optimization via ABLS-Mol GRPO},
  year  = {2026},
  note  = {DeepSeek-R1-Distill-Llama-8B fine-tuned with SFT + ABLS-Mol GRPO on GuacaMol for QED-maximizing molecular editing},
}